3:00 PM - 3:20 PM
[1G4-OS-21a-01] Active Exploration Method for Simultaneous Learning of Maps and Multimodal Spatial Concepts and Utilization of the Foundation Model
Keywords: Active inference, Semantic mapping, SLAM
For a robot to perform tasks involving human language, it needs a semantic map that associates semantic information with locations. Learning such a map often requires human intervention. In this study, we propose an active semantic mapping system that requires no human intervention, thereby reducing the burden on the user during semantic mapping. In the proposed method, the robot actively explores its environment, learning spatial concepts and generating a map at the same time. Spatial concepts are learned through multimodal categorization based on unsupervised online learning. Image captions generated using CLIP, a foundation model, are used to link real-world observations to language. To evaluate which exploration strategy leads to efficient semantic mapping, we conducted simulation experiments comparing methods that differ in how they select the next destination. We also evaluated, in a real-world environment, how useful the learned results are for human language-related tasks.
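The comparison above hinges on how the robot picks its next destination. As a rough illustration only, the sketch below shows one common active-exploration heuristic: go where the predicted spatial-concept distribution is most uncertain (highest entropy). This is an assumption for illustration and not necessarily the criterion used in this study; the `select_destination` function, the candidate names, and the probability values are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_destination(candidates):
    """Return the candidate whose predicted spatial-concept distribution
    has the highest entropy (hypothetical uncertainty-driven criterion).

    candidates: dict mapping a destination id to a list of probabilities
    over spatial concepts (hypothetical interface, not from the paper).
    """
    return max(candidates, key=lambda c: entropy(candidates[c]))


# Hypothetical predicted concept distributions for three candidate destinations.
candidates = {
    "corridor": [0.9, 0.05, 0.05],          # concept is already well determined
    "unvisited_room": [0.34, 0.33, 0.33],   # nearly uniform: most uncertain
    "kitchen": [0.6, 0.3, 0.1],
}

print(select_destination(candidates))
```

Here the nearly uniform distribution of `unvisited_room` has the highest entropy, so it is chosen; a baseline such as random or nearest-frontier selection would ignore these distributions entirely, which is the kind of difference a destination-selection comparison would measure.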