2:00 PM - 2:20 PM
[2D3-E-4-03] A Multimodal Target-Source Classifier Model for Object Fetching from Natural Language Instructions
Keywords: Deep Learning in Robotics and Automation, Spoken Language Understanding, Domestic Robots
In this paper, we address the task of fetching objects from ambiguous instructions. A typical fetching task consists of picking up a target object specified by ambiguous instructions. We propose a multimodal target-source classifier model (MTCM) that grounds the instructions in the scene. Specifically, MTCM predicts the likelihood of a target object, as well as the source of that target, using linguistic and visual features. Our approach improves on the accuracy of the previous state-of-the-art method for target object prediction in fetching tasks.