A Multimodal Target-Source Classifier Model for Object Fetching from Natural Language Instructions

Aly Magassouba

2:00 PM - 2:20 PM

[2D3-E-4-03] A Multimodal Target-Source Classifier Model for Object Fetching from Natural Language Instructions

〇Aly Magassouba¹, Komei Sugiura¹, Hisashi Kawai¹ (1. NICT)

Keywords: Deep Learning in Robotics and Automation, Spoken Language Understanding, Domestic Robots

In this paper, we address the task of fetching objects from ambiguous instructions. A typical fetching task consists of picking up a target object specified by an ambiguous instruction. Specifically, we propose a multimodal target-source classifier model (MTCM) that grounds the instructions in the scene. More precisely, MTCM predicts the likelihood of a target object, as well as the source of this target, from linguistic and visual features. Our approach improves on the accuracy of the previous state-of-the-art method for target object prediction in fetching tasks.
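
To make the idea of a classifier with separate target and source outputs concrete, here is a minimal sketch in plain Python. It is not the authors' architecture: the abstract does not specify layers, feature extractors, or losses, so the fusion by concatenation, the single shared hidden layer, the sigmoid target head, the softmax source head, and every name below (`make_params`, `predict`, `n_sources`, and so on) are assumptions made for illustration only. The weights are random and untrained.

```python
import math
import random

def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def make_params(n_ling, n_vis, n_hidden, n_sources, seed=0):
    """Hypothetical parameter set: one shared layer and two output heads."""
    rng = random.Random(seed)
    return {
        "shared": init_layer(n_hidden, n_ling + n_vis, rng),
        "target": init_layer(1, n_hidden, rng),
        "source": init_layer(n_sources, n_hidden, rng),
    }

def predict(ling, vis, params):
    """Return (target likelihood, distribution over candidate sources)."""
    # Assumption: the two modalities are fused by simple concatenation.
    fused = list(ling) + list(vis)
    hidden = [math.tanh(h) for h in linear(fused, *params["shared"])]
    # Head 1: likelihood that the candidate object is the instructed target.
    target_prob = sigmoid(linear(hidden, *params["target"])[0])
    # Head 2: probability of each candidate source of the target.
    source_probs = softmax(linear(hidden, *params["source"]))
    return target_prob, source_probs
```

Calling `predict(ling, vis, make_params(4, 3, 5, 3))` with a 4-dimensional linguistic feature vector and a 3-dimensional visual feature vector returns one probability for the target and a 3-way distribution over sources. In practice the features would come from language and image encoders, and the parameters would be learned from labelled instructions.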

Presentation information

[2D3-E-4] Robots and real worlds: planning and control
