Presentation information

General Session » GS-7 Vision, speech media processing

[4I2-GS-7c] Vision, speech media processing: Speech recognition and instruction understanding

Fri. Jun 11, 2021 11:00 AM - 12:40 PM Room I (GS room 4)

Chair: 宮西 大樹 (Advanced Telecommunications Research Institute International)

12:00 PM - 12:20 PM

[4I2-GS-7c-04] Understanding Object Fetching Instructions Including Referring Expressions about Target Objects Based on Target-Dependent UNITER

〇Shintaro Ishikawa¹, Komei Sugiura¹ (1. Keio University)

Keywords: Natural Language Processing, Image Processing, Object Manipulation, Referring Expression, Robot

Domestic service robots currently lack the ability to interact naturally through language, because human instructions are often ambiguous or omit information, which makes them hard to understand. Existing methods do not adequately model referring expressions that specify relationships between objects. In this paper, we propose Target-dependent UNITER, which directly learns the relationship between the target object and other objects by focusing on the relevant regions within an image instead of the whole image. Our model is validated on two standard datasets, and the results show that Target-dependent UNITER outperforms the baseline method in terms of classification accuracy.
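
The idea of attending to relevant regions rather than the whole image can be illustrated with a toy sketch. This is not the authors' implementation: the names `crop` and `build_model_input`, the nested-list image, and the bounding-box convention are all hypothetical, chosen only to show how an instruction, one candidate target region, and the surrounding object regions might be assembled into a single region-level input for a classifier.

```python
from typing import Dict, List, Tuple

# Assumed conventions (hypothetical, for illustration only):
# a box is (x_min, y_min, x_max, y_max) with exclusive maxima,
# and an image is a toy grayscale grid stored as rows of pixel values.
Box = Tuple[int, int, int, int]
Image = List[List[int]]


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image covered by the box."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def build_model_input(
    image: Image,
    instruction: str,
    target_box: Box,
    context_boxes: List[Box],
) -> Dict[str, object]:
    """Assemble a region-level input instead of passing the whole image.

    The candidate target region is kept separate from the other object
    regions so that a downstream model could relate the two groups.
    """
    return {
        "tokens": instruction.lower().split(),
        "target_region": crop(image, target_box),
        "context_regions": [crop(image, b) for b in context_boxes],
    }


# Toy 4x6 image whose pixel value encodes its position (10 * row + column).
image = [[10 * r + c for c in range(6)] for r in range(4)]
sample = build_model_input(
    image,
    "Pick up the cup next to the bottle",
    target_box=(0, 0, 2, 2),
    context_boxes=[(3, 1, 6, 4)],
)
print(sample["target_region"])  # [[0, 1], [10, 11]]
```

In a real system the cropped regions would presumably be replaced by features from an object detector, and the assembled input would be fed to a model that classifies whether the candidate region is the object the instruction refers to.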

