JSAI2020

Presentation information

General Session » J-11 Robot and real worlds

[1Q3-GS-11] Robot and real worlds: Multimodal information

Tue. Jun 9, 2020 1:20 PM - 3:00 PM Room Q (jsai2020online-17)

Chair: 青島武伸 (Panasonic Corporation)

2:40 PM - 3:00 PM

[1Q3-GS-11-05] Sentence Generation for Fetching Instruction based on Multimodal Attention Branch Network

〇Tadashi Ogura1, Aly Magassouba1, Komei Sugiura1, Tsubasa Hirakawa2, Takayoshi Yamashita2, Hironobu Fujiyoshi2, Hisashi Kawai1 (1. National Institute of Information and Communications Technology, 2. Chubu University)

Keywords: Multimodal language generation, Domestic service robot

Domestic service robots (DSRs) are a promising solution to the shortage of home care workers.
Nonetheless, one of the main limitations of DSRs is their inability to naturally interact through language.
Recently, data-driven approaches have been shown to be effective for tackling this limitation; however, they often require large-scale datasets, which are costly to build.
Against this background, we aim to perform automatic sentence generation for fetching instructions, e.g., "Bring me a green tea bottle on the table."
This is particularly challenging because appropriate expressions depend on the target object, as well as its surroundings.
In this paper, we propose a method that generates sentences from visual inputs.
Unlike other approaches, the proposed method has multimodal attention branches that utilize subword-level attention and generate sentences based on subword embeddings.
In the experiment, we compared the proposed method with a baseline method using four standard image-captioning metrics.
Experimental results show that the proposed method outperformed the baseline in terms of these metrics.
