JSAI2020


General Session J-11: Robot and real worlds

[1Q3-GS-11] Robot and real worlds: Multimodal information

Tue. Jun 9, 2020 1:20 PM - 3:00 PM Room Q (jsai2020online-17)

Chair: 青島武伸 (Panasonic Corporation)

2:20 PM - 2:40 PM

[1Q3-GS-11-04] Cross-modal BERT: Acquisition of Multimodal Representation and Cross-modal Prediction based on Self-Attention

Yuta Kyuragi (1, presenter), Kazuki Miyazawa (1), Tatsuya Aoki (1, 2), Takato Horii (1), Takayuki Nagai (1, 2)
Affiliations: 1. Osaka University; 2. The University of Electro-Communications

Keywords: Multimodal Information Processing, Self-Attention, Communication, Symbol Emergence in Robotics, Natural Language Processing

Humans can abstract rich representations from multimodal information and use them in daily tasks. For instance, object concepts are represented by a combination of vision, sound, touch, language, and other modalities. When humans communicate, speakers express information observed through their own sensory organs as language, and listeners, in turn, use their knowledge to infer the speakers' sensations from that language. Communication agents therefore need to acquire knowledge from multimodal information that supports prediction in both directions. We propose a model that predicts bidirectionally between images and language, based on BERT, which employs a hierarchical self-attention structure. The proposed cross-modal BERT was evaluated on a cross-modal prediction task and a multimodal categorization task. Experimental results showed that the cross-modal BERT acquired rich multimodal representations and performed cross-modal prediction in both directions. In the category estimation task, the model also performed better with multimodal information than with a single modality.
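The abstract builds on BERT-style self-attention applied jointly to images and language. The following minimal sketch is hypothetical and is not the authors' implementation. It shows one idea behind such a setup: a single scaled dot-product self-attention layer over a concatenated [image; text] token sequence, with a per-modality embedding added to each token. An attention mask hides the text tokens, so every output is built only from image information, roughly analogous to the image-to-language prediction direction. Token counts, the dimension, the random weights, and the masking scheme are all illustrative assumptions. The paper's hierarchical structure, feature extractors, and training objective are not shown.

```python
import math
import random


def matmul(a, b):
    # (n x k) @ (k x m) -> (n x m) using plain lists
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def softmax(row):
    mx = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - mx) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, wq, wk, wv, key_mask=None):
    """Single-head scaled dot-product self-attention over token vectors x (n x d).

    key_mask: optional list of bools, one per token; False means no position
    may attend to that token (its attention weight becomes 0).
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(wq[0]))
    weights = []
    for row in matmul(q, transpose(k)):
        scaled = [s / scale for s in row]
        if key_mask is not None:
            scaled = [s if keep else -1e9 for s, keep in zip(scaled, key_mask)]
        weights.append(softmax(scaled))
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d = 4  # hypothetical hidden size

# Hypothetical inputs: 3 image feature vectors and 2 word embeddings, both size d.
image_tokens = random_matrix(3, d, rng)
text_tokens = random_matrix(2, d, rng)

# Hypothetical modality embeddings tell the model which modality a token came from.
modality_emb = {"image": random_matrix(1, d, rng)[0], "text": random_matrix(1, d, rng)[0]}
x = ([[a + b for a, b in zip(tok, modality_emb["image"])] for tok in image_tokens]
     + [[a + b for a, b in zip(tok, modality_emb["text"])] for tok in text_tokens])

wq, wk, wv = (random_matrix(d, d, rng) for _ in range(3))

# Image -> language direction: hide the text tokens as attention keys.
key_mask = [True] * len(image_tokens) + [False] * len(text_tokens)
out, weights = self_attention(x, wq, wk, wv, key_mask)
```

Masking the other modality is one simple way to force information to flow from images toward the language positions. Swapping the mask, so that only text tokens are visible, sketches the opposite direction. A full model would stack several such layers with multiple heads and feed-forward sublayers, and would be trained to reconstruct the masked modality.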
