JSAI2021

Presentation information

General Session

[4J1-GS-6d] Language Media Processing: Natural Language Processing (1/2)

Fri. Jun 11, 2021 9:00 AM - 10:40 AM Room J (GS room 5)

Chair: 川野 陽慈 (Keio University)

10:20 AM - 10:40 AM

[4J1-GS-6d-05] Generating Object Manipulation Instructions Including Referring Expressions of Target Objects and Destinations Based on Case Relation Transformer

〇Motonari Kambara1, Komei Sugiura1 (1. Keio University)

Keywords: multimodal, natural language generation, target, destination

The purpose of this paper is to extend the dataset using a cross-modal language generation model. We propose a Case Relation Transformer (CRT) that generates a fetching instruction sentence from an image, such as "Move the blue flip-flop to the lower left box." Unlike existing methods, CRT uses a Transformer to capture the visual and geometric features of objects in an image. The Case Relation Block allows the CRT to process the object. We conducted comparative experiments and human evaluations. The experimental results showed that CRT outperformed the baseline methods.
