10:20 AM - 10:40 AM
[4J1-GS-6d-05] Generating Object Manipulation Instructions Including Referring Expressions of Target Objects and Destinations Based on Case Relation Transformer
Keywords: multimodal, natural language generation, target, destination
The purpose of this paper is to extend the dataset using a cross-modal language generation model. We propose the Case Relation Transformer (CRT), which generates a fetching instruction sentence from an image, such as "Move the blue flip-flop to the lower left box." Unlike existing methods, the CRT uses a Transformer to capture the visual and geometric features of objects in an image. The Case Relation Block allows the CRT to process these objects. We conducted comparative experiments and human evaluations. Experimental results showed that the CRT outperformed the baseline methods.