2:40 PM - 3:00 PM
[1Q3-GS-11-05] Sentence Generation for Fetching Instruction based on Multimodal Attention Branch Network
Keywords: Multimodal language generation, Domestic service robot
Domestic service robots (DSRs) are a promising solution to the shortage of home care workers.
Nonetheless, one of the main limitations of DSRs is their inability to naturally interact through language.
Recently, data-driven approaches have been shown to be effective in tackling this limitation; however, they often require large-scale datasets, which are costly to collect.
Against this background, we aim to automatically generate fetching instructions such as "Bring me a green tea bottle on the table."
This is particularly challenging because appropriate expressions depend on the target object, as well as its surroundings.
In this paper, we propose a method that generates sentences from visual inputs.
Unlike other approaches, the proposed method has multimodal attention branches that utilize subword-level attention and generate sentences based on subword embeddings.
In our experiments, we compared the proposed method with a baseline method using four standard image captioning metrics.
Experimental results show that the proposed method outperformed the baseline in terms of these metrics.
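The abstract does not describe how the multimodal attention branches compute subword-level attention. As a rough illustration only, the sketch below shows a generic scaled dot-product attention step in which a decoder state (the query) attends over a sequence of subword embeddings and produces attention weights and a context vector. The function names, the dot-product scoring, and the toy embeddings are assumptions for illustration; this is not the authors' multimodal attention branch network.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def subword_attention(query: Vector, subword_embs: List[Vector]) -> Tuple[List[float], Vector]:
    """Hypothetical subword-level attention (generic scaled dot-product form).

    query:        decoder hidden state of dimension d (assumed to match the embeddings)
    subword_embs: one d-dimensional embedding per subword token
    Returns (weights, context): one weight per subword, and the weighted sum of embeddings.
    """
    d = len(query)
    # Score each subword embedding against the query, scaled by sqrt(d).
    scores = [dot(query, e) / math.sqrt(d) for e in subword_embs]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the subword embeddings.
    context = [sum(w * e[i] for w, e in zip(weights, subword_embs)) for i in range(d)]
    return weights, context


if __name__ == "__main__":
    # Toy 2-d embeddings for hypothetical subwords of "green tea bottle".
    tokens = ["green", "tea", "bot", "tle"]
    embs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.4, 0.6]]
    weights, context = subword_attention([1.0, 0.0], embs)
    for tok, w in zip(tokens, weights):
        print(f"{tok}: {w:.3f}")
    print("context:", [round(c, 3) for c in context])
```

In a captioning-style decoder, a context vector like this would typically be combined with the decoder state to predict the next subword; how the proposed method fuses it with visual features is not stated in the abstract.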