1:20 PM - 1:40 PM
[2O4-GS-7-01] Instruction Comprehension Based on Funnel UNITER for Object Manipulation Tasks
Keywords: Natural Language Processing, Image Processing, Object Manipulation, Referring Expression, Robot
In this study, we develop a multimodal language comprehension model that allows domestic service robots to understand object fetching instructions.
The proposed model, Funnel UNITER, gradually reduces the dimensions of the query, key, and value in each transformer layer to reduce the computational cost of self-attention.
We also built a new dataset, ALFRED-fetch, for the multimodal language understanding for fetching instruction (MLU-FI) task.
Our model outperformed the baseline method in both classification accuracy and training time.
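
The idea of shrinking the query, key, and value dimensions layer by layer can be illustrated with a minimal sketch. This is not the authors' implementation: the single-head attention, the random weights standing in for learned parameters, the dimension schedule (64, 32, 16), and the names `funnel_attention_layer` and `funnel_encoder` are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def funnel_attention_layer(x, d_out, rng):
    """Single-head self-attention whose Q, K, V projections map d_in -> d_out.

    Hypothetical helper: random matrices stand in for learned weights.
    """
    d_in = x.shape[-1]
    w_q = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
    w_k = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
    w_v = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Scaled dot-product attention over the sequence positions.
    weights = softmax(q @ k.T / np.sqrt(d_out))
    return weights @ v  # shape: (seq_len, d_out)


def funnel_encoder(x, dims, seed=0):
    """Stack layers whose Q/K/V width follows the (assumed) schedule `dims`."""
    rng = np.random.default_rng(seed)
    shapes = [x.shape]
    for d in dims:
        x = funnel_attention_layer(x, d, rng)
        shapes.append(x.shape)
    return x, shapes


# Ten tokens (text and image-region features in the real task) of width 64.
tokens = np.random.default_rng(1).standard_normal((10, 64))
out, shapes = funnel_encoder(tokens, dims=[64, 32, 16])
print(shapes)
```

In this sketch the projection matrices of each layer get smaller as the width drops, so the projection and attention-product cost per layer falls with it, while the sequence length stays unchanged. Residual connections, multi-head attention, and feed-forward sublayers are omitted for brevity.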