2:00 PM - 2:20 PM
[1B3-OS-41a-02] Multimodal Prompt Analysis in Robotic Manipulation Tasks
Keywords: AI, Robot, Multimodal
In this study, we analyzed multimodal prompts in robot manipulation tasks, focusing on the interaction between textual and visual inputs. Using the VIMA benchmark, we evaluated how modality dependence and the input order of observation tokens affect the task success rate. The results revealed an overdependence on specific modalities and on input order, pointing to important challenges in achieving robust multimodal learning. Our findings contribute to improving the generalizability of models in robot tasks.