JSAI2025

Presentation information

Organized Session

[1B3-OS-41a] OS-41

Tue. May 27, 2025 1:40 PM - 3:20 PM Room B (Small hall)

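Organizers: Masahiro Suzuki (The University of Tokyo), Yusuke Iwasawa (The University of Tokyo), Makoto Kawano (The University of Tokyo), Wataru Kumagai (OMRON SINIC X), Tatsuya Matsushima (The University of Tokyo), Paavo Parmas (The University of Tokyo), Shohei Taniguchi (The University of Tokyo)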

2:00 PM - 2:20 PM

[1B3-OS-41a-02] Multimodal Prompt Analysis in Robotic Manipulation Tasks

〇Daiki Takahashi¹, Masahiro Suzuki², Yutaka Matsuo² (1. Aoyama Gakuin University, 2. Graduate School of Engineering, The University of Tokyo)

Keywords: AI, Robot, Multimodal

In this study, we analyzed multimodal prompts in a robot manipulation task, focusing on the interaction between textual and visual inputs. Using the VIMA benchmark, we evaluated how modality dependence and the input order of observation tokens affect the task success rate. The results revealed an overdependence on specific modalities and on input order, which poses important challenges for achieving robust multimodal learning. Our findings contribute to improving the generalizability of models in robot tasks.
