JSAI2024

Presentation information

Organized Session


[4O3-OS-16e] OS-16

Fri. May 31, 2024 2:00 PM - 3:20 PM Room O (Music studio hall)

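Organizers: Masahiro Suzuki (The University of Tokyo), Yusuke Iwasawa (The University of Tokyo), Makoto Kawano (The University of Tokyo), Wataru Kumagai (The University of Tokyo), Tatsuya Matsushima (The University of Tokyo), Yusuke Mori (Square Enix Co., Ltd.), Yutaka Matsuo (The University of Tokyo)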

2:20 PM - 2:40 PM

[4O3-OS-16e-02] Vision-Language-Conditioned Diffusion Policies for Robotic Control

〇Akira Kinose1, Koki Oguri1, Tomoyuki Kagaya1, Ryo Okumura2, Tadahiro Taniguchi3,2 (1. Panasonic Connect Co., Ltd., 2. Panasonic Holdings Corporation, 3. Ritsumeikan University)

[Online]

Keywords: Robotics, Diffusion Model

Building robots that understand human language and autonomously decide how to act on it is a major research challenge in robotics and machine learning. If robots can accurately grasp the intent behind abstract human instructions and execute appropriate control, they are expected to assist humans more effectively and carry out tasks more efficiently.
In this paper, we propose an imitation learning method for robot control, named Vision-Language-conditioned Diffusion Policy (VLDP), that autonomously determines actions from human language instructions and goal images. Existing language-based robot control methods do not adequately model the ambiguity and polysemy inherent in human language. VLDP addresses this issue by extracting semantics from language instructions and goal images with a vision-language model and conditioning a Diffusion Policy on them. This enables the robot to generate multiple valid actions in response to linguistically ambiguous instructions.
Experiments evaluate the success rate of action generation based on language instructions, the ability to adapt to unseen language instructions, and the multimodality of actions generated by the proposed method.
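
For readers unfamiliar with conditioned diffusion policies, the following is a minimal illustrative sketch of the general idea, not the authors' implementation. It assumes a standard DDPM-style reverse process, a linear noise schedule, and a hypothetical untrained `ToyNoisePredictor` standing in for a learned network; the conditioning vector `cond` is a random placeholder for the embedding a vision-language model might produce from an instruction and a goal image. All names, dimensions, and hyperparameters here are assumptions made for illustration.

```python
import numpy as np


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumed choice, not taken from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


class ToyNoisePredictor:
    """Hypothetical, untrained stand-in for a learned noise-prediction network."""

    def __init__(self, action_dim, cond_dim, rng):
        self.w_action = rng.normal(scale=0.1, size=(action_dim, action_dim))
        self.w_cond = rng.normal(scale=0.1, size=(cond_dim, action_dim))
        self.w_time = rng.normal(scale=0.1, size=action_dim)

    def __call__(self, noisy_action, t_frac, cond):
        # The conditioning vector enters every denoising step.
        return np.tanh(
            noisy_action @ self.w_action + cond @ self.w_cond + t_frac * self.w_time
        )


def sample_action(predictor, cond, action_dim, rng, num_steps=50):
    """Draw one action by iteratively denoising Gaussian noise given `cond`."""
    betas, alphas, alpha_bars = make_schedule(num_steps)
    action = rng.normal(size=action_dim)  # start from pure noise
    for t in reversed(range(num_steps)):
        eps_hat = predictor(action, t / num_steps, cond)
        # DDPM posterior mean for the previous (less noisy) step.
        mean = (
            action - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat
        ) / np.sqrt(alphas[t])
        # No noise is added at the final step.
        noise = rng.normal(size=action_dim) if t > 0 else np.zeros(action_dim)
        action = mean + np.sqrt(betas[t]) * noise
    return action


rng = np.random.default_rng(0)
cond = rng.normal(size=8)  # placeholder for a vision-language embedding
predictor = ToyNoisePredictor(action_dim=2, cond_dim=8, rng=rng)
# Several samples for the same condition: each draw can yield a different action.
samples = np.stack([sample_action(predictor, cond, 2, rng) for _ in range(4)])
print(samples.shape)
```

Because sampling starts from fresh noise each time, repeated draws under the same condition give different actions; in a trained policy this stochasticity is what would allow several distinct valid behaviors for one ambiguous instruction.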
