2:20 PM - 2:40 PM
[4P3-OS-17c-02] Robot Task Planning with Vision-Language Model via Hand-written Instruction for Remote Control
Keywords: Human Support Robot, Remote Control, Vision-Language Model (VLM)
The social implementation of assistive robots is crucial for addressing labor shortages and improving quality of life (QoL) in an aging society. To utilize robots in everyday life, a remote control system that allows users to easily operate robots anytime, anywhere is indispensable. One intuitive way for users to control robots is hand-written instruction, in which users freely sketch instructions on a screen. To control a robot with hand-written lines, the system must understand the semantic information of these lines and translate it into robot commands. In this paper, we propose a method for interpreting hand-written instructions using Vision-Language Models (VLMs). In this method, the VLM takes a pre-prompt including APIs, constraints, and examples, together with an observation image annotated with hand-written lines, and outputs a sequence of low-level task code. Additionally, the generated code takes the hand-written lines as an argument, enabling remote control that includes specifying ambiguous positions and paths that are difficult to express through language. We demonstrate a high success rate on various tasks using our method. Furthermore, a user study with 10 participants shows the high usability of our method compared with a voice-based method.
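The abstract describes a prompting pipeline in which a pre-prompt (robot APIs, constraints, few-shot examples) and an annotated observation image are fed to a VLM, which then emits task code that receives the hand-written lines as an argument. The sketch below illustrates one possible way to structure such a pipeline; it is not the authors' implementation, and every name in it (query_vlm, move_along, pick, place, HandwrittenLines) is a hypothetical placeholder.

```python
# Minimal sketch of the prompting scheme outlined in the abstract.
# All identifiers here are hypothetical placeholders, not the paper's code.
import base64
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HandwrittenLines:
    """Hand-written strokes drawn on the observation image, in pixel coordinates."""
    strokes: List[List[Tuple[int, int]]]


PRE_PROMPT = """\
You control a human support robot. Respond only with Python code.

Available robot APIs (hypothetical):
  pick(object_name: str)
  place(object_name: str, location: str)
  move_along(lines: HandwrittenLines)   # follow a sketched path

Constraints:
  - Use only the APIs listed above.
  - Interpret the hand-written lines in the image as positions or paths.

Example:
  # The user circles a cup and draws an arrow toward the table.
  pick("cup")
  place("cup", "table")
"""


def encode_image(path: str) -> str:
    """Base64-encode the observation image that carries the hand-written overlay."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def query_vlm(pre_prompt: str, image_b64: str) -> str:
    """Placeholder for a call to a vision-language model.

    In practice this would send the pre-prompt and the annotated observation
    image to a VLM endpoint and return the generated task-code string.
    """
    raise NotImplementedError("Replace with an actual VLM API call.")


def plan_from_sketch(image_path: str, lines: HandwrittenLines) -> str:
    """Request a low-level task-code sequence from the VLM.

    The generated code is expected to reference the hand-written lines as an
    argument (e.g. move_along(lines)), so ambiguous positions and paths that
    are hard to express in language are passed through directly.
    """
    return query_vlm(PRE_PROMPT, encode_image(image_path))
```

Under these assumptions, the caller would execute the returned code string in a sandbox that exposes the robot APIs and the `lines` object, which is how the sketched path ends up grounding the generated plan.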