JSAI2023

Presentation information

Organized Session

[3G1-OS-24a] Daily Life Knowledge and AI

Thu. Jun 8, 2023 9:00 AM - 10:40 AM Room G (A4)

Organizers: 福田 賢一郎, 江上 周作, 宮田 なつき, Qiu Yue, 鵜飼 孝典, 古崎 晃司, 川村 隆浩, 市瀬 龍太郎, 岡田 慧

10:00 AM - 10:20 AM

[3G1-OS-24a-04] State Recognition based on Large-Scale Vision-Language Model and Evolutionary Computation for Daily Assistive Robots

〇Kento Kawaharazuka, Yoshiki Obinata, Naoaki Kanazawa, Kei Okada, Masayuki Inaba (all: The University of Tokyo)

Keywords: Large-Scale Vision-Language Model, Robotics, Daily Life

In this study, we perform environmental state recognition for daily assistive robots using Visual Question Answering (VQA) with Pre-Trained Vision-Language Models (PTVLMs).
For VQA, we integrate the results obtained from multiple randomized images and from a variety of questions that differ in form, article, state expression, and wording.
Because each question recognizes a different set of states correctly, we select an appropriate combination of questions by optimizing it with evolutionary computation.
This makes it possible to recognize states that have so far been difficult to recognize, such as those of transparent doors and water.
We believe this idea will revolutionize the recognition strategies of robots: it requires neither retraining of the network nor programming, and a complex recognizer can be constructed simply by providing a set of appropriate questions to a single model.
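
The abstract does not state which evolutionary algorithm, encoding, or aggregation rule is used, so the following is only a minimal sketch of the general idea under stated assumptions: a basic genetic algorithm searches over binary masks that select a subset of questions, and the selected yes/no answers are combined by majority vote. The function names (`vote`, `fitness`, `evolve`), the parameters, and the `samples` data are all hypothetical; the answers are synthetic placeholders rather than outputs of an actual vision-language model.

```python
import random

# Synthetic placeholder data (an assumption, not from the paper):
# each sample is (yes/no answers of 4 questions as 1/0, true state label).
# Questions 0 and 1 are reliable here; questions 2 and 3 are not.
samples = [
    ([1, 1, 0, 0], 1),
    ([1, 1, 0, 1], 1),
    ([0, 0, 1, 1], 0),
    ([0, 0, 0, 1], 0),
]

def vote(answers, mask):
    """Majority vote over the answers of the questions selected by mask."""
    picked = [a for a, m in zip(answers, mask) if m]
    if not picked:
        return 0  # no question selected: default to state 0
    return 1 if 2 * sum(picked) > len(picked) else 0

def fitness(mask, samples):
    """Fraction of samples whose voted state matches the true label."""
    return sum(vote(ans, mask) == label for ans, label in samples) / len(samples)

def evolve(samples, n_questions, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Simple elitist genetic algorithm over question-selection masks."""
    rng = random.Random(seed)
    # Start from the "use every question" mask plus random masks.
    pop = [[1] * n_questions]
    pop += [[rng.randint(0, 1) for _ in range(n_questions)]
            for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, samples), reverse=True)
        elite = pop[: pop_size // 2]  # the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_questions)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability p_mut.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda m: fitness(m, samples))

best = evolve(samples, n_questions=4)
print("selected questions:", best, "accuracy:", fitness(best, samples))
```

In this toy setting, voting over all four questions misclassifies one sample, while a mask that keeps only the reliable questions classifies all of them correctly. Because the best half of the population is carried over unchanged, the returned mask is never worse than the all-questions baseline it starts from.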
