10:20 AM - 10:40 AM
[3G1-OS-24a-05] Generating Description for Possible Collisions in Object Placement Tasks
Keywords: Nearest Neighbor, future captioning, DSRs, Vision and Language
The practical implementation of domestic service robots (DSRs) that can communicate in natural language is a promising solution for people who need assistance. In particular, the ability to predict potential hazards associated with task execution and to prompt the user for a judgment can improve both safety and convenience. However, accurate prediction is difficult because information about future events is unavailable at prediction time. Existing methods represent the grasped object insufficiently because they do not take an image of the grasped object as input. Moreover, because they require the preceding image as input, collisions cannot be avoided at the time they are predicted. In this study, we propose adding an attention-map visualization module for collision prediction and enhancing the model's representation with the k-nearest neighbor method. We conduct comparative experiments using standard evaluation metrics for generated text: BLEU4, METEOR, ROUGE-L, and CIDEr-D. Experimental results show that the proposed method outperforms the baseline on all evaluation metrics.
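The abstract does not specify how the k-nearest neighbor method enhances the model's representation. A minimal sketch of one common pattern, retrieving the k closest stored feature vectors and concatenating their mean onto the query feature, might look like the following; all names, dimensions, and the augmentation strategy here are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def knn_augment(query, memory, k=3):
    """Augment a feature vector with its k nearest neighbors.

    Hypothetical sketch: retrieves the k memory features closest to
    `query` (L2 distance) and returns `query` concatenated with their
    mean. The paper's real representation-enhancement module is not
    described in the abstract.
    """
    dists = np.linalg.norm(memory - query, axis=1)  # distance to each stored feature
    idx = np.argsort(dists)[:k]                     # indices of the k closest features
    neighbors = memory[idx]
    return np.concatenate([query, neighbors.mean(axis=0)])

# Toy usage: a 4-dim "grasped object" feature and a memory of 10 features.
rng = np.random.default_rng(0)
memory = rng.standard_normal((10, 4))
query = rng.standard_normal(4)
augmented = knn_augment(query, memory, k=3)
print(augmented.shape)  # (8,)
```

In such a design, the memory bank would typically hold features from training examples, so the retrieved neighbors inject information from similar past situations into the prediction.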