10:20 AM - 10:40 AM
[3G1-OS-24a-05] Generating Description for Possible Collisions in Object Placement Tasks
Keywords: Nearest Neighbor, future captioning, DSRs, Vision and Language
The practical implementation of domestic service robots (DSRs) that can communicate in natural language is a promising solution for people who need assistance. In particular, the ability to predict potential hazards associated with task execution and to prompt the user for a judgment can improve both safety and convenience. However, accurate prediction is difficult because information about future events is unavailable at prediction time. Existing methods represent the grasped object insufficiently because they do not take an image of the grasped object as input. Moreover, because they require the preceding image as input, collisions cannot be avoided at the time they are predicted. In this study, we propose adding an attention-map visualization module for collision prediction and enhancing the model's representation with the k-nearest neighbor method. We conduct comparative experiments using standard evaluation metrics for generated text: BLEU4, METEOR, ROUGE-L, and CIDEr-D. Experimental results show that the proposed method outperforms the baseline on all evaluation metrics.
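The abstract does not specify how the k-nearest neighbor method enhances the model's representation. A minimal sketch of one common pattern, retrieving the k closest stored feature vectors and concatenating their mean onto the query feature, might look like the following; all names, dimensions, and the augmentation strategy here are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def knn_augment(query, memory, k=3):
    """Augment a feature vector with its k nearest neighbors.

    Hypothetical sketch: retrieves the k memory features closest to
    `query` (L2 distance) and returns `query` concatenated with their
    mean. The paper's real representation-enhancement module is not
    described in the abstract.
    """
    dists = np.linalg.norm(memory - query, axis=1)  # distance to each stored feature
    idx = np.argsort(dists)[:k]                     # indices of the k closest features
    neighbors = memory[idx]
    return np.concatenate([query, neighbors.mean(axis=0)])

# Toy usage: a 4-dim "grasped object" feature and a memory of 10 features.
rng = np.random.default_rng(0)
memory = rng.standard_normal((10, 4))
query = rng.standard_normal(4)
augmented = knn_augment(query, memory, k=3)
print(augmented.shape)  # (8,)
```

In such a design, the memory bank would typically hold features from training examples, so the retrieved neighbors inject information from similar past situations into the prediction.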