JSAI2025

Presentation information

Poster Session

[1Win4] Poster session 1

Tue. May 27, 2025 3:30 PM - 5:30 PM Room W (Event hall D-E)

[1Win4-51] Real-Time Task Success Prediction for Open-Vocabulary Manipulation Based on Multi-Level Aligned Representation

〇Miyu Goko1, Motonari Kambara1, Daichi Saito1, Seitaro Otsuki1, Komei Sugiura1 (1. Keio University)

Keywords: Task Success Prediction, Open-Vocabulary Manipulation, Multi-Level Aligned Visual Representation

Task success prediction is important for open-vocabulary object manipulation by manipulators because it can improve the reliability and efficiency of manipulation. In particular, predicting success on the fly during manipulation would make task execution more efficient. We propose a framework that extends Contrastive λ-Repformer, which determines task success from an instruction and images taken before and after open-vocabulary object manipulation, to real-time task success prediction. Our framework detects the moment at which object manipulation succeeds by focusing on minute changes between images; it does so by contrasting the multi-level aligned representations of the initial-state image and of the image at any given time. Experimental results confirm that the framework can perform real-time success prediction.
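
The detection loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `difference_features`, `detect_success_step`, and `toy_score` are hypothetical, the representations are plain lists of floats rather than multi-level aligned visual features, the element-wise subtraction only stands in for the contrast step, and the scoring function and threshold are placeholders for a learned predictor.

```python
from typing import Callable, Iterable, List, Optional, Sequence

Features = Sequence[float]


def difference_features(initial: Features, current: Features) -> List[float]:
    """Contrast two representations by element-wise subtraction (an assumed stand-in)."""
    if len(initial) != len(current):
        raise ValueError("representations must have the same length")
    return [c - i for i, c in zip(initial, current)]


def detect_success_step(
    initial: Features,
    stream: Iterable[Features],
    instruction: str,
    score_fn: Callable[[Sequence[float], str], float],
    threshold: float = 0.5,
) -> Optional[int]:
    """Return the index of the first frame whose score reaches the threshold.

    Each incoming frame is contrasted with the initial-state frame, so the
    decision can be made during manipulation instead of after it ends.
    Returns None if no frame in the stream reaches the threshold.
    """
    for step, current in enumerate(stream):
        diff = difference_features(initial, current)
        if score_fn(diff, instruction) >= threshold:
            return step
    return None


def toy_score(diff: Sequence[float], instruction: str) -> float:
    """Hypothetical scorer: mean absolute change; it ignores the instruction."""
    return sum(abs(d) for d in diff) / len(diff)


initial_frame = [0.0, 0.0]
frames = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]]
success_step = detect_success_step(initial_frame, frames, "pick up the cup", toy_score)
```

In this toy run the third frame is the first one whose mean absolute change reaches the threshold. A real system would replace `toy_score` with a model that conditions on both the instruction and the contrasted representations.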
