[1Win4-51] Real-Time Task Success Prediction for Open-Vocabulary Manipulation Based on Multi-Level Aligned Representation
Keywords: Task Success Prediction, Open-Vocabulary Manipulation, Multi-Level Aligned Visual Representation
Task success prediction is important for open-vocabulary object manipulation by manipulators because it can improve the reliability and efficiency of manipulation. In particular, tasks can be executed more efficiently if success can be predicted on the fly, while the manipulation is still in progress. We propose a framework that extends Contrastive λ-Repformer, which determines task success from an instruction and images taken before and after open-vocabulary object manipulation, to real-time task success prediction. Our framework detects the moment at which a manipulation succeeds by focusing on minute changes between images: it contrasts the multi-level aligned representations of images in the initial state with those of images at any given time. Experimental results confirm that the framework can perform real-time success prediction.
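As a rough illustration of the idea of contrasting the initial state with the state at any given time, the following minimal sketch scans a stream of per-frame feature vectors and reports the first step at which a score crosses a threshold. Everything in it is an assumption made for illustration: the names `detect_success_step` and `cosine_distance`, the threshold value, and the use of cosine distance as a score are hypothetical stand-ins, not the scoring used by Contrastive λ-Repformer or by the proposed framework, and the instruction input is omitted entirely.

```python
from math import sqrt
from typing import Callable, Iterable, Optional, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Placeholder score: 1 - cosine similarity (NOT the learned predictor)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as maximally different
    return 1.0 - dot / (norm_a * norm_b)


def detect_success_step(
    initial: Vector,
    frames: Iterable[Vector],
    score_fn: Callable[[Vector, Vector], float] = cosine_distance,
    threshold: float = 0.5,  # hypothetical value, chosen only for the demo
) -> Optional[int]:
    """Return the index of the first frame whose score against the
    initial-state features reaches the threshold, or None if none does."""
    for step, current in enumerate(frames):
        if score_fn(initial, current) >= threshold:
            return step
    return None


if __name__ == "__main__":
    initial_features = [1.0, 0.0]
    stream = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    print(detect_success_step(initial_features, stream))
```

In a real system, `score_fn` would be replaced by a learned success predictor that also takes the instruction into account; the loop structure only shows how a before/after comparison can be evaluated repeatedly during execution.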