2:20 PM - 2:40 PM
[1B3-OS-41a-03] 6D Multi-View NewtonianVAE: A World Model-Based Approach for 6D Pose Estimation and Control
Keywords: 6D control, World model, Visual feedback control, Multi-view image information
In this study, we propose a method that uses NewtonianVAE to learn a latent space representing 6D poses and to perform 6D control. NewtonianVAE, a type of world model, learns the dynamics of the environment as a latent space from observational data and performs proportional control based on the estimated position. With NewtonianVAE, position estimation is based on the internal dynamics of the environment rather than on an external coordinate system. While previous studies have applied NewtonianVAE to translational control, 6D control has not been investigated. To address this, we propose 6D Multi-View NewtonianVAE (6D-MNVAE), which extends the latent space by incorporating a rotation vector. In our experiments, we evaluated whether 6D-MNVAE can estimate 6D poses in the latent space and perform 6D control towards a target pose. Experimental results showed that 6D-MNVAE achieved 6D control with an accuracy within 7 mm and 0.02 rad. Furthermore, our method requires no feature engineering or annotation and enables 6D control using only RGB image information.
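As a rough illustration of proportional control in a 6D latent space, the sketch below treats the latent state as three translation components followed by a three-component rotation vector and drives it towards a target. It is a minimal toy example, not the authors' implementation: the function names, the gain, the integrator dynamics, and the sample poses are all assumptions made for illustration, and the learned encoder that would produce the latent state from RGB images is omitted.

```python
import math

def p_control_step(z, z_goal, gain=0.5):
    """Return the proportional command u = gain * (z_goal - z).

    z and z_goal are 6-element sequences: [x, y, z, rx, ry, rz],
    i.e. translation followed by a rotation vector (assumed layout).
    """
    return [gain * (g - c) for c, g in zip(z, z_goal)]

def simulate(z0, z_goal, gain=0.5, dt=1.0, steps=50):
    """Roll out the controller on hypothetical integrator dynamics.

    Assumption: the latent state simply integrates the command,
    z <- z + dt * u. A real system would instead re-estimate z from images.
    """
    z = list(z0)
    for _ in range(steps):
        u = p_control_step(z, z_goal, gain)
        z = [c + dt * v for c, v in zip(z, u)]
    return z

def pose_error(z, z_goal):
    """Return (translation error, rotation-vector error) as Euclidean norms.

    The rotation term compares rotation vectors component-wise, which is
    only a reasonable proxy for the angle when the two are close.
    """
    trans = math.dist(z[:3], z_goal[:3])
    rot = math.dist(z[3:], z_goal[3:])
    return trans, rot

# Hypothetical start and target poses (metres and radians).
start = [0.10, -0.20, 0.30, 0.00, 0.10, -0.10]
goal = [0.0] * 6
final = simulate(start, goal)
trans_err, rot_err = pose_error(final, goal)
```

With a gain of 0.5 and a unit time step, each step halves the remaining error in this toy setting, so both error terms shrink geometrically towards zero.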