3:20 PM - 3:40 PM
[1G4-OS-21a-02] Construction and Validation of Action-Conditioned VideoGPT
Keywords: World Models, Conditioned Video Prediction
World models learn the structure of the external world from observations and can predict how its future states change with the agent's actions. Recent advances in generative and language models have driven multi-modal world models, which are expected to find applications in domains such as autonomous driving and robotics. Video prediction has made notable progress in fidelity and long-term prediction, and world models are a promising means of acquiring temporal representations. One architecture that has performed well combines an encoder-decoder latent-variable model for image reconstruction with an autoregressive model that predicts the latent sequence. In this work, we extend the video prediction model VideoGPT, which uses a VQ-VAE and Image-GPT, by introducing action conditioning. Validation on CARLA and RoboNet showed improved performance compared to the model without conditioning.
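The conditioning idea above can be illustrated with a minimal sketch: an autoregressive model over VQ-VAE latent token indices whose per-token states are shifted by an action embedding, so different actions yield different next-token distributions. All sizes, names, and the mean-pooled "context" below are illustrative assumptions, not the paper's actual architecture (the abstract does not specify how the action enters the transformer).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only (not from the paper).
VOCAB = 16       # VQ-VAE codebook size
D = 8            # embedding dimension
N_ACTIONS = 4    # discrete action space

# Embedding tables for latent tokens and actions, plus an output projection.
tok_emb = rng.normal(size=(VOCAB, D))
act_emb = rng.normal(size=(N_ACTIONS, D))
W_out = rng.normal(size=(D, VOCAB))


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def next_token_logits(tokens, action):
    """Logits for the next VQ latent token, conditioned on the agent's
    action by adding its embedding to every token embedding (one simple
    conditioning scheme; cross-attention would be another option)."""
    h = tok_emb[tokens] + act_emb[action]   # (T, D) action-conditioned states
    ctx = h.mean(axis=0)                    # toy stand-in for attention context
    return ctx @ W_out                      # (VOCAB,) logits over the codebook


tokens = [3, 7, 1]                          # previous VQ-VAE latent indices
dist_a0 = softmax(next_token_logits(tokens, action=0))
dist_a1 = softmax(next_token_logits(tokens, action=1))
# Different actions produce different distributions over the next latent token.
print(dist_a0.argmax(), dist_a1.argmax())
```

In a real model the mean-pooled context would be a causal transformer (Image-GPT style) over the latent tokens, and the decoded token sequence would be mapped back to frames by the VQ-VAE decoder.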