JSAI2023

Presentation information

Organized Session » OS-21

[1G4-OS-21a] World Models and Intelligence

Tue. Jun 6, 2023 3:00 PM - 4:40 PM Room G (A4)

Organizers: 鈴木 雅大, 岩澤 有祐, 河野 慎, 熊谷 亘, 松嶋 達也, 森 友亮, 松尾 豊

3:20 PM - 3:40 PM

[1G4-OS-21a-02] Construction and Validation of Action-Conditioned VideoGPT

〇Koudai Tabata1,6, Junnosuke Kamohara2,6, Ryosuke Unno1,6, Makoto Sato3,6, Taiju Watanabe4,6, Taiga Kume5,6, Masahiro Negishi1,6, Ryo Okada1,6, Yusuke Iwasawa1, Yutaka Matsuo1 (1. The University of Tokyo, 2. Tohoku University, 3. Nara Institute of Science and Technology, 4. Waseda University, 5. Keio University, 6. Matsuo Institute)

Keywords: World Models, Conditioned video prediction

World models learn the structure of the external world from observations and can predict how its future states change in response to the agent's actions. Recent advances in generative models and language models have contributed to multi-modal world models, which are expected to be applied in various domains, including automated driving and robotics. Video prediction is a field that has made progress toward high-fidelity and long-term prediction, and world models can potentially be applied there to acquire temporal representations. One model architecture that has performed well combines an encoder-decoder latent variable model for image reconstruction with an autoregressive model that predicts the latent sequence. In this work, we extend VideoGPT, a video prediction model that uses VQ-VAE and Image-GPT, by introducing action conditioning. Validation on CARLA and RoboNet showed improved performance compared to the model without conditioning.
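The abstract describes the overall pipeline (a VQ-VAE that tokenizes frames plus an autoregressive Image-GPT-style prior over the latent tokens, with an added action condition) but not the exact conditioning mechanism. Below is a minimal PyTorch-style sketch of one common way to realize such conditioning, by adding a projected action embedding to every latent-token embedding before a causal transformer. The class name, the additive conditioning scheme, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class ActionConditionedLatentPrior(nn.Module):
    """Autoregressive prior over discrete VQ-VAE latent tokens,
    conditioned on an action vector (illustrative sketch only)."""

    def __init__(self, codebook_size=1024, action_dim=4, d_model=256,
                 n_heads=8, n_layers=6, max_len=1024):
        super().__init__()
        self.token_emb = nn.Embedding(codebook_size, d_model)
        self.pos_emb = nn.Parameter(torch.zeros(1, max_len, d_model))
        # Assumed conditioning: project the action and add it to every token.
        self.action_proj = nn.Linear(action_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, codebook_size)

    def forward(self, tokens, actions):
        # tokens:  (B, T) discrete VQ-VAE indices flattened over space-time
        # actions: (B, action_dim) action associated with the predicted frames
        B, T = tokens.shape
        x = self.token_emb(tokens) + self.pos_emb[:, :T]
        x = x + self.action_proj(actions).unsqueeze(1)  # broadcast over tokens
        # Causal mask so each position attends only to earlier tokens.
        causal_mask = torch.triu(
            torch.full((T, T), float("-inf"), device=x.device), diagonal=1)
        h = self.transformer(x, mask=causal_mask)
        return self.head(h)  # next-token logits over the codebook


# Usage with dummy data: predict next latent tokens given previous tokens
# and the agent's action.
model = ActionConditionedLatentPrior()
tokens = torch.randint(0, 1024, (2, 64))   # dummy latent indices
actions = torch.randn(2, 4)                # dummy action vectors
logits = model(tokens, actions)            # shape (2, 64, 1024)
```

In this sketch the action is injected additively at the input; other designs (e.g. cross-attention on the action or concatenating it to the token embedding) are equally plausible, and the source does not specify which is used.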
