Action Conditional Video Prediction with Hierarchically Represented State Space Model

Naruya Kondo; Yusuke Iwasawa; Yutaka Matsuo

[4Rin1-48] Action Conditional Video Prediction with Hierarchically Represented State Space Model

〇Naruya Kondo¹, Yusuke Iwasawa¹, Yutaka Matsuo¹ (¹The University of Tokyo)

Keywords: State Representation Learning

Deep State Space Models (SSMs) are often used in Reinforcement Learning to model the dynamics of environments. However, when SSMs are applied to complex data, training may not proceed well even if the state dimensionality is increased in an attempt to capture highly complex dynamics. This problem appears to arise because there is not enough information to accurately infer the transitions of high-dimensional state variables, which makes it difficult to bring the prior and posterior distributions close to each other during model learning. In this study, we propose hierarchically represented SSMs. When predicting the transitions of high-dimensional state variables, the proposed model uses a low-dimensional state representation learned in advance by a separate, smaller SSM with low-dimensional state variables. This allows the SSM to bootstrap from this central state representation to learn more detailed state representations, to learn transitions of the high-dimensional state variables with the aid of the low-dimensional representation, and thereby to obtain richer state representations. In evaluation experiments, we perform action-conditional video prediction on the BAIR Push Dataset and show the effectiveness of our approach.
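The abstract describes the mechanism only in words. As a rough illustration, the NumPy sketch below shows one way a high-dimensional state transition could be conditioned on the state of a separately trained, frozen low-dimensional SSM, and how the prior/posterior KL term mentioned above is computed for diagonal Gaussians. It is not the authors' implementation: it assumes diagonal Gaussian latents, single linear layers in place of real networks, and conditioning by simple concatenation. The names `LinearGaussian` and `hierarchical_step`, and all dimensions, are hypothetical.

```python
import numpy as np

# Assumption: diagonal Gaussian latents, linear "networks", and conditioning on the
# small SSM's state by concatenation. All names and sizes here are hypothetical.


def gaussian_kl(mu_q, std_q, mu_p, std_p):
    """KL(q || p) between diagonal Gaussians, summed over the last axis."""
    var_q, var_p = std_q ** 2, std_p ** 2
    kl = np.log(std_p / std_q) + (var_q + (mu_q - mu_p) ** 2) / (2.0 * var_p) - 0.5
    return kl.sum(axis=-1)


class LinearGaussian:
    """Hypothetical linear map from an input vector to a diagonal Gaussian (mean, std)."""

    def __init__(self, in_dim, out_dim, rng):
        self.w = rng.normal(scale=0.1, size=(in_dim, 2 * out_dim))
        self.out_dim = out_dim

    def __call__(self, x):
        h = x @ self.w
        mu = h[..., : self.out_dim]
        std = np.logaddexp(0.0, h[..., self.out_dim :]) + 0.1  # softplus, floored so std > 0
        return mu, std


def hierarchical_step(z_high_prev, action, z_low_t, obs_embed_t, prior_net, post_net, rng):
    """One transition of the large SSM, conditioned on the frozen small SSM's state z_low_t."""
    # Prior p(z_high_t | z_high_{t-1}, a_{t-1}, z_low_t): does not see the current frame.
    prior_in = np.concatenate([z_high_prev, action, z_low_t], axis=-1)
    mu_p, std_p = prior_net(prior_in)
    # Posterior q(z_high_t | ..., o_t): additionally sees an embedding of the current frame.
    post_in = np.concatenate([prior_in, obs_embed_t], axis=-1)
    mu_q, std_q = post_net(post_in)
    # Reparameterized sample from the posterior, used as the state for the next step.
    z_high_t = mu_q + std_q * rng.standard_normal(mu_q.shape)
    # KL term of the training objective that pulls the prior toward the posterior.
    kl = gaussian_kl(mu_q, std_q, mu_p, std_p)
    return z_high_t, kl


# Usage with toy sizes: batch of 2, 32-dim large state, 4-dim small state.
rng = np.random.default_rng(0)
batch, d_high, d_low, d_act, d_obs = 2, 32, 4, 4, 16
prior_net = LinearGaussian(d_high + d_act + d_low, d_high, rng)
post_net = LinearGaussian(d_high + d_act + d_low + d_obs, d_high, rng)

z_high = np.zeros((batch, d_high))
z_low = rng.standard_normal((batch, d_low))  # stand-in for the pretrained small SSM's state
action = rng.standard_normal((batch, d_act))
obs_embed = rng.standard_normal((batch, d_obs))

z_high, kl = hierarchical_step(z_high, action, z_low, obs_embed, prior_net, post_net, rng)
print(z_high.shape, kl.shape)  # (2, 32) (2,)
```

In this sketch the small SSM is trained beforehand and kept fixed, so `z_low_t` simply supplies extra information to the large SSM's prior; this mirrors the abstract's argument that the high-dimensional prior otherwise lacks enough information to match the posterior.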

Presentation information

[4Rin1] Interactive 2