JSAI2022

[2M4-OS-19b] World Models and Intelligence (2/4)

Wed. Jun 15, 2022 1:20 PM - 2:40 PM Room M (Room B-2)

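Organizers: Masahiro Suzuki (The University of Tokyo), Yusuke Iwasawa (The University of Tokyo) [on-site], Makoto Kawano (The University of Tokyo), Wataru Kumagai (The University of Tokyo), Yusuke Mori (Square Enix), Yutaka Matsuo (The University of Tokyo)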

2:20 PM - 2:40 PM

[2M4-OS-19b-04] World Model Based Imitation Learning by KL divergence Minimization of State Action Prediction and State Estimation

〇Katsuyoshi Maeyama1, Tadahiro Taniguchi1 (1. Ritsumeikan University)

Keywords: Imitation Learning, World Model

In this paper, we propose an imitation learning method based on minimizing the KL divergence between the state transitions predicted under the learned policy and the states estimated from expert data. For state estimation, we use the Recurrent State Space Model (RSSM), a type of world model that has been used in PlaNet and Dreamer, which are state-of-the-art deep reinforcement learning methods. We compared learning performance in the MuJoCo simulation environment, and the experimental results show that the proposed method can obtain higher total rewards. This method enables imitation learning based on state transitions rather than direct imitation of actions.
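The core idea, fitting a policy so that the states it is predicted to reach match the states estimated from expert data, can be illustrated with a deliberately tiny sketch. It assumes 1-D Gaussian latent states with a fixed standard deviation, a hypothetical linear transition model `transition_prior`, a hypothetical linear policy `policy`, and the KL direction KL(q || p) (expert-side estimate versus policy-side prediction), as in the RSSM training objective. The abstract does not specify these details, and the actual method uses a learned RSSM with a recurrent deterministic state and an observation encoder. Note that only expert states are used, never expert actions.

```python
import math

STD = 0.1  # hypothetical fixed std shared by the estimate q and the prediction p


def gaussian_kl(mu_q, std_q, mu_p, std_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, std_q, mu_p, std_p):
        total += (math.log(sp / sq)
                  + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2)
                  - 0.5)
    return total


def transition_prior(state, action):
    """Hypothetical 1-D transition model p(s_{t+1} | s_t, a_t) with mean s_t + a_t."""
    return [state + action], [STD]


def policy(state, gain):
    """Hypothetical linear policy a_t = gain * s_t."""
    return gain * state


def imitation_loss(expert_states, gain):
    """Sum over t of KL(q(s_{t+1} | expert data) || p(s_{t+1} | s_t, policy(s_t)))."""
    loss = 0.0
    for s_t, s_next in zip(expert_states[:-1], expert_states[1:]):
        mu_p, std_p = transition_prior(s_t, policy(s_t, gain))
        # In this toy, the expert-side state estimate is centred on the observed next state.
        loss += gaussian_kl([s_next], [STD], mu_p, std_p)
    return loss


# Expert trajectory produced by an unknown gain of -0.5 (s_{t+1} = 0.5 * s_t).
# Only the states are kept; the expert's actions are never stored.
expert_states = [0.5 ** t for t in range(10)]

# Fit the policy gain by a simple grid search over [-1, 1].
gains = [-1.0 + 0.01 * i for i in range(201)]
best = min(gains, key=lambda g: imitation_loss(expert_states, g))
print(f"{best:.2f}")  # → -0.50
```

Because the loss compares predicted and estimated states only, the policy is fitted to reproduce the expert's transitions; in this toy, the grid search recovers the gain that generated `expert_states` without ever seeing an expert action. A real implementation would replace the grid search with gradient descent through the learned world model.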
