2:20 PM - 2:40 PM
[2M4-OS-19b-04] World Model Based Imitation Learning by KL divergence Minimization of State Action Prediction and State Estimation
Keywords: Imitation Learning, World Model
In this paper, we propose an imitation learning method based on minimizing the KL divergence between the state transitions predicted under the learned policy and the states estimated from expert data. For state estimation, we use the Recurrent State Space Model (RSSM), a kind of world model that has been used in PlaNet and Dreamer, state-of-the-art deep reinforcement learning methods. We compared the learned policies on the MuJoCo simulation environment, and the experimental results show that the proposed method obtains higher total rewards. This method enables imitation learning based on state transitions, rather than direct imitation of actions.
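The abstract does not give the objective in closed form. In RSSM-style world models the latent state is commonly a diagonal Gaussian, so the per-step imitation loss described above could be sketched as the KL divergence between the state estimate from expert data and the state prediction under the learned policy. A minimal sketch assuming diagonal Gaussian latents; the function name and shapes are illustrative, not from the paper:

```python
import numpy as np

def gaussian_kl(mu_q, std_q, mu_p, std_p):
    """KL(q || p) between diagonal Gaussians, summed over latent dimensions.

    q: state estimate from expert data (e.g. RSSM posterior);
    p: state prediction under the learned policy (e.g. RSSM prior).
    Closed form per dimension:
        log(std_p / std_q) + (std_q^2 + (mu_q - mu_p)^2) / (2 * std_p^2) - 1/2
    """
    return float(np.sum(
        np.log(std_p / std_q)
        + (std_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * std_p ** 2)
        - 0.5
    ))

# Identical distributions give zero divergence; any mismatch is penalized.
mu, std = np.zeros(4), np.ones(4)
print(gaussian_kl(mu, std, mu, std))          # 0.0
print(gaussian_kl(mu, std, mu + 1.0, std) > 0)  # True
```

Minimizing such a KL term over trajectories drives the policy's predicted state transitions toward those observed in the expert data, without requiring expert action labels directly.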