Presentation information

Organized Session

[2D5-OS-18b] OS-18 (2)

Wed. Jun 10, 2020 3:50 PM - 5:30 PM Room D (jsai2020online-4)

Yusuke Iwasawa (The University of Tokyo), Masahiro Suzuki (The University of Tokyo), Hiroshi Yamakawa (The University of Tokyo / The Whole Brain Architecture Initiative), Yutaka Matsuo (The University of Tokyo)

4:30 PM - 4:50 PM

[2D5-OS-18b-03] Offline Model-based Reinforcement Learning

〇Tatsuya Matsushima1, Hiroki Furuta1, Shixiang Gu1,2, Yutaka Matsuo1 (1. The University of Tokyo, 2. Google AI)

Keywords: reinforcement learning, dynamics model, offline reinforcement learning, batch reinforcement learning

Offline reinforcement learning (offline RL) is a promising approach when data from online interactions cannot be expected. Most offline RL algorithms rely on large datasets, and their training tends to be unstable when the dataset is small. Although model-based RL is a popular choice for improving sample efficiency in online RL, naively incorporating a dynamics model into the offline setting can lead to poor performance. We propose a novel offline model-based RL algorithm, the behavior-regularized model-ensemble method, which learns a policy from imaginary rollouts while regularizing the target policy with the KL divergence from the estimated behavior policy. We show on continuous control tasks that our method learns policies more stably even with smaller datasets.
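The core idea of the regularization step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes univariate Gaussian policies so the KL divergence has a closed form, and the function names (`gaussian_kl`, `regularized_policy_loss`) and the weight `alpha` are hypothetical simplifications of a full model-ensemble training loop.

```python
import math

def gaussian_kl(mu_q, sigma_q, mu_b, sigma_b):
    # Closed-form KL(q || b) between two univariate Gaussians:
    # log(sigma_b/sigma_q) + (sigma_q^2 + (mu_q - mu_b)^2) / (2 sigma_b^2) - 1/2
    return (math.log(sigma_b / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_b) ** 2) / (2 * sigma_b ** 2)
            - 0.5)

def regularized_policy_loss(rollout_return, mu_pi, sigma_pi,
                            mu_beta, sigma_beta, alpha=1.0):
    # Hypothetical objective: maximize the return estimated from
    # imaginary rollouts under the learned dynamics model, while
    # penalizing deviation of the target policy (pi) from the
    # estimated behavior policy (beta) via a KL term weighted by alpha.
    kl = gaussian_kl(mu_pi, sigma_pi, mu_beta, sigma_beta)
    return -rollout_return + alpha * kl
```

When the target policy matches the behavior policy, the KL term vanishes and the loss reduces to the negated rollout return; increasing `alpha` keeps the learned policy closer to the data-collecting policy, which is what stabilizes training on small datasets.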
