Keywords: reinforcement learning, dynamics model, offline reinforcement learning, batch reinforcement learning
Offline reinforcement learning (offline RL) is a promising approach when data from online interactions cannot be collected. Most offline RL algorithms rely on large datasets, and their training tends to become unstable when the dataset is small. Although model-based RL is a popular choice for improving sample efficiency in online RL, naively incorporating a dynamics model into the offline setting can lead to poor performance. We propose a novel offline model-based RL algorithm, the behavior regularized model-ensemble method, which learns a policy from imaginary rollouts while regularizing the target policy with the KL divergence from the estimated behavior policy. We show on continuous control tasks that our method learns policies more stably even with smaller datasets.
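The core idea of the regularization term can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: it assumes diagonal Gaussian policies and a hypothetical scalar `expected_return` estimated from model rollouts, and shows how a KL penalty toward the estimated behavior policy enters the policy objective.

```python
import numpy as np

def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """KL(p || q) between two diagonal Gaussian policies,
    given per-dimension means and standard deviations."""
    return np.sum(
        np.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
        - 0.5
    )

def regularized_policy_loss(expected_return, mu_pi, sigma_pi,
                            mu_b, sigma_b, alpha=1.0):
    """Illustrative objective (to be minimized): maximize the return
    estimated from imaginary model rollouts while penalizing the KL
    divergence of the target policy pi from the estimated behavior
    policy b. `alpha` (hypothetical name) trades off the two terms."""
    kl = gaussian_kl(mu_pi, sigma_pi, mu_b, sigma_b)
    return -expected_return + alpha * kl
```

When the target policy coincides with the behavior policy the KL term vanishes, so the loss reduces to the negated model-based return; larger deviations from the behavior policy are penalized in proportion to `alpha`.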