JSAI2023

Presentation information

Organized Session


[2Q1-OS-27a] New Developments in Reinforcement Learning

Wed. Jun 7, 2023 9:00 AM - 10:40 AM Room Q (601)

Organizers: 太田 宏之, 甲野 佑, 高橋 達二

9:20 AM - 9:40 AM

[2Q1-OS-27a-02] Offline Model-Based Imitation Learning with Entropy Regularization of Model and Policy

〇Eiji Uchibe (Advanced Telecommunications Research Institute International)

Keywords: Offline imitation learning, Model-Based Forward and Inverse Reinforcement Learning, Entropy Regularization

Model-Based Entropy-Regularized Imitation Learning (MB-ERIL) is an online model-based generative adversarial imitation learning method that introduces entropy regularization of the policy and the state-transition model. Online-MB-ERIL learns the policy and the model from expert data, the learner's data, and generated data. Costly interactions with the actual environment are needed to obtain the first two datasets, whereas the policy and model can quickly generate the last one. This report considers an offline learning setting that does not use the second dataset, i.e., the data obtained through interaction between the policy and the actual environment. We then propose Offline-MB-ERIL, which introduces the idea of learning from Positive and Unlabeled (PU) data. Given sub-optimal data, Offline-MB-ERIL can efficiently recover the policy and the model by treating them as unlabeled data. Through a vision-based arm-reaching task, we show that Offline-MB-ERIL makes better use of sub-optimal data than Online-MB-ERIL.
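The PU-learning idea mentioned above can be illustrated with a short sketch. The code below is not the authors' implementation; it is a minimal, illustrative example of a standard non-negative PU risk estimator (Kiryo et al., 2017) applied to a state-action discriminator, treating expert samples as positive data and sub-optimal offline samples as unlabeled data. The network architecture, the class prior pi_prior, and all names are assumptions made for illustration only.

import torch
import torch.nn as nn

# Illustrative sketch (not the paper's code): a non-negative PU risk estimator
# for a transition discriminator. Expert samples act as positive data and
# sub-optimal offline samples act as unlabeled data.

class Discriminator(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # logit: expert vs. non-expert
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def nn_pu_loss(logits_pos: torch.Tensor,
               logits_unl: torch.Tensor,
               pi_prior: float = 0.5) -> torch.Tensor:
    """Non-negative PU risk with the logistic (sigmoid) surrogate loss.

    logits_pos: discriminator outputs on expert (positive) samples.
    logits_unl: discriminator outputs on sub-optimal (unlabeled) samples.
    pi_prior:   assumed class prior of expert samples among unlabeled data.
    """
    loss_pos = nn.functional.softplus(-logits_pos).mean()        # loss for label +1 on positives
    loss_pos_as_neg = nn.functional.softplus(logits_pos).mean()  # loss for label -1 on positives
    loss_unl_as_neg = nn.functional.softplus(logits_unl).mean()  # loss for label -1 on unlabeled
    neg_risk = loss_unl_as_neg - pi_prior * loss_pos_as_neg
    # Clamp the estimated negative risk at zero (the non-negative correction).
    return pi_prior * loss_pos + torch.clamp(neg_risk, min=0.0)

In this sketch, the unlabeled risk is corrected by subtracting the positive contribution weighted by the class prior, so sub-optimal data can be used without explicitly labeling it as non-expert; how Offline-MB-ERIL combines such a discriminator with the policy and model updates is described in the paper itself.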
