JSAI2018

Presentation information

Poster presentation

General Session » Interactive

[3Pin1] Interactive (1)

Thu. Jun 7, 2018 9:00 AM - 10:40 AM Room P (4F Emerald Lobby)

9:00 AM - 10:40 AM

[3Pin1-11] Objective Correction for Policy Improvement under Entropy Regularization

〇Ryo Iwaki¹, Minoru Asada¹ (1. Osaka University)

Keywords: reinforcement learning, off-policy, entropy regularization

Reinforcement learning aims to find a policy that maximizes long-term future reward by interacting with an unknown environment through trial and error. In this study, we propose an objective correction method for entropy-regularized Markov decision processes. After deriving a policy gradient under regularization by entropy and relative entropy, we propose an on-policy objective correction method for off-policy policy improvement under entropy regularization.
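
For readers unfamiliar with entropy regularization, the following is a minimal sketch of the general idea in the simplest possible setting: a single-state problem with three actions. It is not the authors' correction method, which the abstract does not specify. The reward values, the temperature `tau`, and all function names are assumptions chosen for illustration. The sketch relies only on the standard result that, for the objective "expected reward plus `tau` times policy entropy", the maximizing policy is a softmax of `reward / tau` and the maximum value is `tau * log(sum(exp(reward / tau)))`.

```python
import math


def softmax(prefs):
    """Numerically stable softmax over a list of preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def regularized_objective(policy, rewards, tau):
    """Expected reward plus tau times the Shannon entropy of the policy."""
    expected_reward = sum(p * r for p, r in zip(policy, rewards))
    entropy = -sum(p * math.log(p) for p in policy if p > 0.0)
    return expected_reward + tau * entropy


def soft_value(rewards, tau):
    """Closed-form maximum of the regularized objective: tau * logsumexp(r / tau)."""
    m = max(rewards)
    return m + tau * math.log(sum(math.exp((r - m) / tau) for r in rewards))


# Hypothetical rewards and temperature, chosen only for illustration.
rewards = [1.0, 0.5, 0.0]
tau = 0.5

soft_policy = softmax([r / tau for r in rewards])  # maximizer of the regularized objective
greedy_policy = [1.0, 0.0, 0.0]                    # maximizer of the unregularized objective

soft_score = regularized_objective(soft_policy, rewards, tau)
greedy_score = regularized_objective(greedy_policy, rewards, tau)
```

Under these assumptions, `soft_score` matches `soft_value(rewards, tau)` and exceeds `greedy_score`: the entropy bonus makes a stochastic policy preferable to the deterministic greedy one. As `tau` approaches zero, the softmax policy concentrates on the highest-reward action.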