9:00 AM - 10:40 AM
[3Pin1-11] Objective Correction for Policy Improvement under Entropy Regularization
Keywords: reinforcement learning, off-policy, entropy regularization
Reinforcement learning aims to find a policy that maximizes long-term future reward by interacting with an unknown environment through trial and error. In this study, we propose an objective correction method for entropy-regularized Markov decision processes. We first derive a policy gradient under regularization by entropy and relative entropy, and then introduce an on-policy objective correction method for off-policy policy improvement under entropy regularization.