Proposal of direct policy search using reduction of policy parameters by principal component analysis

Yuuki Murata

9:00 AM - 10:40 AM

[3Pin1-08] Proposal of direct policy search using reduction of policy parameters by principal component analysis

〇Yuuki Murata¹, Megumi Miyashita¹, Shiro Yano¹, Toshiyuki Kondo¹ (1. Tokyo University of Agriculture and Technology)

Keywords: Reinforcement Learning, Mirror Descent, Principal Component Analysis

In sampling-based direct policy search for reinforcement learning, higher-dimensional decision variables degrade the optimal value that is reached and slow down learning. We clarified that the variance of the sampling probability distribution affects both the optimal value and the learning speed; in particular, there is a tradeoff between the two. In this paper, we propose two tricks to improve the learning speed without degrading the optimal value. The first trick is to use a sampling distribution with small variance to improve the optimal value; as a side effect, this slows convergence.
The second trick is to reduce the dimensionality of the decision variables (the policy parameters) by principal component analysis, which improves the learning speed.
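The interplay of the two tricks can be illustrated with a minimal sketch. It assumes a generic Gaussian-sampling search whose sampling mean is updated by exponential reward weighting; this is a common stand-in and is not claimed to be the authors' mirror-descent-based update. Everything in it is hypothetical: `reward` is a toy quadratic substitute for an episodic return, `samples` are synthetic parameter vectors used only to fit the PCA basis, and `sigma`, `beta`, `k`, and `D` are illustrative values. Parameters are searched as θ = mean + z·basis, where `basis` holds the top-k principal directions, so the sampler explores only k of the D dimensions while keeping a small σ.

```python
import numpy as np


def reward(theta, target):
    """Hypothetical stand-in for an episodic return: maximal (zero) at `target`."""
    return -np.sum((theta - target) ** 2)


def pca_basis(samples, k):
    """Return the sample mean and the top-k principal directions (rows = parameter vectors)."""
    mean = samples.mean(axis=0)
    # The right singular vectors of the centred data are the principal axes.
    _, _, vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, vt[:k]  # basis has shape (k, D) with orthonormal rows


def search(reward_fn, mean, basis, sigma, n_samples=50, n_iters=200, beta=5.0, rng=None):
    """Sampling-based direct policy search restricted to the span of `basis`.

    Candidates are theta = mean + z @ basis with z ~ N(z_mean, sigma^2 I_k);
    z_mean is replaced by the softmax(beta * reward)-weighted average of the samples.
    """
    rng = np.random.default_rng() if rng is None else rng
    k = basis.shape[0]
    z_mean = np.zeros(k)
    for _ in range(n_iters):
        z = z_mean + sigma * rng.standard_normal((n_samples, k))  # sample in k dims only
        thetas = mean + z @ basis                                 # map back to D dims
        r = np.array([reward_fn(t) for t in thetas])
        w = np.exp(beta * (r - r.max()))  # subtract the max for numerical stability
        w /= w.sum()
        z_mean = w @ z                    # reward-weighted update of the sampling mean
    return mean + z_mean @ basis


rng = np.random.default_rng(0)
D, k = 100, 5
# Hypothetical task: the optimum lies in a 5-dimensional subspace of a 100-dimensional space.
true_basis = np.linalg.qr(rng.standard_normal((D, k)))[0].T  # (k, D), orthonormal rows
target = rng.standard_normal(k) @ true_basis
# Hypothetical PCA data: parameter vectors concentrated near that subspace.
samples = rng.standard_normal((200, k)) @ true_basis + 0.01 * rng.standard_normal((200, D))
mean, basis = pca_basis(samples, k)
theta = search(lambda t: reward(t, target), mean, basis, sigma=0.1, rng=rng)
print(reward(theta, target))
```

Calling `search` with `mean=np.zeros(D)` and `basis=np.eye(D)` gives a full-dimensional baseline with the same small σ, which is one way to probe the tradeoff described above on this toy problem.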

Presentation information

[3Pin1] Interactive (1)
