5:40 PM - 6:00 PM
[2D4-02] Evaluation of Hybrid Reward Architecture on various learning policies and environments
Keywords: Game, Reinforcement Learning
Deep Q-Network (DQN) has achieved performance comparable to that of a professional human player.
However, in large and complex domains (e.g. Ms. Pacman), learning can be very slow and unstable.
Hybrid Reward Architecture (HRA) targets such domains by decomposing the reward function in advance
and then learning a separate value function for each decomposed reward function.
In this paper, we constructed several environments designed to make learning more difficult and used them to evaluate the performance of HRA.
The results indicate that HRA needs further enhancement to learn in environments where learning is difficult under the uniform random policy.
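
The decomposition described in the abstract can be illustrated with a minimal tabular sketch. It is not the implementation evaluated in the paper: the class name `TabularHRA`, its parameters, and the choice of a per-head Q-learning update with summed action values are assumptions made here for illustration. Each head keeps its own value table and is updated only with its own reward component; actions are selected from the sum of the heads.

```python
import random
from collections import defaultdict


class TabularHRA:
    """Hypothetical tabular sketch: one Q-table ("head") per reward component."""

    def __init__(self, n_actions, n_heads, alpha=0.1, gamma=0.9):
        self.n_actions = n_actions
        self.n_heads = n_heads
        self.alpha = alpha  # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)
        # q[k][state] is the list of action values for head k.
        self.q = [
            defaultdict(lambda: [0.0] * n_actions) for _ in range(n_heads)
        ]

    def aggregate(self, state):
        """Sum the heads' action values to obtain the values used for acting."""
        return [
            sum(self.q[k][state][a] for k in range(self.n_heads))
            for a in range(self.n_actions)
        ]

    def act(self, state, epsilon=0.1):
        """Epsilon-greedy action selection on the aggregated values."""
        if random.random() < epsilon:
            return random.randrange(self.n_actions)
        values = self.aggregate(state)
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, state, action, rewards, next_state, done):
        """Update each head with its own reward component only.

        Assumption: every head bootstraps from its own greedy value
        (plain Q-learning per head); other targets are possible.
        """
        for k, r_k in enumerate(rewards):
            bootstrap = 0.0 if done else max(self.q[k][next_state])
            target = r_k + self.gamma * bootstrap
            self.q[k][state][action] += self.alpha * (
                target - self.q[k][state][action]
            )


# Usage: two reward components, one terminal transition.
agent = TabularHRA(n_actions=2, n_heads=2, alpha=0.5, gamma=0.9)
agent.update("s0", 1, [1.0, 0.0], "s1", done=True)
print(agent.aggregate("s0"))  # [0.0, 0.5]
```

In this sketch only the first head changes after the update, because only its reward component is nonzero; the second head's table stays at zero.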