Evaluation of Hybrid Reward Architecture on various learning policies and environments

Yutaro Fujimura

5:40 PM - 6:00 PM

[2D4-02] Evaluation of Hybrid Reward Architecture on various learning policies and environments

〇Yutaro Fujimura¹, Tomoyuki Kaneko¹ (1. The University of Tokyo)

Keywords: Game, Reinforcement Learning

Deep Q-Network (DQN) achieved performance comparable to that of a professional human player.
However, in large and complex domains (e.g., Ms. Pac-Man), learning can be very slow and unstable.
In Hybrid Reward Architecture (HRA), the reward function is decomposed in advance to improve learning in such
domains, and a separate value function is then learned for each decomposed reward function.
In this paper, we constructed several environments that make learning more difficult in order to evaluate the performance of HRA.
The results indicate that HRA needs further enhancements to learn in environments where learning is difficult under a uniform random policy.
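
The decomposition described in the abstract can be illustrated with a minimal tabular sketch. It is not the authors' implementation: the class TabularHRA, the 5-cell corridor in step, the two reward components (pellet and ghost), and all hyperparameters are hypothetical and chosen only for illustration. Each reward component gets its own Q-table ("head"), each head is updated with a Q-learning target on its own component, and actions are ranked by the sum of the heads.

```python
import random
from collections import defaultdict


class TabularHRA:
    """Illustrative sketch: one Q-table ("head") per reward component."""

    def __init__(self, n_heads, n_actions, alpha=0.1, gamma=0.9):
        self.n_actions = n_actions
        self.alpha = alpha
        self.gamma = gamma
        # heads[k][state] is a list of Q-values, one per action.
        self.heads = [defaultdict(lambda: [0.0] * n_actions) for _ in range(n_heads)]

    def aggregate(self, state):
        """Sum the per-head Q-values for every action in `state`."""
        return [sum(head[state][a] for head in self.heads)
                for a in range(self.n_actions)]

    def greedy_action(self, state):
        """Pick the action with the largest aggregated Q-value."""
        q = self.aggregate(state)
        return max(range(self.n_actions), key=q.__getitem__)

    def update(self, state, action, rewards, next_state, done):
        """Q-learning update of each head on its own reward component."""
        for head, r in zip(self.heads, rewards):
            target = r if done else r + self.gamma * max(head[next_state])
            head[state][action] += self.alpha * (target - head[state][action])


def step(pos, action):
    """Hypothetical corridor: ghost at cell 0, pellet at cell 4, both terminal."""
    nxt = pos + (1 if action == 1 else -1)  # action 0 = left, 1 = right
    # Decomposed reward: (pellet component, ghost component).
    rewards = (1.0 if nxt == 4 else 0.0, -1.0 if nxt == 0 else 0.0)
    return nxt, rewards, nxt in (0, 4)


def train(episodes=2000, seed=0):
    """Train from the middle cell while acting uniformly at random."""
    rng = random.Random(seed)
    agent = TabularHRA(n_heads=2, n_actions=2)
    for _ in range(episodes):
        pos, done = 2, False
        while not done:
            action = rng.randrange(2)  # uniform random behaviour (assumption)
            nxt, rewards, done = step(pos, action)
            agent.update(pos, action, rewards, nxt, done)
            pos = nxt
    return agent
```

Calling train() returns an agent whose heads can be inspected separately (for example, agent.heads[1] holds only the ghost-penalty values) or combined through agent.aggregate(state). The uniform random behaviour in train is used only to keep the sketch short and is not meant to reproduce the experimental setup of the paper.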

Presentation information

[2D4] [General Session] 13. AI Application
