JSAI2018

Presentation information

Poster presentation

General Session » Interactive

[3Pin1] Interactive (1)

Thu. Jun 7, 2018 9:00 AM - 10:40 AM Room P (4F Emerald Lobby)

9:00 AM - 10:40 AM

[3Pin1-05] Batch Reinforcement Learning for Linearly Solvable MDP

〇Tomoki Nishi1, Keisuke Otaki1, Takayoshi Yoshimura1 (1. Toyota Central R&D Labs., Inc.)

Keywords: Reinforcement learning, Linearly solvable Markov decision process

The linearly solvable Markov decision process (L-MDP) is an important subclass of MDPs in which a good policy can be found efficiently. We first develop a novel batch reinforcement learning algorithm for L-MDPs in a discretized action space. The algorithm uses pre-collected data to simultaneously learn a state value function and a predictor of the state value at the next step. We then evaluate our method on a traffic signal control domain at a single intersection using the traffic simulator SUMO. Our experiment demonstrates that our method finds a policy for this domain efficiently.
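
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general L-MDP idea it builds on, not the authors' method. It assumes the standard first-exit formulation, in which the desirability z(s) = exp(-v(s)) satisfies the linear equation z(s) = exp(-q(s)) * E[z(s')] under the passive dynamics. The toy chain environment, the unit state costs, and all function names are hypothetical, and the passive dynamics are estimated from a pre-collected batch by simple counting rather than by a learned predictor.

```python
import math
import random


def collect_batch(n_states, n_samples, rng):
    """Sample (s, s') pairs from a passive random walk on a chain (toy assumption)."""
    batch = []
    for _ in range(n_samples):
        s = rng.randrange(n_states - 1)  # non-terminal states only
        step = rng.choice([-1, 1])
        s_next = min(max(s + step, 0), n_states - 1)
        batch.append((s, s_next))
    return batch


def estimate_passive_dynamics(batch, n_states):
    """Estimate p(s' | s) from the batch by counting transitions."""
    counts = [[0] * n_states for _ in range(n_states)]
    for s, s_next in batch:
        counts[s][s_next] += 1
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs


def solve_desirability(probs, state_cost, terminal, n_iters=500):
    """Iterate z(s) = exp(-q(s)) * sum_s' p(s'|s) z(s') to a fixed point."""
    n = len(probs)
    z = [1.0] * n
    for _ in range(n_iters):
        new_z = []
        for s in range(n):
            if s == terminal:
                new_z.append(math.exp(-state_cost[s]))
            else:
                expected = sum(p * z[s2] for s2, p in enumerate(probs[s]))
                new_z.append(math.exp(-state_cost[s]) * expected)
        z = new_z
    return z


def optimal_transition(probs, z, s):
    """Optimal controlled transition: u*(s'|s) proportional to p(s'|s) * z(s')."""
    weights = [p * z[s2] for s2, p in enumerate(probs[s])]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
n_states = 5
terminal = n_states - 1
state_cost = [1.0] * (n_states - 1) + [0.0]  # unit cost until the goal state

batch = collect_batch(n_states, 5000, rng)
probs = estimate_passive_dynamics(batch, n_states)
z = solve_desirability(probs, state_cost, terminal)
values = [-math.log(zi) for zi in z]  # v(s) = -log z(s)
policy_at_2 = optimal_transition(probs, z, 2)
```

In this sketch the value function falls toward the goal state, and the controlled transition at state 2 shifts probability toward the goal relative to the passive random walk. A method such as the one described in the abstract would replace the counting step with a learned predictor of next-step state values and would select among discretized actions.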