9:00 AM - 10:40 AM
[3Pin1-05] Batch Reinforcement Learning for Linearly Solvable MDP
Keywords: Reinforcement learning, Linearly solvable Markov decision process
The linearly solvable Markov decision process (L-MDP) is an important subclass of MDPs in which a good policy can be found efficiently. We develop a novel batch reinforcement learning algorithm for L-MDPs with a discretized action space. The algorithm simultaneously learns a state value function and a predictor of the state value at the next step, using pre-collected data. We evaluate our method on a traffic signal control task at a single intersection, using the traffic simulator SUMO. Our experiments show that the method finds a policy for this task efficiently.
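As background on why L-MDPs can be solved efficiently, the sketch below shows the standard tabular first-exit formulation, in which the exponentiated value function z = exp(-v) satisfies a linear equation under the passive dynamics. It is not the batch algorithm described in this abstract: the function name `solve_lmdp`, the three-state chain, the state costs `q`, and the passive transition matrix `P` are all illustrative assumptions, and the model is assumed known rather than learned from pre-collected data.

```python
import numpy as np

def solve_lmdp(P, q, terminal, n_iters=1000, tol=1e-12):
    """Tabular first-exit L-MDP (illustrative sketch, model assumed known).

    P        : (n, n) passive transition matrix, rows sum to 1
    q        : (n,) state costs
    terminal : (n,) boolean mask of terminal states
    """
    boundary = np.exp(-q[terminal])          # z is fixed on terminal states
    z = np.ones(len(q))
    z[terminal] = boundary
    for _ in range(n_iters):
        # Linear Bellman equation: z(x) = exp(-q(x)) * sum_x' P(x'|x) z(x')
        z_new = np.exp(-q) * (P @ z)
        z_new[terminal] = boundary
        converged = np.max(np.abs(z_new - z)) < tol
        z = z_new
        if converged:
            break
    v = -np.log(z)                           # state value function
    u = P * z[None, :]                       # optimal transitions ~ P(x'|x) z(x')
    u = u / u.sum(axis=1, keepdims=True)
    return z, v, u

# Hypothetical three-state chain; state 2 is the terminal (goal) state.
P = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0]])
q = np.array([1.0, 1.0, 0.0])
terminal = np.array([False, False, True])

z, v, u = solve_lmdp(P, q, terminal)
```

In this formulation the controlled transition probabilities are obtained by reweighting the passive dynamics with z, so no maximization over actions is needed at each step. A batch method would have to estimate the corresponding quantities from samples instead of using `P` directly.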