5:20 PM - 5:40 PM
[2Q5-J-2-01] Bayesian Inverse Reinforcement Learning for Demonstrations of an Expert in Multiple Dynamics
Keywords: Inverse Reinforcement Learning, Reinforcement Learning, Bayesian Inference
Reinforcement Learning has achieved numerous successes, but it requires careful specification of a reward function that represents the objective of a problem.
Some problems have objectives that are difficult to represent as a function but for which it is easy to provide expert demonstrations. For these problems, Inverse Reinforcement Learning is useful because it estimates a reward function from an expert's demonstrations. Most existing Inverse Reinforcement Learning methods assume that the expert gives demonstrations in a single fixed environment, but an expert can provide demonstrations for the same objective in multiple environments. For example, the objective of car driving is difficult to represent as a function, yet a driver can give demonstrations under multiple situations. In such cases, it is natural to use demonstrations from multiple environments to estimate the expert's reward. We formulated this problem and proposed an algorithm for it based on Bayesian Inverse Reinforcement Learning. Experimental results show that the proposed method quantitatively outperforms the existing method.
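The abstract gives only the high-level idea, but the setting can be made concrete with a short sketch. The Python below is a minimal illustration, not the authors' algorithm: it assumes tabular MDPs, a state-only reward shared by every environment, a Boltzmann-rational expert model, a standard-normal prior, and a random-walk Metropolis sampler in the spirit of Bayesian Inverse Reinforcement Learning. All names (value_iteration, loglik, mh_birl) and parameters (alpha, step) are illustrative assumptions. The essential point matching the abstract is that a single reward posterior conditions on demonstrations from several environments with different dynamics, because the likelihood multiplies across them.

import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """Optimal Q-values of a tabular MDP with transition tensor P of
    shape (A, S, S) and state-only reward R of shape (S,)."""
    A, S, _ = P.shape
    V = np.zeros(S)
    while True:
        Q = R[None, :] + gamma * (P @ V)      # Q[a, s]
        V_new = Q.max(axis=0)
        if np.abs(V_new - V).max() < tol:
            return Q.T                        # shape (S, A)
        V = V_new

def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def loglik(R, envs, demos, alpha, gamma):
    """Log-likelihood of all demonstrations under one shared reward R.
    envs: list of transition tensors (one per environment, dynamics differ);
    demos: matching list of [(state, action), ...] trajectories.
    A Boltzmann expert model, P(a|s) proportional to exp(alpha * Q*(s, a)),
    is applied per environment, so the likelihood factorizes across
    environments and all demonstrations constrain the same reward."""
    ll = 0.0
    for P, demo in zip(envs, demos):
        logpi = log_softmax(alpha * value_iteration(P, R, gamma), axis=1)
        ll += sum(logpi[s, a] for s, a in demo)
    return ll

def mh_birl(envs, demos, n_states, n_iter=3000, step=0.05, alpha=5.0,
            gamma=0.95, seed=0):
    """Random-walk Metropolis sampling from the reward posterior."""
    rng = np.random.default_rng(seed)
    def log_post(R):
        # independent standard-normal prior on each state's reward
        return loglik(R, envs, demos, alpha, gamma) - 0.5 * (R @ R)
    R = 0.1 * rng.standard_normal(n_states)
    lp = log_post(R)
    samples = []
    for _ in range(n_iter):
        R_prop = R + step * rng.standard_normal(n_states)  # symmetric proposal
        lp_prop = log_post(R_prop)
        if np.log(rng.uniform()) < lp_prop - lp:           # Metropolis accept
            R, lp = R_prop, lp_prop
        samples.append(R.copy())
    return np.mean(samples[n_iter // 2:], axis=0)          # discard burn-in

Here envs would be a list of (A, S, S) transition tensors, one per environment, and demos the matching lists of (state, action) pairs; mh_birl then returns a posterior-mean reward estimate consistent with all demonstrations at once.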