Task-Conditional Generative Adversarial Imitation Learning That Infers Multiple Reward Functions

Kyoichiro Kobayashi

2:20 PM - 2:40 PM

[4I3-J-2-02] Task-Conditional Generative Adversarial Imitation Learning That Infers Multiple Reward Functions

〇Kyoichiro Kobayashi¹, Takato Horii²,³, Ryo Iwaki¹, Yukie Nagai³, Minoru Asada¹ (1. Osaka University, 2. The University of Electro-Communications, 3. National Institute of Information and Communications Technology)

Keywords: Imitation Learning, Reinforcement Learning, Inverse Reinforcement Learning, Deep Learning

In this work, we propose a new imitation learning framework designed to infer multiple reward functions. We introduce latent variables into the discriminator and generator of Generative Adversarial Imitation Learning (GAIL) so that different reward functions and policies are learned for different tasks. To control the balance between imitating the expert directly (early convergence) and increasing the variance of the policy (sampling diverse data and learning a robust reward), we introduce an entropy-regularized correction term into the generator's objective function. We guarantee that the objective function has a unique optimal solution by the same argument as for GAIL. In experiments on a grid world problem, we show that our framework can efficiently infer multiple reward functions and policies that represent different tasks.
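
For orientation, the standard GAIL objective that the abstract builds on can be written as below, followed by one possible task-conditional form. The first expression is the usual GAIL saddle-point problem, where D is the discriminator, π the policy (generator), π_E the expert policy, H the causal entropy, and λ its weight. The second expression is only an illustrative sketch under stated assumptions: the latent task variable c, its prior p(c), and the per-task expert π_E^c are hypothetical notation introduced here, and the abstract does not give the authors' exact objective or the form of their correction term.

```latex
% Standard GAIL objective
\min_{\pi} \max_{D} \;
  \mathbb{E}_{\pi}\big[\log D(s,a)\big]
  + \mathbb{E}_{\pi_E}\big[\log\big(1 - D(s,a)\big)\big]
  - \lambda H(\pi)

% Illustrative task-conditional variant (assumption, not the authors' stated objective):
% both D and \pi receive a latent task variable c drawn from an assumed prior p(c)
\min_{\pi} \max_{D} \;
  \mathbb{E}_{c \sim p(c)} \Big[
    \mathbb{E}_{\pi(\cdot \mid s, c)}\big[\log D(s,a,c)\big]
    + \mathbb{E}_{\pi_E^{c}}\big[\log\big(1 - D(s,a,c)\big)\big]
    - \lambda H\big(\pi(\cdot \mid \cdot, c)\big)
  \Big]
```

In this sketch, a larger λ favors a higher-entropy policy (more varied samples), while a smaller λ favors matching the expert directly, which is the trade-off the abstract describes.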

Presentation information

[4I3-J-2] Machine learning: analysis and building of basic models
