JSAI2022

General Session, GS-2: Machine learning

[2C5-GS-2] Machine learning: reinforcement learning (2)

Wed. Jun 15, 2022 3:20 PM - 5:00 PM Room C (Room C-2)

Chair: Eiji Uchibe (Advanced Telecommunications Research Institute International, ATR) [on-site]

4:00 PM - 4:20 PM

[2C5-GS-2-03] Max-Min Off-Policy Actor-Critic with Robustness to Model Misspecification

Takumi Tanabe (1, 2; presenter), Rei Sato (1, 2), Kazuto Fukuchi (1, 2), Jun Sakuma (1, 2), Youhei Akimoto (1, 2)
Affiliations: 1. University of Tsukuba, 2. RIKEN AIP

Keywords: Reinforcement Learning, sim2real, max-min optimization

In reinforcement learning, training policies in the real world is costly and risky, so policies are often trained in a simulation environment and then transferred to the real world.
However, because a simulation environment does not perfectly mimic the real-world environment, modeling errors (model misspecification) may occur.
We focus on scenarios in which a simulation environment with an uncertainty parameter is available, together with the set of that parameter's possible values (the uncertainty parameter set).
The objective is to optimize the worst-case performance over the uncertainty parameter set: if the real-world parameter value lies in this set, the policy's real-world performance is at least this worst-case value, which therefore serves as a performance guarantee.
We propose the Max-Min Twin Delayed Deep Deterministic Policy Gradient algorithm (M2TD3) and its soft variant (SoftM2TD3), which solve this max-min optimization problem to obtain a policy that optimizes the worst-case performance.
Experiments in MuJoCo environments show that the proposed methods achieve better worst-case performance than several baseline approaches.
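The abstract does not give M2TD3's update rules, so the following is only a generic sketch of the max-min objective it targets: choose policy parameters θ to maximize the worst-case return min over ω in Ω of J(θ, ω). It is NOT M2TD3 (an off-policy actor-critic method); it assumes a hypothetical one-dimensional policy parameter, a hypothetical quadratic return `toy_return`, and a finite grid standing in for the uncertainty set Ω. The inner minimization is solved exactly over the grid, and θ then takes a supergradient ascent step at the minimizing ω.

```python
import math


def toy_return(theta: float, omega: float) -> float:
    # Hypothetical stand-in for the expected return J(theta, omega): it peaks
    # when the policy parameter theta matches the uncertainty parameter omega.
    return -(theta - omega) ** 2


def grad_theta(theta: float, omega: float) -> float:
    # Analytic gradient of toy_return with respect to theta.
    return -2.0 * (theta - omega)


def worst_case(theta: float, omegas: list[float]) -> tuple[float, float]:
    # Inner problem: (min over omega of J(theta, omega), minimizing omega),
    # evaluated exactly over a finite grid of candidate omegas.
    return min((toy_return(theta, w), w) for w in omegas)


def max_min_ascent(omegas: list[float], theta: float = 0.0,
                   steps: int = 2000, lr: float = 0.5) -> float:
    # Outer problem: ascend the worst-case return. The worst-case objective
    # is concave in theta here (a minimum of concave functions), and by
    # Danskin's theorem the gradient at the current minimizing omega is a
    # supergradient of it. The diminishing step size lr / sqrt(t + 1) damps
    # the oscillation between the two boundary worst cases.
    for t in range(steps):
        _, w_star = worst_case(theta, omegas)
        theta += lr / math.sqrt(t + 1) * grad_theta(theta, w_star)
    return theta


if __name__ == "__main__":
    # Hypothetical uncertainty set Omega = [0.5, 2.0], discretized in steps of 0.05.
    omegas = [0.5 + 0.05 * i for i in range(31)]
    theta_robust = max_min_ascent(omegas)
    theta_nominal = 1.0  # best choice if omega were known to equal exactly 1.0
    print("robust :", theta_robust, worst_case(theta_robust, omegas)[0])
    print("nominal:", theta_nominal, worst_case(theta_nominal, omegas)[0])
```

Under these assumptions the robust parameter should settle near θ = 1.25, the midpoint of Ω, whose worst-case return is -0.5625. The nominal choice θ = 1.0 is optimal only for ω = 1.0 and has a worst case of -1.0 (at ω = 2.0). Exact inner minimization is feasible only because this toy Ω is a small grid; in continuous-control settings such as MuJoCo the return is not available in closed form, which is why learned methods such as those proposed here are needed.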
