JSAI2022

Presentation information


General Session » GS-2 Machine learning

[2C5-GS-2] Machine learning: reinforcement learning (2)

Wed. Jun 15, 2022 3:20 PM - 5:00 PM Room C (Room C-2)

Chair: Eiji Uchibe (Advanced Telecommunications Research Institute International) [On-site]

3:20 PM - 3:40 PM

[2C5-GS-2-01] Incorporating Future Trajectory Supervisions into Sequence Modeling based Reinforcement Learning

〇Terufumi Morishita1, Gaku Morio1, Hiroaki Ozaki1, Nobuo Nukaga1 (1. Hitachi, Ltd.)

Keywords: reinforcement learning, sequence modeling, transformer, game AI, control

Recent studies have proposed re-formalizing reinforcement learning as a sequence modeling problem (seq-RL), making advanced architectures such as the Transformer available. These studies predict the next action from the past trajectory (i.e., the sequence of states, actions, and rewards), as is commonly done in typical sequence generation tasks. However, many reinforcement learning studies suggest that planning over the long-term future, rather than the immediate future, is essential.
In this study, we consider long-term planning in seq-RL. Specifically, we generalize seq-RL to a multitask problem in which we predict multiple elements of the future trajectory. To tackle this problem, we extend the Transformer toward the future using dummy input tokens, each of which represents an element at a specific timestep on the future trajectory. Our model aggregates information from the past trajectory and the imaginary future trajectory through self-attention to choose a next action consistent with a long-term plan. Our model outperforms the baseline on Atari and OpenAI Gym tasks.
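The core idea of the abstract can be illustrated with a minimal sketch: append learned dummy tokens (one per future trajectory element) to the embedded past trajectory, let a single self-attention layer mix past and imaginary future positions, and read the next action out of the current timestep's token. This is not the authors' implementation; all dimensions, the single-head attention, and the random "learned" weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 16            # embedding dimension (illustrative)
n_past = 9        # past tokens: (state, action, reward) for 3 timesteps
n_future = 4      # dummy tokens standing in for future trajectory elements
n_actions = 6     # size of a discrete action space (illustrative)

# Embedded past trajectory tokens (in practice produced by learned encoders).
past = rng.normal(size=(n_past, d))
# Learned dummy embeddings, one per future trajectory element.
future_dummies = rng.normal(size=(n_future, d))

# Input sequence = past trajectory followed by imaginary future slots.
x = np.concatenate([past, future_dummies], axis=0)    # (n_past + n_future, d)

# Single-head self-attention: every token, past or imaginary future,
# attends to every other, aggregating long-term information.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
q, k, v = x @ Wq, x @ Wk, x @ Wv
scores = q @ k.T / np.sqrt(d)
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)
h = attn @ v                                          # (n_past + n_future, d)

# The next action is read out from the token at the current timestep,
# which is now informed by the dummy future positions via attention.
W_act = rng.normal(size=(d, n_actions))
action_logits = h[n_past - 1] @ W_act
next_action = int(np.argmax(action_logits))
```

In training, the dummy positions would additionally carry prediction heads for the future states, actions, and rewards, turning action selection into the multitask problem the abstract describes.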
