JSAI2022

Presentation information


General Session » GS-2 Machine learning

[2C5-GS-2] Machine learning: reinforcement learning (2)

Wed. Jun 15, 2022 3:20 PM - 5:00 PM Room C (Room C-2)

Chair: Eiji Uchibe (Advanced Telecommunications Research Institute International) [On-site]

3:20 PM - 3:40 PM

[2C5-GS-2-01] Incorporating Future Trajectory Supervisions into Sequence Modeling based Reinforcement Learning

〇Terufumi Morishita1, Gaku Morio1, Hiroaki Ozaki1, Nobuo Nukaga1 (1. Hitachi, Ltd.)

Keywords: reinforcement learning, sequence modeling, transformer, game AI, control

Recent studies have proposed re-formalizing reinforcement learning as a sequence modeling problem (seq-RL), making advanced architectures such as the Transformer available. These studies predict the next action from the past trajectory (i.e., the sequence of states, actions, and rewards), as is commonly done in typical sequence generation tasks. However, many reinforcement learning studies suggest that planning over the long-term future, rather than the immediate future, is essential.
In this study, we consider long-term planning in seq-RL. Specifically, we generalize seq-RL to a multitask problem in which we predict multiple elements of the future trajectory. To tackle this problem, we extend the Transformer toward the future using dummy input tokens, each of which represents an element at a specific timestep on the future trajectory. Our model aggregates information from the past trajectory and the imaginary future trajectory through self-attention to choose a next action consistent with a long-term plan. Our model outperforms the baseline on Atari and OpenAI Gym tasks.
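The core idea of the abstract can be illustrated with a minimal sketch: append learned dummy tokens (one per future trajectory element) to the embedded past trajectory, let a single self-attention layer mix past and imaginary future positions, and read the next action out of the current timestep's token. This is not the authors' implementation; all dimensions, the single-head attention, and the random "learned" weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 16            # embedding dimension (illustrative)
n_past = 9        # past tokens: (state, action, reward) for 3 timesteps
n_future = 4      # dummy tokens standing in for future trajectory elements
n_actions = 6     # size of a discrete action space (illustrative)

# Embedded past trajectory tokens (in practice produced by learned encoders).
past = rng.normal(size=(n_past, d))
# Learned dummy embeddings, one per future trajectory element.
future_dummies = rng.normal(size=(n_future, d))

# Input sequence = past trajectory followed by imaginary future slots.
x = np.concatenate([past, future_dummies], axis=0)    # (n_past + n_future, d)

# Single-head self-attention: every token, past or imaginary future,
# attends to every other, aggregating long-term information.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
q, k, v = x @ Wq, x @ Wk, x @ Wv
scores = q @ k.T / np.sqrt(d)
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)
h = attn @ v                                          # (n_past + n_future, d)

# The next action is read out from the token at the current timestep,
# which is now informed by the dummy future positions via attention.
W_act = rng.normal(size=(d, n_actions))
action_logits = h[n_past - 1] @ W_act
next_action = int(np.argmax(action_logits))
```

In training, the dummy positions would additionally carry prediction heads for the future states, actions, and rewards, turning action selection into the multitask problem the abstract describes.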
