JSAI2024

Presentation information

General Session

[2B5-GS-2] Machine learning: Reinforcement learning

Wed. May 29, 2024 3:30 PM - 5:10 PM Room B (Concert hall)

Chair: Tadahiro Taniguchi (Kyoto University)

4:30 PM - 4:50 PM

[2B5-GS-2-04] Incremental Improvement of Reward Function Using Trajectories for Better Performance on Reinforcement Learning

〇Kota Minoshima1, Sachiyo Arai1 (1. Chiba University)

Keywords: Reinforcement Learning, Inverse Reinforcement Learning, Reward Shaping

In order to acquire an appropriate control law through reinforcement learning, it is necessary to design an appropriate reward function.
However, this reward design becomes complicated for large-scale problems, increasing the design burden and inducing unintended behavior.
Therefore, when unintended behavior is identified in real-world applications of reinforcement learning, a method to improve the reward design based on this behavior may be required.
To identify the cause of unintended behavior, it is necessary to know what reward the agent is actually receiving under the current reward function.
One approach to this is inverse reinforcement learning, which estimates the expert's reward given the expert's trajectory.
By applying inverse reinforcement learning to the trajectory of a reinforcement learning agent, it is possible to know what kind of reward the agent is getting according to the current reward function.
In this study, we propose a method that improves reinforcement learning performance by using inverse reinforcement learning to estimate the reward of the reinforcement learning agent and then revising the reward design based on the estimated reward.
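
The abstract does not specify the inverse reinforcement learning algorithm or how the reward design is revised, so the following is only an illustrative sketch of the general idea and not the authors' method. It assumes a small tabular problem with one-hot state features, and it substitutes a very crude estimator: the first step of projection-style feature matching, where the reward weights are the difference between the agent's discounted state visitation and that of a baseline policy. The function names, the 5-state chain, the baseline trajectories, and the flagging rule are all hypothetical and chosen for illustration.

```python
import numpy as np

def feature_expectations(trajectories, n_states, gamma=0.9):
    """Average discounted visitation per state (one-hot state features)."""
    mu = np.zeros(n_states)
    for traj in trajectories:
        for t, s in enumerate(traj):
            mu[s] += gamma ** t
    return mu / len(trajectories)

def estimate_reward_weights(agent_trajs, baseline_trajs, n_states, gamma=0.9):
    """Crude reward estimate (assumption): w = mu_agent - mu_baseline, normalized.

    This is only the first step of projection-style feature matching,
    not a full inverse reinforcement learning procedure.
    """
    w = (feature_expectations(agent_trajs, n_states, gamma)
         - feature_expectations(baseline_trajs, n_states, gamma))
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w

def flag_states(estimated_w, intended_reward, threshold=0.0):
    """States the agent appears to value although the designer assigns no reward."""
    return [s for s in range(len(estimated_w))
            if estimated_w[s] > threshold and intended_reward[s] <= 0]

# Hypothetical 5-state chain: the designer intends the agent to reach state 4,
# but the trained agent keeps looping in state 2.
n_states = 5
intended_reward = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
agent_trajs = [[0, 1, 2, 2, 2, 2], [0, 1, 2, 2, 2, 2]]
baseline_trajs = [[0, 1, 2, 3, 4, 4], [0, 0, 1, 1, 0, 1]]

w = estimate_reward_weights(agent_trajs, baseline_trajs, n_states)
suspect_states = flag_states(w, intended_reward)
print("estimated weights:", np.round(w, 3))
print("states to review in the reward design:", suspect_states)
```

In this toy setting the estimate puts its largest weight on state 2, which the designer never intended to reward, so that state is the one a designer would inspect first when revising the reward function.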
