9:20 AM - 9:40 AM
[2S1-GS-2-02] Utilizing Classified Trajectories in the Process of Reinforcement Learning to Improve Reward Function
Keywords: Reinforcement Learning, Reward Shaping, Imitation Learning
For reinforcement learning to acquire an appropriate policy, the designer must prepare a properly designed reward function in advance. In complex problem settings, however, the burden of designing such a reward function increases significantly, and an improperly designed reward function can lead the agent to learn policies that deviate from the designer's intent, becoming a bottleneck for applying reinforcement learning in real-world scenarios. In this study, we propose an approach that addresses this challenge by labeling the trajectories the reinforcement learning agent passes through during training as successes or failures. We train a discriminator in parallel with the reinforcement learning agent to distinguish between these labeled trajectories and use its output as an additional reward. The discriminator outputs the probability that a given state, encountered by the agent during its interaction with the environment, belongs to a trajectory labeled as successful. By feeding this output back to the agent as an additional reward, we aim to reduce the burden of reward design while enabling more efficient learning.
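Based on the abstract alone, a minimal sketch of the scheme might look like the following, assuming a PyTorch implementation. The network architecture, the mixing coefficient `beta`, and all function and class names here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: a discriminator trained on success/failure-labeled
# trajectories, whose output probability is added to the environment reward.
# All names and hyperparameters are assumptions made for illustration.

class SuccessDiscriminator(nn.Module):
    """Predicts the probability that a state belongs to a trajectory
    labeled as successful."""
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Returns logits; apply sigmoid to obtain a probability.
        return self.net(state)


def discriminator_update(disc, optimizer, success_states, failure_states):
    """One binary cross-entropy step: states from trajectories labeled as
    successes get label 1, states from failures get label 0."""
    states = torch.cat([success_states, failure_states])
    labels = torch.cat([torch.ones(len(success_states), 1),
                        torch.zeros(len(failure_states), 1)])
    loss = nn.functional.binary_cross_entropy_with_logits(disc(states), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


def shaped_reward(disc, state, env_reward, beta: float = 1.0):
    """Environment reward plus the discriminator's success probability,
    used as the additional reward described in the abstract."""
    with torch.no_grad():
        p_success = torch.sigmoid(disc(state)).item()
    return env_reward + beta * p_success
```

In a training loop, `shaped_reward` would stand in for the raw environment reward passed to the agent's update, while `discriminator_update` would be called in parallel on freshly labeled trajectories, so the shaping signal and the policy improve together.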