JSAI2024

Presentation information

General Session


[2B5-GS-2] Machine learning: Reinforcement learning

Wed. May 29, 2024 3:30 PM - 5:10 PM Room B (Concert hall)

Chair: Tadahiro Taniguchi (Kyoto University)

3:30 PM - 3:50 PM

[2B5-GS-2-01] Target-oriented Exploration by Random Network Distillation

Akane Tsuboya¹, Tatsuji Takahashi¹, 〇Yu Kono¹ (1. Tokyo Denki University)

Keywords: reinforcement learning, deep learning, decision making, exploration

Deep Reinforcement Learning (DRL) has matched or even surpassed human performance in games such as Go and video games. However, training DRL agents requires a large amount of data, which leaves room for improvement in exploration efficiency. Quickly reaching an aspiration level of performance is also an important goal, especially in industrial applications. Focusing on a human cognitive characteristic, the tendency to prioritize goal achievement, we have incorporated a method called Regional Stochastic Risk-sensitive Satisficing (RS2) into DRL. RS2 can calculate the agent's future exploration distribution from reliability, a value that denotes the number of times an action has been selected. In complex environments, however, it is hard to count the number of selections accurately. This necessitates approximating reliability through multiclass classification. In this paper, we apply a method called Random Network Distillation (RND) to the estimation of reliability. RND uses the prediction error of state transitions as a reward bonus for the agent's intrinsic motivation. A drawback of this method is that it changes the agent's aspiration level of expected return. In this study, we overcome this problem by using RND indirectly to estimate reliability and combining it with RS2, which improves performance without changing the expected return.
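As a rough illustration of why an RND-style prediction error can stand in for a selection count, the sketch below trains a predictor to imitate a fixed, randomly initialised target and tracks how much of the initial error remains for inputs seen often, rarely, or never. It is not the authors' implementation: the class name `RNDNovelty`, the linear predictor, the one-hot encoding of state-action pairs, and the learning rate are all assumptions made for the example. The abstract does not say how the error is mapped to reliability, so the sketch stops at the raw error ratio.

```python
import math
import random


class RNDNovelty:
    """Toy RND: a fixed random target and a trainable linear predictor (hypothetical)."""

    def __init__(self, input_dim, feat_dim=8, lr=0.05, seed=0):
        rng = random.Random(seed)
        # Fixed, randomly initialised target network; it is never trained.
        self.target = [[rng.gauss(0.0, 1.0) for _ in range(input_dim)]
                       for _ in range(feat_dim)]
        # Predictor starts at zero and is trained to imitate the target.
        self.pred = [[0.0] * input_dim for _ in range(feat_dim)]
        self.lr = lr

    def _target_out(self, x):
        return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in self.target]

    def _pred_out(self, x):
        return [sum(w * v for w, v in zip(row, x)) for row in self.pred]

    def error(self, x):
        """Mean squared error between predictor and target outputs for input x."""
        t, p = self._target_out(x), self._pred_out(x)
        return sum((a - b) ** 2 for a, b in zip(t, p)) / len(t)

    def update(self, x):
        """One gradient step on the predictor for a visited input x."""
        t, p = self._target_out(x), self._pred_out(x)
        for i, row in enumerate(self.pred):
            grad = p[i] - t[i]
            for j in range(len(row)):
                row[j] -= self.lr * grad * x[j]


def one_hot(index, size):
    return [1.0 if k == index else 0.0 for k in range(size)]


# Three hypothetical state-action pairs, selected 50, 5, and 0 times.
rnd = RNDNovelty(input_dim=3)
inputs = [one_hot(i, 3) for i in range(3)]
initial = [rnd.error(x) for x in inputs]
for x, visits in zip(inputs, [50, 5, 0]):
    for _ in range(visits):
        rnd.update(x)

# Fraction of the initial error that remains: smaller means selected more often.
remaining = [rnd.error(x) / e0 for x, e0 in zip(inputs, initial)]
print(remaining)
```

In this toy setting the remaining error shrinks monotonically with the number of selections, which is the property a count-like quantity needs. Reading the error this way, rather than adding it to the reward, leaves the return itself untouched.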
