3:30 PM - 3:50 PM
[2B5-GS-2-01] Target-oriented Exploration by Random Network Distillation
Keywords: reinforcement learning, deep learning, decision making, exploration
Deep Reinforcement Learning (DRL) has achieved performance that equals or even surpasses humans in games such as Go and video games. However, DRL agents require large amounts of data to learn, leaving room for improvement in exploration efficiency. Quickly reaching an aspiration level of performance is also an important goal, especially in industrial applications. Focusing on a human cognitive characteristic, the tendency to prioritize goal achievement, we incorporated a method called Regional Stochastic Risk-sensitive Satisficing (RS2) into DRL. RS2 computes the agent's future exploration distribution from reliability, a value based on the number of times each action has been selected. In complex environments, however, selection counts are hard to track accurately, which necessitates approximating reliability through multiclass classification. In this paper, we applied a method called Random Network Distillation (RND) to this reliability estimation. RND uses the prediction error of state transitions as a reward bonus that serves as the agent's intrinsic motivation, but this bonus changes the expected return and hence the agent's aspiration level. In this study, we overcame this problem by using RND indirectly to estimate reliability and combining it with RS2, improving performance without changing the expected return.
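As a rough illustration of how an RND-style prediction error could stand in for a visit count, here is a minimal sketch assuming PyTorch. The network shapes, hyperparameters, and the final mapping from prediction error to a "reliability" value are hypothetical illustrations, not the paper's exact formulation of reliability under RS2.

```python
# Minimal RND sketch (assumes PyTorch). A fixed random "target" network is
# imitated by a trained "predictor"; prediction error is low for frequently
# visited states, so it can serve as a proxy for a visit count.
import torch
import torch.nn as nn


def make_net(obs_dim: int, out_dim: int = 64) -> nn.Module:
    return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))


class RND:
    def __init__(self, obs_dim: int):
        self.target = make_net(obs_dim)     # fixed, randomly initialized
        self.predictor = make_net(obs_dim)  # trained to imitate the target
        for p in self.target.parameters():
            p.requires_grad_(False)
        self.opt = torch.optim.Adam(self.predictor.parameters(), lr=1e-4)

    def error(self, obs: torch.Tensor) -> torch.Tensor:
        # Per-state prediction error; large for rarely visited states.
        return (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)

    def update(self, obs: torch.Tensor) -> None:
        # Train the predictor on visited states, shrinking their error.
        loss = self.error(obs).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()

    def reliability(self, obs: torch.Tensor) -> torch.Tensor:
        # Hypothetical mapping: small error ~ many visits ~ high reliability.
        # Used this way, RND informs exploration without adding a reward
        # bonus, so the expected return is left unchanged.
        with torch.no_grad():
            return 1.0 / (1.0 + self.error(obs))


# Example usage with a toy 4-dimensional observation space.
rnd = RND(obs_dim=4)
batch = torch.randn(32, 4)
rnd.update(batch)                  # train on visited states
print(rnd.reliability(batch[:1]))  # rises as the same states recur
```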