Dual Reinforcement Learning for Satisficing Levels in Target-Oriented Exploration

Wataru Nakamura; Tatsuji Takahashi; Yu Kono

[3Win5-07] Dual Reinforcement Learning for Satisficing Levels in Target-Oriented Exploration

〇Wataru Nakamura¹, Tatsuji Takahashi¹, Yu Kono¹ (1. Tokyo Denki University)

Keywords: Reinforcement Learning, Cognitive Science, Machine Learning

When humans begin a new endeavor, they initially focus on acquiring basic skills and progressively advance to intermediate and advanced levels.
In essence, the focus is on achieving a goal rather than optimizing from the outset.
Based on this idea, we decompose reinforcement learning into two processes: goal-oriented exploration and stepwise goal adjustment.
Our algorithm, Risk-sensitive Satisficing (RS), quickly reaches a satisficing level of performance by minimizing a subjective regret defined relative to the goal.
In bandit problems, RS can also optimize the goal dynamically, matching the performance of Thompson Sampling without requiring prior knowledge.
While this demonstrates the usefulness of decomposing reinforcement learning into two key elements, current RS goal adjustment methods remain limited to bandit problems.
In this study, we propose a general goal adjustment algorithm based on reinforcement learning for motor control.
By integrating two simple reinforcement learning processes, rapid goal attainment and one-dimensional goal optimization, we successfully operationalize the concept of a goal.
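
The abstract does not give the RS value function. As an illustration only, the sketch below assumes the form we understand from earlier RS work on bandit problems: each arm i is scored as (n_i / N) * (E_i - aleph), where n_i is the number of pulls of arm i, N is the total number of pulls, E_i is the arm's mean reward, and aleph is the goal (aspiration level), and the agent pulls the highest-scoring arm. The function names, the Bernoulli reward model, and the fixed aspiration level are hypothetical choices made for this example. The goal adjustment process proposed in this study is not shown.

```python
import random


def rs_values(counts, means, aspiration):
    """Assumed RS score per arm: (n_i / N) * (E_i - aspiration)."""
    total = sum(counts)
    if total == 0:
        # No pulls yet: every arm is scored equally.
        return [0.0 for _ in counts]
    return [(n / total) * (m - aspiration) for n, m in zip(counts, means)]


def run_rs_bandit(probs, aspiration, steps, seed=0):
    """Run the assumed RS rule on a Bernoulli bandit with a fixed goal."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k    # pulls per arm
    means = [0.0] * k   # running mean reward per arm
    for _ in range(steps):
        values = rs_values(counts, means, aspiration)
        best = max(values)
        # Break ties uniformly at random among the top-scoring arms.
        arm = rng.choice([i for i, v in enumerate(values) if v == best])
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means


if __name__ == "__main__":
    # Hypothetical setup: the goal sits between the best and second-best arm.
    counts, means = run_rs_bandit([0.2, 0.5, 0.8], aspiration=0.65, steps=2000)
    print("pulls per arm:", counts)
    print("estimated means:", [round(m, 3) for m in means])
```

Under this scoring rule, an arm whose mean exceeds the goal gets a positive score that grows with its share of pulls, so the agent settles on it. When every tried arm falls short of the goal, the least-tried arms score highest, so the agent keeps exploring.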

Presentation information

[3Win5] Poster session 3
