4:10 PM - 4:30 PM
[3R5-GS-2-03] Purposive Exploration and Progressive Reference Control
Keywords: reinforcement learning, machine learning, decision-making
Humans tend to reach higher goals by gradually raising their objective level, and the trial and error they perform to achieve a given goal is very fast. Together, these two capabilities allow procedures to be optimized efficiently and step by step. In the context of reinforcement learning, the latter trial-and-error capability is supported by the Risk-sensitive Satisficing (RS) algorithm; however, there has been little discussion of a framework that combines it with the former, i.e., the step-by-step updating of the objective level. An advantage of having an explicit objective is that prior knowledge can be used to set it: for animals it corresponds to foraging with calorie consumption as a minimum criterion, and in industrial applications it corresponds to operational costs or numerical targets set for investors. When the goal is achieved, the agent adjusts the target upward; when it proves unattainable, the agent adjusts the target downward. The scheme is also highly flexible, since the goal can be changed on the basis of hearsay information, for example when another agent is reported to have achieved a better performance record. In this study, we examine the combination of goal-directed search by RS with gradual modification of the goal level through simulations of the bandit problem. We propose a natural formulation that optimizes behavior efficiently by setting an initial objective level corresponding to a prior distribution based on prior knowledge and body structure.
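As a concrete illustration, below is a minimal sketch of RS on a Bernoulli bandit combined with a gradually adjusted objective level, assuming the commonly cited RS form RS_i = (n_i / N) * (E_i - aleph), where E_i is the estimated value of arm i, n_i its play count, N the total play count, and aleph the objective (aspiration) level. The raise/lower update rule for aleph and all parameter values here are illustrative assumptions, not the authors' exact method.

import random

def rs_bandit(true_means, trials=10000, aleph=0.5, step=0.01):
    """Sketch: Risk-sensitive Satisficing (RS) on a Bernoulli bandit,
    with an illustrative gradual update of the aspiration level aleph."""
    k = len(true_means)
    n = [1e-6] * k          # play counts (small init avoids division by zero)
    value = [0.0] * k       # running mean reward per arm
    total = 0.0
    for t in range(1, trials + 1):
        big_n = sum(n)
        # RS value: count-weighted deviation of each arm from the aspiration level
        rs = [(n[i] / big_n) * (value[i] - aleph) for i in range(k)]
        arm = max(range(k), key=lambda i: rs[i])
        reward = 1.0 if random.random() < true_means[arm] else 0.0
        n[arm] += 1
        value[arm] += (reward - value[arm]) / n[arm]  # incremental mean update
        total += reward
        # Illustrative progressive reference control (assumed rule):
        # raise the target while it is being met, lower it while it is not.
        aleph += step if total / t >= aleph else -step
    return value, aleph

if __name__ == "__main__":
    est, final_aleph = rs_bandit([0.2, 0.5, 0.8])
    print("estimated means:", [round(v, 3) for v in est])
    print("final aspiration level:", round(final_aleph, 3))

The count weighting n_i / N is what drives exploration here: an arm whose estimated value sits below the target becomes increasingly unattractive the more it is played, pushing selection toward less-tried arms until some arm satisfies the objective level.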