1:20 PM - 1:40 PM
[2C4-GS-2-01] Risk-sensitive Satisficing with local approximation of reliability
Keywords: reinforcement learning, machine learning, decision-making, contextual bandit, satisficing
Deep reinforcement learning has made it possible to learn from complex input signals, which was previously difficult, owing to the excellent approximation properties of neural networks. However, learning optimal actions in finite time remains challenging in environments as large and complex as the real world. The difficulty stems from the large number of samples needed to gather data for function approximation and the large number of exploratory trials needed for reinforcement learning. We focus on satisficing as a way to combine a reduced number of exploratory trials with function approximation. Satisficing is a tendency in human decision-making to explore with the aim of achieving a target level rather than an optimum. Linear Risk-sensitive Satisficing (LinRS), which is based on satisficing, has been proposed for the contextual bandit problem. However, in LinRS the approximation of the memory of past trials (reliability) is insensitive to the feature vectors, so the original concept of satisficing cannot be fully realized. In this study, we propose Regional LinRS, which uses episodic memory to approximate the memory in the temporal neighborhood, and show its usefulness.