4:30 PM - 4:50 PM
[3R5-GS-2-04] Target-oriented Exploration to Adapt to Different Dataset Types
Keywords: Reinforcement Learning, Machine Learning, Contextual Bandit Problems, Decision-making
Reinforcement learning is vulnerable to real-world noise and struggles to bridge the gap between simulation and reality. This problem is well known in motion control tasks and is also prominent in the contextual bandit problems used in recommendation systems. Contextual bandit algorithms rely on a linear approximation of the target feature, and algorithms that perform well on artificial data may not be effective on noisy real-world data. Humans adapt dynamically to complex real-world environments with limited data sampling by prioritizing trial and error aimed at reaching a certain aspiration level, rather than optimization. Risk-sensitive Satisficing (RS) is a target-oriented algorithm that incorporates this human cognitive tendency. In the contextual bandit problem, RS has been suggested to perform well not only on artificial data but also on real-world data. However, fitting real-world data required a specific adoption weighting rate for a prior distribution as a parameter. In this study, we examined whether introducing a meta-algorithm that dynamically determines this adoption weighting rate allows quick and flexible adaptation to a wider range of data.
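The abstract does not give the RS update rule, so the following is only a minimal sketch of a target-oriented (satisficing) selection rule in the spirit of RS, assuming the commonly cited value RS_i = (n_i / N) * (E_i - aleph), where aleph is the aspiration level, E_i the empirical mean reward of arm i, and n_i its pull count. The class name RSBandit, the prior_mean parameter, and the prior-weighting rate w are hypothetical illustrations of the "adoption weighting rate for a prior distribution" mentioned above, not the authors' formulation.

```python
import numpy as np


class RSBandit:
    """Sketch of a Risk-sensitive Satisficing (RS) style bandit (assumed form).

    RS value per arm: RS_i = (n_i / N) * (E_i - aleph), select argmax.
    The prior blending via `w` is purely illustrative.
    """

    def __init__(self, n_arms, aleph, prior_mean=0.5, w=0.1):
        self.aleph = aleph            # aspiration (target) level to be reached
        self.prior_mean = prior_mean  # assumed prior belief about arm rewards
        self.w = w                    # assumed adoption weighting rate for the prior
        self.counts = np.zeros(n_arms)
        self.sums = np.zeros(n_arms)

    def _means(self):
        # Blend the prior with the empirical mean of each arm (assumption).
        emp = self.sums / np.maximum(self.counts, 1)
        return self.w * self.prior_mean + (1.0 - self.w) * emp

    def select(self):
        # Pull each arm once before using the RS criterion.
        untried = np.where(self.counts == 0)[0]
        if untried.size > 0:
            return int(untried[0])
        total = self.counts.sum()
        rs = (self.counts / total) * (self._means() - self.aleph)
        return int(np.argmax(rs))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward
```

A meta-algorithm as described in the abstract would adjust `w` online from observed rewards rather than fixing it in advance; how that adjustment is done is left to the paper.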