JSAI2023

Presentation information

General Session

[3R5-GS-2] Machine learning

Thu. Jun 8, 2023 3:30 PM - 5:10 PM Room R (602)

Chair: 漥澤 駿平 (NEC) [Online]

4:30 PM - 4:50 PM

[3R5-GS-2-04] Target-oriented Exploration to Adapt to Different Dataset Types

〇Sakura Mizuno (1), Shogo Ito (1), Akane Tsuboya (2), Tatsuji Takahashi (1), Yu Kono (1) (1. Tokyo Denki University, 2. Graduate School of Tokyo Denki University)

Keywords: Reinforcement Learning, Machine Learning, Contextual Bandit problems, Decision-making

Reinforcement learning is vulnerable to real-world noise and has difficulty adapting to the gap between simulation and reality. This problem is well known in motion control tasks and is also clearly visible in contextual bandit problems used in recommendation systems. Contextual bandit problems require a linear approximation of the target feature, but some algorithms that perform well on artificial data may not be effective on noisy real-world data. Humans adapt dynamically to complex real-world environments with limited data sampling by prioritizing trial and error aimed at reaching a certain aspiration level, rather than optimization. Risk-sensitive Satisficing (RS) is a target-oriented algorithm that incorporates such human cognitive tendencies. In the contextual bandit problem, RS has been suggested to perform well not only on artificial data but also on real-world data. However, fitting real-world data required setting the adoption weighting rate for a prior distribution as a parameter. In this study, we tested the possibility of adapting quickly and flexibly to a wider range of data by introducing a meta-algorithm that dynamically determines the adoption weighting rate.
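
The target-oriented idea behind RS can be illustrated with a small sketch. The abstract does not give the RS formula, so the code below assumes the form commonly stated in the RS literature, RS_i = (n_i / N) * (E_i - aleph), where n_i is the number of pulls of arm i, N the total number of pulls, E_i the estimated mean reward, and aleph the aspiration level. It is a simplified, non-contextual Bernoulli bandit. It does not model the prior distribution, the adoption weighting rate, or the meta-algorithm studied in this presentation, and the function names `rs_values` and `run_rs_bandit` are hypothetical.

```python
import random


def rs_values(counts, means, aspiration):
    """Assumed RS value per arm: (n_i / N) * (E_i - aspiration)."""
    total = sum(counts)
    if total == 0:
        # No arm has been tried yet, so every arm is equally attractive.
        return [0.0] * len(counts)
    return [(n / total) * (e - aspiration) for n, e in zip(counts, means)]


def run_rs_bandit(probs, aspiration, steps, seed=0):
    """Run a greedy-on-RS agent on a Bernoulli bandit (illustrative only)."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k      # n_i: pulls per arm
    means = [0.0] * k     # E_i: running mean reward per arm
    for _ in range(steps):
        values = rs_values(counts, means, aspiration)
        best = max(values)
        # Break ties at random among the arms with the highest RS value.
        arm = rng.choice([i for i, v in enumerate(values) if v == best])
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means
```

Under this assumed form, an arm whose estimate exceeds the aspiration level gains value the more it is tried, so the agent keeps exploiting it. When every estimate is below the aspiration level, the least-tried arms have values closest to zero, so the agent explores. A call such as `run_rs_bandit([0.2, 0.8], aspiration=0.5, steps=2000)` returns the pull counts and estimated means for inspection.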
