Keywords: Machine learning, Reinforcement learning, Dueling bandit problem, Satisficing
Multi-armed bandit problems, among the most fundamental tasks in reinforcement learning, have been widely applied to problems such as online advertisement delivery and game-tree search. Whereas these traditional bandit problems require absolute rewards to be quantifiable, dueling bandit problems (DBP) can handle relative rewards obtained through pairwise comparisons. One of the most effective solutions to DBP is Double Thompson Sampling (D-TS). However, because it relies on pairwise comparisons, solving DBP requires many trials, which makes D-TS computationally expensive. In this paper, we focus on the fact that “satisficing” action selection quickly finds an action that satisfies a given target level, and we propose an algorithm based on the Risk-sensitive Satisficing (RS) model. Our results showed that on some datasets its performance was inferior to that of D-TS. We therefore propose a new method combining RS and D-TS that improves performance with respect to weak regret in DBP.
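To illustrate the dueling-bandit setting the abstract describes, the sketch below shows a simplified Thompson-sampling loop over pairwise comparisons. It is a stripped-down illustration of the D-TS idea only (Beta posteriors on pairwise win probabilities, two sampled candidates per round), not the authors' algorithm or the full D-TS procedure; the class name `DuelingTS` and the Copeland-style scoring are assumptions for this sketch.

```python
import random

class DuelingTS:
    """Simplified Thompson-sampling sketch for a dueling bandit.

    Illustrative only: a stripped-down stand-in for D-TS, not the
    paper's method. Maintains Beta posteriors over pairwise win
    probabilities and duels two sampled candidates each round.
    """

    def __init__(self, n_arms):
        self.n = n_arms
        # wins[i][j]: 1 + number of times arm i beat arm j,
        # so each pair starts from a Beta(1, 1) prior.
        self.wins = [[1] * n_arms for _ in range(n_arms)]

    def select_pair(self):
        # Sample a win probability for each ordered pair from its
        # Beta posterior.
        theta = [[random.betavariate(self.wins[i][j], self.wins[j][i])
                  if i != j else 0.5
                  for j in range(self.n)]
                 for i in range(self.n)]
        # First candidate: best Copeland-style score under the samples
        # (number of opponents it beats with sampled probability > 0.5).
        first = max(range(self.n),
                    key=lambda i: sum(t > 0.5 for t in theta[i]))
        # Second candidate: the strongest sampled challenger to `first`.
        second = max((j for j in range(self.n) if j != first),
                     key=lambda j: theta[j][first])
        return first, second

    def update(self, winner, loser):
        # Record the outcome of the duel (relative feedback only).
        self.wins[winner][loser] += 1

# Toy usage: simulate duels where arm 0 beats the others 70% of the time.
random.seed(0)
bandit = DuelingTS(3)
true_p = [[0.5, 0.7, 0.7],
          [0.3, 0.5, 0.5],
          [0.3, 0.5, 0.5]]
for _ in range(2000):
    a, b = bandit.select_pair()
    if random.random() < true_p[a][b]:
        bandit.update(a, b)
    else:
        bandit.update(b, a)
```

The point of the sketch is that only relative outcomes (who won each duel) ever reach the learner, which is what distinguishes DBP from the classical bandit setting with quantifiable absolute rewards.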