JSAI2024

Presentation information

Poster Session

[3Xin2] Poster session 1

Thu. May 30, 2024 11:00 AM - 12:40 PM Room X (Event hall 1)

[3Xin2-88] Target-oriented Exploration in Neural Contextual Bandits

〇Shogo Ito1, Tatsuji Takahashi2, Yu Kono2 (1.Graduate School of Tokyo Denki University, 2.Tokyo Denki University)

Keywords: Reinforcement Learning, Contextual Bandits, Neural Contextual Bandits

Selection algorithms for advertisement delivery and recommendation are an indispensable part of Web services. Contextual bandit algorithms are particularly useful for reflecting human preferences in recommendation tasks, offering advantages such as real-time responsiveness and robustness to cold starts. Combining them with reinforcement learning, as in the RLHF tuning of ChatGPT, can allow further adaptation to human preferences. In industrial applications, however, the emphasis is on quickly reaching specific standards rather than on extensive exploratory adaptation to the environment. We therefore focused on target-oriented achievement, a tendency in human decision-making. Regional Linear Risk-sensitive Satisficing (RegLinRS) is a meta-policy that incorporates this tendency. Tsuboya et al. have shown that it performs well in environments with linear rewards, and it can be expected to perform well in environments with non-linear rewards as well. We developed Neural Regional Risk-sensitive Satisficing (NeuralRegRS), an extension of RegLinRS for complex function approximation, and tested its performance in environments built from both artificial and real-world datasets.
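To illustrate what target-oriented exploration can look like, the following is a minimal sketch of a satisficing rule on a plain (non-contextual) Bernoulli bandit. It is not the RegLinRS or NeuralRegRS algorithm from the paper. The scoring rule (trial share times the gap between the mean reward and a target), the function names, and the arm probabilities are all assumptions made for illustration.

```python
import random

def rs_values(counts, means, aspiration):
    """Illustrative satisficing score per arm: (n_i / N) * (mean_i - aspiration).

    This scoring rule is an assumption for illustration, not the paper's method.
    """
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [(n / total) * (m - aspiration) for n, m in zip(counts, means)]

def select_arm(counts, means, aspiration):
    """Pick the arm with the highest score; ties go to the lowest index."""
    scores = rs_values(counts, means, aspiration)
    return max(range(len(scores)), key=scores.__getitem__)

def run(probs, aspiration, steps, seed=0):
    """Simulate a Bernoulli bandit with hypothetical success probabilities."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    means = [0.0] * len(probs)
    for _ in range(steps):
        arm = select_arm(counts, means, aspiration)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means

if __name__ == "__main__":
    # Hypothetical arms and target level, chosen only for demonstration.
    counts, means = run(probs=[0.3, 0.5, 0.7], aspiration=0.6, steps=1000)
    print(counts, means)
```

Under this rule, when every arm's mean is below the target, the scores are negative and the least-tried arms score closest to zero, so the policy explores. Once an arm's mean exceeds the target, its score is positive and grows with its share of trials, so the policy keeps exploiting that arm.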
