JSAI2023

Presentation information

General Session

[3R5-GS-2] Machine learning

Thu. Jun 8, 2023 3:30 PM - 5:10 PM Room R (602)

Chair: 漥澤 駿平 (NEC) [Online]

3:50 PM - 4:10 PM

[3R5-GS-2-02] Target-oriented Exploration in Non-stationary Contextual Bandits

〇Shogo Ito (1), Sakura Mizuno (1), Akane Tsuboya (2), Tatsuji Takahashi (1), Yu Kono (1) (1. Tokyo Denki University, 2. Graduate School of Tokyo Denki University)

Keywords: Reinforcement Learning, Contextual Bandit, Non-stationary Environment

Selection algorithms for ad delivery and recommender systems have become an indispensable part of Web services. Because people's tastes and preferences change over time, these algorithms must be able to follow them in non-stationary environments.
We focus on a tendency in human decision making: giving greater importance to achieving a goal than to optimizing. Agents with this target-oriented tendency are expected to make flexible decisions that follow environmental change well, because they explore according to their degree of goal achievement without being overly sensitive to changes in the environment. Risk-sensitive Satisficing (RS) is a meta-policy that incorporates target-oriented decision making. Hanayasu et al. showed that RS has excellent followability in non-stationary environments. However, whether RS retains this followability in non-stationary contextual bandit problems had not been verified. Using Regional Linear Risk-sensitive Satisficing (RegLinRS), an extension of RS to function approximation, we evaluated followability in such environments and showed that RegLinRS is useful.
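
To illustrate exploring "according to the degree of achievement", here is a minimal sketch of target-oriented arm selection for a plain (non-contextual) bandit. The abstract does not give the RS formula. The sketch assumes the form in which RS is commonly stated, where each arm's value is its share of trials multiplied by the gap between its mean reward and an aspiration level. The function names `rs_values` and `select_arm` and all numbers are hypothetical, and this is not the RegLinRS method evaluated in the presentation.

```python
def rs_values(counts, means, aspiration):
    """Assumed RS value per arm: (n_i / N) * (E_i - aspiration).

    counts[i] is how often arm i was tried, means[i] is its mean reward,
    and aspiration is the target reward level the agent wants to reach.
    """
    total = sum(counts)
    if total == 0:
        # No trials yet: every arm is equally (un)known.
        return [0.0 for _ in counts]
    return [(n / total) * (e - aspiration) for n, e in zip(counts, means)]


def select_arm(counts, means, aspiration):
    """Pick the arm with the largest assumed RS value (first arm on ties)."""
    values = rs_values(counts, means, aspiration)
    return max(range(len(values)), key=values.__getitem__)


# Target not reached by any arm: the less-tried arm is preferred (exploration).
print(select_arm([8, 2], [0.4, 0.3], aspiration=0.6))  # → 1

# Arm 0 exceeds the target: the well-tried, satisfying arm is kept (exploitation).
print(select_arm([8, 2], [0.7, 0.3], aspiration=0.6))  # → 0
```

Under this assumed form, all values are negative while no arm meets the target, so arms with fewer trials score higher and get explored. Once an arm exceeds the target, its value is positive and grows with its trial share, so the agent settles on it until its mean drops below the target again.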
