Expansion to Gaussian Distributional Rewards in Natural Reinforcement Learning

Shoma Ogawa

2:30 PM - 2:50 PM

[2Q4-OS-27b-04] Expansion to Gaussian Distributional Rewards in Natural Reinforcement Learning

〇Shoma Ogawa¹, Shuichi Arimura², Tatsuji Takahashi¹, Yu Kohno¹ (1. School of Science and Engineering, Tokyo Denki University, 2. Graduate School of Tokyo Denki University)

Keywords: Reinforcement Learning, Machine Learning, Bandit Problem, Recommend

Reinforcement learning, a machine learning approach in which an agent learns behavior through interaction with its environment so as to maximize reward, has been actively studied in recent years and has made great progress. Bandit algorithms in particular are widely used, for example in recommender systems, including ad serving. In such fields, however, reward maximization can be difficult because human behavior is complex and non-stationary. In these cases, securing a certain level of reward can matter more than continuing to aim at maximization. Algorithms that take this approach are also consistent with properties of human preferences, and they perform very well when the target level is chosen properly. Risk-sensitive Satisficing (RS) incorporates such cognitive tendencies into its search; it is a natural reinforcement learning algorithm that aims to reach a desired level of performance set as an objective. RS has shown excellent performance with Bernoulli-distributed rewards, such as whether a user clicked on an advertisement or a product. In practical applications, however, bandit problems often involve continuous-valued rewards such as viewing time. In this study, we examine the performance of RS on bandit problems with real-valued rewards drawn from a normal distribution and offer some considerations.
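
As an illustration of the setting described in the abstract, the following is a minimal sketch of a satisficing bandit agent facing normally distributed rewards. The abstract does not give the RS value function; the sketch assumes the form RS_i = (n_i / N)(E_i - aleph), where n_i is the number of pulls of arm i, N the total number of pulls, E_i the sample-mean reward of arm i, and aleph the aspiration level. The function name `run_rs_gaussian`, its parameters, and the initial round of one pull per arm are hypothetical choices made for this example and are not taken from the paper.

```python
import random


def run_rs_gaussian(means, stds, aspiration, steps, seed=0):
    """Run a satisficing agent on a Gaussian bandit (illustrative sketch).

    Assumed value function (not stated in the abstract):
        RS_i = (n_i / N) * (E_i - aspiration)
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k        # n_i: number of pulls per arm
    estimates = [0.0] * k   # E_i: sample-mean reward per arm
    total_reward = 0.0

    for t in range(steps):
        if t < k:
            # Assumption: pull every arm once so each E_i is defined.
            arm = t
        else:
            total = sum(counts)  # N: total pulls so far
            rs = [
                (counts[i] / total) * (estimates[i] - aspiration)
                for i in range(k)
            ]
            # Greedy choice on the RS value; ties go to the lowest index.
            arm = max(range(k), key=lambda i: rs[i])

        # Real-valued reward drawn from the chosen arm's normal distribution.
        reward = rng.gauss(means[arm], stds[arm])
        counts[arm] += 1
        # Incremental update of the sample mean.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward


if __name__ == "__main__":
    counts, estimates, total = run_rs_gaussian(
        means=[0.0, 1.0], stds=[0.1, 0.1], aspiration=0.5, steps=500
    )
    print("pulls per arm:", counts)
    print("estimated means:", [round(e, 3) for e in estimates])
```

Under this assumed form, an arm whose estimate exceeds the aspiration level gains value the more it is pulled, while arms below the level lose value the more they are pulled, so the agent settles on a satisfactory arm when one exists and keeps exploring otherwise. With Gaussian rewards the sample mean can cross the aspiration level through noise alone, which is one reason the choice of the level matters in this setting.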


Presentation information

[2Q4-OS-27b] New Developments in Reinforcement Learning

