Presentation information

General Session » GS-2 Machine learning

[1G2-GS-2a] Machine learning: Reinforcement learning

Tue. Jun 8, 2021 1:20 PM - 3:00 PM Room G (GS room 2)

Chair: Yoshihiro Ichikawa (National Institute of Technology, Nara College)

2:20 PM - 2:40 PM

[1G2-GS-2a-04] Simulation study of a Stochastic Risk-sensitive Satisficing policy based on non-satisfaction equilibrium

〇Toshikatsu Kato1, Yu Kohno2, Tatsuji Takahashi2 (1. Graduate School of Tokyo Denki University, 2. School of Science and Engineering, Tokyo Denki University)

Keywords: Reinforcement learning, Machine learning, Bandit problem, Satisficing

Humans tend to search for a satisfactory action, that is, one whose value exceeds an acceptability threshold (satisficing). The “risk-sensitive satisficing” (RS) model, a value function that implements satisficing together with prospect-theory-like risk attitudes, shows superior results in bandit problems. However, because the RS policy is deterministic, wider application of the model and analysis of its behavior are intractable in some respects. In this study, we introduce a stochastic version of RS (SRS). By comparing RS and SRS in stationary and non-stationary environments, we show the merits of SRS.
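The abstract does not state the RS value function, so the following is only a minimal sketch of a deterministic RS policy on a Bernoulli bandit. It assumes the form commonly given in the RS literature, RS_i = (n_i / N)(E_i - aleph), where n_i is the number of pulls of arm i, N the total number of pulls, E_i the empirical mean reward, and aleph the aspiration (acceptability) level. The function names, the tie-breaking rule, and the treatment of untried arms are illustrative assumptions, not details taken from the paper, and the stochastic variant (SRS) is not sketched because the abstract does not define it.

```python
import random


def rs_values(counts, means, aspiration):
    """Assumed RS value per arm: (n_i / N) * (E_i - aspiration)."""
    total = sum(counts)
    if total == 0:
        # Assumption: before any pull, every arm has value 0.
        return [0.0] * len(counts)
    return [(n / total) * (m - aspiration) for n, m in zip(counts, means)]


def select_arm(counts, means, aspiration):
    """Deterministic policy: pick the arm with the largest RS value.

    Ties are broken by the lowest index (an assumption of this sketch).
    """
    values = rs_values(counts, means, aspiration)
    return max(range(len(values)), key=values.__getitem__)


def run_rs(probs, aspiration, steps, seed=0):
    """Run the RS policy on a stationary Bernoulli bandit (hypothetical setup)."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    means = [0.0] * len(probs)
    for _ in range(steps):
        arm = select_arm(counts, means, aspiration)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the empirical mean reward.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means
```

For example, `run_rs([0.4, 0.8], aspiration=0.6, steps=1000)` returns the pull counts and empirical means per arm. Because the arm choice is a pure argmax, the same history always yields the same action, which is the deterministic behavior the abstract identifies as a limitation.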
