2:20 PM - 2:40 PM
[1G2-GS-2a-04] Simulation study of a Stochastic Risk-sensitive Satisficing policy based on non-satisfaction equilibrium
Keywords: Reinforcement learning, Machine learning, Bandit problem, Satisficing
Humans tend to search for a satisfactory action, one whose value exceeds an acceptability threshold (satisficing). The “risk-sensitive satisficing” (RS) model, a value function that implements satisficing together with prospect-theory-like risk attitudes, shows superior results in bandit problems. However, because the policy is deterministic, wider application of the model and analysis of its behavior are intractable in some respects. In this study, we introduce a stochastic version of RS (SRS). By comparing RS and SRS in stationary and non-stationary environments, we show the merits of SRS.
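To make the contrast between a deterministic and a stochastic satisficing policy concrete, the following is a minimal sketch and not the method of this paper. It assumes the RS value of arm i takes the form (n_i / N) * (E_i - aleph), where n_i is the arm's trial count, N the total trial count, E_i the arm's mean reward, and aleph the acceptability threshold; the abstract does not state this formula. The stochastic variant shown here is plain softmax sampling over those values, swapped in for illustration only; it is not the SRS policy derived from non-satisfaction equilibrium. All function names and the temperature parameter are hypothetical.

```python
import math
import random


def rs_values(counts, means, aleph):
    """Assumed RS form: (n_i / N) * (E_i - aleph) for each arm."""
    total = sum(counts)
    if total == 0:
        # No trials yet: every arm gets the same value.
        return [0.0 for _ in counts]
    return [(n / total) * (e - aleph) for n, e in zip(counts, means)]


def rs_choose(counts, means, aleph):
    """Deterministic policy: always pick the arm with the largest RS value."""
    values = rs_values(counts, means, aleph)
    return max(range(len(values)), key=lambda i: values[i])


def softmax_probs(values, temperature=0.1):
    """Illustrative stand-in for a stochastic policy (not the paper's SRS)."""
    peak = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp((v - peak) / temperature) for v in values]
    norm = sum(weights)
    return [w / norm for w in weights]


def stochastic_choose(counts, means, aleph, rng, temperature=0.1):
    """Sample an arm with probability given by the softmax of RS values."""
    probs = softmax_probs(rs_values(counts, means, aleph), temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    counts = [3, 1]      # trials per arm (made-up numbers)
    means = [0.2, 0.4]   # mean reward per arm (made-up numbers)
    aleph = 0.5          # acceptability threshold (made-up number)
    print(rs_choose(counts, means, aleph))
    print(stochastic_choose(counts, means, aleph, random.Random(0)))
```

In this example both arms fall below the threshold, so the assumed RS values are negative and the less-tried, better arm is preferred. The deterministic policy always returns that arm, while the sampled policy assigns every arm a nonzero probability, which is the kind of property that makes a policy easier to analyze and to combine with other stochastic methods.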