1:40 PM - 2:00 PM
[2C4-GS-2-02] Stochastic Risk-sensitive Satisficing policy with ideal aspiration level
Keywords: Reinforcement Learning, Machine Learning, Bandit Problem, Satisficing
Humans tend to search for an action that is satisfactory, that is, one whose value exceeds an acceptability threshold (satisficing). The “risk-sensitive satisficing” (RS) model, a value function that combines satisficing with prospect-theory-like risk attitudes, shows superior results in bandit problems. Several methods that estimate the reference value dynamically have also been devised to obtain high performance. One of them is Stochastic Risk-sensitive Satisficing (SRS), which estimates the cyclic search allocation ratio implicit in RS as a probability distribution. Like Softmax, SRS searches according to the order relation of action values, but it appears to be more robust to changes in those values caused by uncertainty and non-stationarity during the search. In this study, we examine the vulnerability of SRS to rounding errors and show its strong performance and properties in non-stationary bandit problems.
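The abstract does not write out the RS value function. As a minimal sketch, assuming the form commonly given in the RS literature, RS_i = (n_i / N)(E_i − ℵ), where n_i is the pull count of arm i, N the total number of pulls, E_i the sample-mean reward of arm i, and ℵ a fixed aspiration level, the Python below runs greedy RS on a Bernoulli bandit. When every E_i is below ℵ, all RS values are negative and greedy RS cycles through the arms so that the products τ_i(ℵ − E_i) stay roughly equal (τ_i = n_i / N), giving a long-run share τ_i ∝ 1/(ℵ − E_i). That share illustrates the kind of cyclic search allocation ratio the abstract says SRS estimates as a probability distribution. The function names are hypothetical, and this is not the authors' SRS algorithm or their aspiration-level estimation.

```python
import random


def rs_values(counts, means, aleph):
    """RS value of each arm: RS_i = (n_i / N) * (E_i - aleph), with N = sum of n_i.

    Assumed form of the RS value function (illustrative, not from this paper).
    """
    total = sum(counts)
    return [(n / total) * (e - aleph) for n, e in zip(counts, means)]


def rs_allocation_ratio(means, aleph):
    """Long-run share of pulls under greedy RS when every E_i < aleph.

    Greedy RS keeps tau_i * (aleph - E_i) roughly equal across arms,
    so tau_i is proportional to 1 / (aleph - E_i).
    """
    inverse_gaps = [1.0 / (aleph - e) for e in means]
    total = sum(inverse_gaps)
    return [g / total for g in inverse_gaps]


def run_rs_greedy(probs, aleph, steps, seed=0):
    """Greedy RS on a Bernoulli bandit; returns how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    reward_sums = [0.0] * k

    def pull(arm):
        # Bernoulli reward: 1 with probability probs[arm], else 0.
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        reward_sums[arm] += reward

    # Pull every arm once so that each estimate E_i is defined.
    for arm in range(k):
        pull(arm)
    for _ in range(steps - k):
        means = [s / n for s, n in zip(reward_sums, counts)]
        values = rs_values(counts, means, aleph)
        # Greedy choice; max() keeps the lowest index on exact ties.
        pull(max(range(k), key=lambda i: values[i]))
    return counts


if __name__ == "__main__":
    # Arms that always pay 0 and 1, with an aspiration level above both.
    print(run_rs_greedy([0.0, 1.0], aleph=1.5, steps=100))
    print(rs_allocation_ratio([0.0, 1.0], aleph=1.5))
```

With arms paying 0 and 1 and ℵ = 1.5, the pull counts after 100 steps sit close to the 1:3 split returned by `rs_allocation_ratio`. Lowering ℵ to 0.5 puts the better arm above the aspiration level, its RS value turns positive, and greedy RS switches to exploiting it.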