2:00 PM - 2:20 PM
[2H3-J-2-03] Adaptability of Cognitive Satisficing Algorithm in Nonstationary Environments
Keywords: multi-armed bandit, satisficing, nonstationary
The environments in which an agent performs trial-and-error learning are generally nonstationary because of unobservable information and various kinds of fluctuations. To make effective decisions in such an environment, the agent has to discard old information, gradually or abruptly, and put more weight on newer information, because some elements of the environment may have changed. As a result, the agent must choose a better option from a smaller amount of information. We focus on the risk-sensitive satisficing (RS) algorithm, which models the decision-making strategy of humans and animals. We compare its performance with that of other representative algorithms in stationary and nonstationary bandit problems. We also propose variants of RS that incorporate existing techniques for adapting to nonstationary bandits, such as meta-bandit and discounted updates.
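The abstract gives no formulas, so the following is only a minimal sketch of how a discounted update might be combined with a satisficing value. It assumes an RS value of the form (n_i / N) * (E_i - aleph), where n_i is the (discounted) trial count of arm i, N is the total count, E_i is the estimated mean reward, and aleph is an aspiration level. The function names, the discount factor `gamma`, and this exact form of the value are illustrative assumptions, not the authors' implementation.

```python
def discounted_update(counts, sums, arm, reward, gamma=0.9):
    """Decay every arm's statistics by gamma, then credit the chosen arm.

    Older observations lose weight geometrically, so the estimates
    track recent rewards. gamma is a hypothetical parameter name.
    """
    new_counts = [gamma * c for c in counts]
    new_sums = [gamma * s for s in sums]
    new_counts[arm] += 1.0
    new_sums[arm] += reward
    return new_counts, new_sums


def rs_values(counts, sums, aleph):
    """Assumed RS value per arm: (n_i / N) * (E_i - aleph)."""
    total = sum(counts)
    values = []
    for c, s in zip(counts, sums):
        if total == 0 or c == 0:
            # An untried arm gets value 0, so it is preferred over
            # tried arms whose estimates fall below the aspiration level.
            values.append(0.0)
        else:
            mean = s / c
            values.append((c / total) * (mean - aleph))
    return values


def select_arm(counts, sums, aleph):
    """Greedy choice over the assumed RS values (first index on ties)."""
    values = rs_values(counts, sums, aleph)
    return max(range(len(values)), key=lambda i: values[i])


# Two-arm example: arm 0 paid 1, then arm 1 paid 0, with gamma = 0.5.
counts, sums = [0.0, 0.0], [0.0, 0.0]
counts, sums = discounted_update(counts, sums, arm=0, reward=1.0, gamma=0.5)
counts, sums = discounted_update(counts, sums, arm=1, reward=0.0, gamma=0.5)
chosen = select_arm(counts, sums, aleph=0.6)
```

In this example arm 0's estimated mean stays above the aspiration level while arm 1's falls below it, so the greedy choice returns to arm 0 even though its discounted count has already been halved.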