2:20 PM - 2:40 PM
[1N1-04] Analysis of cognitive satisficing value function
Guaranteed satisficing and finite regret
Keywords: satisficing, bandit problems, cognitively inspired computing
As the domains of reinforcement learning become more complicated and realistic, standard optimization algorithms may not work well. In this paper we introduce a simple mathematical model called RS (reference satisficing) that implements a satisficing strategy: it searches for actions whose values exceed an aspiration level. We apply RS to K-armed bandit problems. We prove theoretically that if actions with values above the aspiration level exist, RS is guaranteed to find them. Moreover, if the aspiration level is set to an "optimal level," so that satisficing practically amounts to optimizing, we prove that the regret (the expected loss) is bounded above by a finite value. We confirm these results through simulations and clarify the effectiveness of RS by comparison with other algorithms.
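To illustrate the idea described above, the following is a minimal sketch of a satisficing strategy for a Bernoulli K-armed bandit. The abstract does not give the RS value function itself, so the particular form used here, RS_i = (n_i / N)(E_i − ℵ), where E_i is the empirical mean of arm i, n_i its pull count, N the total pull count, and ℵ the aspiration level, is an assumption for illustration, not necessarily the authors' exact definition.

```python
import random

def rs_bandit(arm_means, aleph, steps, seed=0):
    """Satisficing bandit sketch (assumed RS form, not the paper's exact rule).

    Each step picks the arm maximizing (n_i / N) * (E_i - aleph):
    arms estimated above the aspiration level aleph get a positive value
    that grows with experience (exploitation), while among arms below
    aleph the least-tried one is least negative (exploration).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k

    def pull(i):
        # Bernoulli reward with success probability arm_means[i]
        r = 1.0 if rng.random() < arm_means[i] else 0.0
        counts[i] += 1
        sums[i] += r

    for i in range(k):  # initialize: try every arm once
        pull(i)
    for _ in range(steps - k):
        total = sum(counts)
        # assumed RS value: reliability-weighted surplus over aspiration
        rs = [(counts[i] / total) * (sums[i] / counts[i] - aleph)
              for i in range(k)]
        pull(max(range(k), key=lambda i: rs[i]))
    return counts
```

For example, with arms of means 0.2 and 0.8 and an aspiration level of 0.5, the strategy should concentrate its pulls on the second arm, whose value exceeds the aspiration level, consistent with the guaranteed-satisficing claim.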