Keywords: reinforcement learning, multi-armed bandit problems, satisficing, bounded rationality
As deep neural networks enable reinforcement learning in huge state-action spaces, the exploration--exploitation tradeoff becomes more severe. Several heuristics that inject noise have been proposed to deal with this tradeoff. Such probabilistic methods are difficult to tune, and they amplify the already large variance in the performance of deep reinforcement learning algorithms. We propose a deterministic action selection algorithm based on a cognitive satisficing value function (RS) inspired by how humans explore under uncertainty. We define a method that enables optimal (minimal) exploration by exploiting the relationship between the aspiration level and the potential exploration distribution. The resulting algorithm achieves optimal performance in multi-armed bandit problems and opens the possibility of a new class of reinforcement learning algorithms.
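The abstract does not state the RS value function explicitly; the sketch below is an illustrative guess at an aspiration-based satisficing rule for a bandit. It assumes an RS value of the form (visit ratio) times (empirical mean minus aspiration level `aleph`), with a deterministic argmax selection; the paper's exact definition may differ, and all names here (`rs_values`, `select_arm`, `aleph`) are hypothetical.

```python
def rs_values(counts, means, aleph):
    """Illustrative satisficing (RS) values, one per arm.

    counts: number of pulls of each arm so far
    means:  empirical mean reward of each arm
    aleph:  the aspiration level

    Each arm's value is its visit ratio times the gap between its
    empirical mean and aleph (an assumed form, not the paper's).
    """
    total = sum(counts)
    return [(c / total) * (m - aleph) for c, m in zip(counts, means)]


def select_arm(counts, means, aleph):
    """Deterministic action selection: pick the arm with the largest RS value."""
    vals = rs_values(counts, means, aleph)
    return max(range(len(vals)), key=lambda i: vals[i])
```

Under this form the rule behaves as the abstract suggests: when some arm's empirical mean exceeds the aspiration level, the well-visited satisfying arm dominates (exploitation); when all arms fall short of `aleph`, the negative gaps are down-weighted least for the least-visited arm, so exploration is directed there, without any injected noise.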