10:30 AM - 12:10 PM
[3Rin2-07] Multi-armed bandit algorithm applicable to stationary and non-stationary environment using self-organizing maps
Keywords: Multi-armed bandit problem, Self-organizing maps
A communication robot that aims to satisfy the user in front of it needs to select an appropriate behavior quickly. However, user requests often change while the robot is still determining the most appropriate behavior for that user, which makes it difficult for the robot to derive an appropriate behavior. Such problems are formulated as a multi-armed bandit problem. To solve this problem, we proposed a multi-armed bandit algorithm that uses a self-organizing map to adapt to both stationary and non-stationary environments. In this study, numerous experiments were conducted on the stochastic multi-armed bandit problem in both stationary and non-stationary environments. Compared with the existing UCB1, UCB1-Tuned, and Thompson Sampling algorithms, the proposed algorithm demonstrated equivalent or better performance in stationary environments with many arms, and consistently strong performance in non-stationary environments regardless of the number of arms.
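For readers unfamiliar with the setting, the following is a minimal sketch of the UCB1 baseline named above, run on a Bernoulli bandit whose arm probabilities may switch partway through. It does not implement the proposed self-organizing-map algorithm, whose details are not given in this abstract. The function names (`ucb1`, `make_bernoulli_env`), the Bernoulli reward model, and the single abrupt change point are assumptions made for illustration only.

```python
import math
import random


def ucb1(pull, n_arms, horizon):
    """Run UCB1 for `horizon` steps; `pull(arm, t)` returns a reward in [0, 1]."""
    counts = [0] * n_arms      # number of times each arm was played
    means = [0.0] * n_arms     # running mean reward of each arm
    total = 0.0
    for t in range(horizon):
        if t < n_arms:
            arm = t            # initialization: play every arm once
        else:
            # t equals the number of plays so far; the index is the
            # empirical mean plus the bonus sqrt(2 ln t / n_i)
            arm = max(
                range(n_arms),
                key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = pull(arm, t)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
        total += reward
    return total, counts


def make_bernoulli_env(probs_before, probs_after, change_at, rng):
    """Hypothetical environment: Bernoulli arms whose success
    probabilities switch from probs_before to probs_after at change_at."""
    def pull(arm, t):
        probs = probs_before if t < change_at else probs_after
        return 1.0 if rng.random() < probs[arm] else 0.0
    return pull


if __name__ == "__main__":
    rng = random.Random(0)
    # Non-stationary case (assumed values): the best arm changes at step 1000.
    env = make_bernoulli_env([0.2, 0.8], [0.8, 0.2], change_at=1000, rng=rng)
    total, counts = ucb1(env, n_arms=2, horizon=2000)
    print("total reward:", total, "pull counts:", counts)
```

Setting `change_at` at or beyond `horizon` gives the stationary case. Because UCB1 averages over all past rewards, the statistics gathered before a change keep influencing its choices afterwards, which is one reason non-stationary environments are treated as a separate test condition.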