JSAI2019

Presentation information

Interactive Session

[3Rin2] Interactive Session 1

Thu. Jun 6, 2019 10:30 AM - 12:10 PM Room R (Center area of 1F Exhibition hall)

10:30 AM - 12:10 PM

[3Rin2-07] Multi-armed bandit algorithm applicable to stationary and non-stationary environment using self-organizing maps

〇Nobuhito Manome1,2, Shuji Shinohara2, Kouta Suzuki1,2, Kosuke Tomonaga1,2, Shunji Mitsuyoshi2 (1. SoftBank Robotics Corp., 2. Graduate School of Engineering, The University of Tokyo)

Keywords: Multi-armed bandit problem, Self-organizing maps

A communication robot that aims to satisfy the user in front of it must select an appropriate behavior quickly. However, user requests often change while the robot is still determining the most appropriate behavior, which makes it difficult for the robot to derive that behavior. Such problems can be formulated as a multi-armed bandit problem. To solve this problem, we proposed a multi-armed bandit algorithm that adapts to both stationary and non-stationary environments using self-organizing maps. In this study, we conducted numerous experiments on a stochastic multi-armed bandit problem in both stationary and non-stationary environments. Compared with the existing UCB1, UCB1-Tuned, and Thompson Sampling algorithms, the proposed algorithm showed equivalent or better performance in stationary environments with many arms, and consistently strong performance in non-stationary environments regardless of the number of arms.
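For readers unfamiliar with the experimental setting, the following is a minimal sketch of one of the baselines named above, UCB1, run on a Bernoulli bandit whose reward probabilities may switch partway through. It is not the proposed self-organizing-map algorithm, which the abstract does not specify. The function names (`ucb1_select`, `run_ucb1`), the `change_at` parameter, and the way non-stationarity is modeled (a single abrupt switch of the arm means) are assumptions made for illustration only.

```python
import math
import random


def ucb1_select(counts, sums, t):
    """Return the arm chosen by UCB1 at round t (1-based)."""
    # Play every arm once before relying on the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Empirical mean plus exploration bonus sqrt(2 ln t / n).
    scores = [
        sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_ucb1(means_before, means_after, horizon, change_at=None, seed=0):
    """Simulate UCB1 on a Bernoulli bandit (hypothetical test harness).

    If change_at is given, arm means switch from means_before to
    means_after after that round (an assumed model of non-stationarity).
    Returns (cumulative expected regret, pull counts per arm).
    """
    rng = random.Random(seed)
    k = len(means_before)
    counts = [0] * k
    sums = [0.0] * k
    regret = 0.0
    for t in range(1, horizon + 1):
        switched = change_at is not None and t > change_at
        means = means_after if switched else means_before
        arm = ucb1_select(counts, sums, t)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Regret is measured against the best arm at the current round.
        regret += max(means) - means[arm]
    return regret, counts


if __name__ == "__main__":
    stationary = run_ucb1([0.9, 0.1], [0.9, 0.1], horizon=2000)
    switching = run_ucb1([0.9, 0.1], [0.1, 0.9], horizon=2000, change_at=1000)
    print("stationary regret:", stationary[0])
    print("non-stationary regret:", switching[0])
```

Because UCB1 averages over all past rewards, its estimates react slowly once the arm means switch; this is the kind of situation the non-stationary experiments described above are meant to probe.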