A study on measures in multi-armed bandit problem with hidden state.

Kouhei Kudo

10:30 AM - 12:10 PM

[3Rin2-05] A study on measures in multi-armed bandit problem with hidden state.

〇Kouhei Kudo¹, Takashi Takekawa¹ (1. Kogakuin University)

Keywords: Bandit problem, Reinforcement learning

A bandit problem asks an agent to maximize its immediate reward by repeatedly choosing one of several options and receiving a reward, under the restriction that there is only a single state. Reinforcement learning asks an agent to maximize the rewards it will earn in the future by taking actions chosen from the available options, in the presence of multiple states. The two differ in that reinforcement learning assumes the state information is known and takes multiple states into account. In this simulation, we consider a model in which the current state and the state transition information are unknown, and in which the environment stays in one state for a certain period of time and then transitions to another state. For this model, we compare a general bandit policy and a reinforcement learning policy in terms of cumulative reward. As a result, the cumulative reward was higher for the reinforcement learning policy than for the bandit policy.
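The kind of model described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' simulation. Everything specific in it is an assumption made for illustration: the class name `HiddenStateBandit`, Bernoulli rewards, a fixed dwell time per state, cyclic transitions between states, and an epsilon-greedy rule standing in for a general bandit policy. The reinforcement learning policy is not sketched, because the abstract does not specify how it is constructed.

```python
import random


class HiddenStateBandit:
    """Multi-armed bandit whose reward probabilities depend on a hidden state.

    Illustrative assumptions: Bernoulli rewards, a fixed dwell time per state,
    and cyclic state transitions. The agent never observes the state.
    """

    def __init__(self, reward_probs, dwell_time, seed=0):
        # reward_probs[s][a]: assumed reward probability of arm a in state s
        self.reward_probs = reward_probs
        self.dwell_time = dwell_time  # assumed number of steps spent in each state
        self.rng = random.Random(seed)
        self.state = 0  # hidden from the agent
        self.steps_in_state = 0

    @property
    def n_arms(self):
        return len(self.reward_probs[0])

    def pull(self, arm):
        """Return a 0/1 reward for the chosen arm and advance the hidden state."""
        p = self.reward_probs[self.state][arm]
        reward = 1 if self.rng.random() < p else 0
        self.steps_in_state += 1
        if self.steps_in_state >= self.dwell_time:
            # Assumed transition rule: move to the next state in a cycle.
            self.state = (self.state + 1) % len(self.reward_probs)
            self.steps_in_state = 0
        return reward


def run_epsilon_greedy(env, n_steps, epsilon=0.1, seed=1):
    """Play an epsilon-greedy bandit policy and return the cumulative reward."""
    rng = random.Random(seed)
    counts = [0] * env.n_arms
    values = [0.0] * env.n_arms  # running mean reward per arm
    cumulative = 0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(env.n_arms)  # explore
        else:
            arm = max(range(env.n_arms), key=lambda a: values[a])  # exploit
        reward = env.pull(arm)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        cumulative += reward
    return cumulative


if __name__ == "__main__":
    # Two hidden states with opposite best arms (hypothetical values).
    env = HiddenStateBandit([[0.8, 0.2], [0.2, 0.8]], dwell_time=100)
    print(run_epsilon_greedy(env, n_steps=1000))
```

Because the running means average over every hidden state, a policy of this kind has no mechanism for noticing that the best arm has changed, which is the setting in which the abstract compares it against a reinforcement learning policy.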

Presentation information

[3Rin2] Interactive Session 1
