JSAI2023

Presentation information

Organized Session

[2Q4-OS-27b] New Developments in Reinforcement Learning

Wed. Jun 7, 2023 1:30 PM - 3:10 PM Room Q (601)

Organizers: Hiroyuki Ohta (太田 宏之), Yu Kono (甲野 佑), Tatsuji Takahashi (高橋 達二)

2:10 PM - 2:30 PM

[2Q4-OS-27b-03] Emulation in reinforcement learning

〇Reina Kitade (1), Takuma Wada (2), Yu Kono (1), Tatsuji Takahashi (1) (1. Tokyo Denki University, 2. Graduate School of Tokyo Denki University)

Keywords: Reinforcement learning, Bandit problems, Social learning

Merely by referring to the outcome levels achieved by others, humans often perform better in their own independent trial and error. This is a form of social learning called emulation. In emulation, often only a few bits of information, such as a new world record in an athletic event, lead to improved performance across the whole community. A previous study showed that the performance of all the individuals in a group can be improved more effectively by interpreting the others' results in a "pessimistic" way. For single-agent reinforcement learning, "optimism in the face of uncertainty" is well established as an effective principle. We suggested "individual optimism and group pessimism under uncertainty" for multiple agents. Pessimistic outcome level estimation methods such as the lower confidence bound (LCB) were shown to be effective in social bandit problems. However, LCB-based pessimism could not cope with more realistic non-stationary environments. In this study, we propose a new form of group pessimism, together with an algorithm for multi-agent learning, that can cope with various non-stationary environments.
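
To illustrate the two principles the abstract contrasts, the following is a minimal sketch of optimistic and pessimistic confidence bounds on a bandit arm's mean reward. It assumes the textbook UCB1-style confidence radius; the function names and the constant `c` are hypothetical, and this is not the algorithm proposed in the paper, whose details are not given in the abstract.

```python
import math


def confidence_radius(pulls: int, total_pulls: int, c: float = 2.0) -> float:
    """UCB1-style radius (an assumed textbook form, not the paper's method).

    Returns infinity when the arm has never been observed.
    """
    if pulls <= 0:
        return math.inf
    return math.sqrt(c * math.log(max(total_pulls, 1)) / pulls)


def ucb(mean: float, pulls: int, total_pulls: int, c: float = 2.0) -> float:
    """Optimistic estimate: empirical mean plus the confidence radius."""
    return mean + confidence_radius(pulls, total_pulls, c)


def lcb(mean: float, pulls: int, total_pulls: int, c: float = 2.0) -> float:
    """Pessimistic estimate: empirical mean minus the confidence radius."""
    return mean - confidence_radius(pulls, total_pulls, c)
```

Under this reading, an agent might rank its own arms by `ucb` (individual optimism) while estimating another agent's outcome level with `lcb` (group pessimism). Both bounds rely on all past samples being drawn from one fixed distribution, which is one plausible reason such estimates struggle when the environment is non-stationary.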
