JSAI2023

Presentation information

Organized Session

[2Q4-OS-27b] New Developments in Reinforcement Learning

Wed. Jun 7, 2023 1:30 PM - 3:10 PM Room Q (601)

Organizers: Hiroyuki Ohta (太田 宏之), Yu Kono (甲野 佑), Tatsuji Takahashi (高橋 達二)

1:50 PM - 2:10 PM

[2Q4-OS-27b-02] Target-oriented Exploration in Deep Reinforcement Learning

〇Yu Kono (1, 2), Jun Kume (1), Ryuji Ikeda (3), Tatsuji Takahashi (1)
(1. Tokyo Denki University, School of Science and Engineering, 2. DeNA Co., Ltd., 3. Graduate School of Tokyo Denki University)

Keywords: Reinforcement Learning, Deep Learning, Cognitive Science

Human flexibility in learning is thought to derive from the ability to conceptualize complex worlds and to form analogies and combinations. Humans also excel at gathering the information that conceptualization requires. For example, by estimating what should be achievable and setting it as the current goal, humans can assess their current situation, and the resulting semi-instructive evaluation can promote learning. In this study, we hypothesize that this human "target-oriented exploration" is also effective in reinforcement learning. Risk-sensitive satisficing (RS) is a meta-policy that realizes this exploratory tendency. Deep reinforcement learning, which can handle complex state sequences, has become mainstream in recent years, but RS could not be applied to it because of two major problems. First, RS selects actions deterministically, so the probability distribution that importance sampling needs in order to improve sample efficiency remains latent. Second, the trial ratio, a curiosity-like measure of confidence, must be approximated over complex state representations. We solved these problems by deriving theoretical selection probabilities and neighborhood approximations, and applied the target-oriented exploration RS algorithm to deep reinforcement learning.
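
As a rough illustration of the RS idea mentioned in the abstract, the following minimal sketch uses the tabular bandit form of the RS value as it is commonly written, RS(a) = (n_a / N) * (E(a) - aleph), where n_a / N is the trial ratio and aleph is the aspiration (target) level. This is only an assumption-laden toy: the function names, the Bernoulli bandit, and the parameter values are hypothetical, and the selection probabilities and neighborhood approximations that the paper derives for deep reinforcement learning are not reproduced here.

```python
import random


def rs_values(counts, means, aspiration):
    """Return the RS value (n_a / N) * (E(a) - aspiration) for each action.

    counts: number of times each action has been tried (n_a)
    means: current reward estimate for each action (E(a))
    aspiration: target level (aleph); an assumed hyperparameter
    """
    total = sum(counts)
    if total == 0:
        # No trials yet: every trial ratio is zero, so every RS value is zero.
        return [0.0] * len(counts)
    return [(n / total) * (m - aspiration) for n, m in zip(counts, means)]


def select_action(counts, means, aspiration):
    """Deterministic greedy selection over RS values (ties go to the lowest index)."""
    values = rs_values(counts, means, aspiration)
    return max(range(len(values)), key=values.__getitem__)


def run_bandit(probs, aspiration, steps, seed=0):
    """Run RS on a hypothetical Bernoulli bandit with success probabilities `probs`."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    means = [0.0] * len(probs)
    for _ in range(steps):
        a = select_action(counts, means, aspiration)
        reward = 1.0 if rng.random() < probs[a] else 0.0
        counts[a] += 1
        # Incremental update of the sample mean for the chosen action.
        means[a] += (reward - means[a]) / counts[a]
    return counts, means


if __name__ == "__main__":
    counts, means = run_bandit(probs=[0.3, 0.5, 0.8], aspiration=0.6, steps=1000)
    print(counts, means)
```

In this form, an action whose estimate falls below the aspiration level receives a negative value that grows in magnitude with its trial ratio, so less-tried actions are preferred. Once some action exceeds the aspiration level, its value is positive and increases with its trial ratio, so selection settles on it. Because the selection is a deterministic argmax, no explicit action probabilities are available, which is the first of the two problems the abstract describes.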
