[2O5-GS-5] Agents: game AI

Wed. Jun 15, 2022 3:20 PM - 5:00 PM Room O (Room 510)

Chair: Tenda Okimoto (Kobe University) [remote]

4:40 PM - 5:00 PM

[2O5-GS-5-05] Learning Method in Monte Carlo Softmax Search: Reinforcement Learning of State Evaluation Function by Sampling

〇Kanau Kumekawa¹, Hiromasa Iwamoto¹, Harukazu Igarashi¹, Tooru Sugimoto¹ (1. Shibaura Institute of Technology)


Keywords: Monte Carlo Softmax Search, Reinforcement Learning, Boltzmann distribution

For two-player games such as Shogi, Igarashi et al. proposed in 2018 both Monte Carlo Softmax Search (MC Softmax search), a selective search method, and a method for learning state evaluation functions. The gradient vectors of the action and state values with respect to the learning parameters are computed efficiently by sampling along the search tree. This makes it possible to use all nodes as training data and to run multiple reinforcement learning methods simultaneously. In this study, we showed that the method is not limited to two-player games but can be extended to general agent learning problems. We also proposed a search-and-learn method in which tree search and evaluation-function learning are carried out simultaneously. In addition, we applied the proposed method to a simple maze escape example to verify the algorithm.
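
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea behind a Boltzmann-weighted (softmax) backup and sampling along a search tree, written for the single-agent case. It is not the authors' implementation. The nested-list tree encoding, the function names `boltzmann`, `softmax_backup`, and `sample_path`, and the `temperature` parameter are all assumptions made for illustration, and the gradient computation and learning step are omitted.

```python
import math
import random


def boltzmann(values, temperature=1.0):
    """Boltzmann (softmax) probabilities over a list of child values."""
    m = max(values)  # subtract the max for numerical stability
    weights = [math.exp((v - m) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_backup(node, temperature=1.0):
    """Value of a node: a leaf's own number, or the Boltzmann-weighted
    mean of its children's values (hypothetical encoding: a leaf is a
    number, an inner node is a list of children)."""
    if isinstance(node, (int, float)):
        return float(node)
    child_values = [softmax_backup(c, temperature) for c in node]
    probs = boltzmann(child_values, temperature)
    return sum(p * v for p, v in zip(probs, child_values))


def sample_path(node, temperature=1.0, rng=random):
    """Sample one root-to-leaf path, picking each child with its
    Boltzmann probability. Returns (child indices, leaf value)."""
    path = []
    while not isinstance(node, (int, float)):
        child_values = [softmax_backup(c, temperature) for c in node]
        probs = boltzmann(child_values, temperature)
        index = rng.choices(range(len(node)), weights=probs)[0]
        path.append(index)
        node = node[index]
    return path, float(node)


# Toy tree (made-up leaf evaluations, single agent, no sign flipping).
tree = [[1.0, 3.0], [2.0], 0.5]
```

In this sketch a low temperature makes the backup approach the maximum leaf value, while a high temperature approaches a plain average; sampled paths visit promising branches more often without excluding the others.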
