Using Search Results in Self-play Deep Reinforcement Learning

Kazuya Kagoshima

1:50 PM - 2:10 PM

[2D4-GS-2-02] Using Search Results in Self-play Deep Reinforcement Learning

〇Kazuya Kagoshima¹, Itsuki Noda¹, Satoshi Oyama¹ (1. Hokkaido University)

Keywords: Reinforcement Learning, Deep Learning, Self-play

We propose a new method for generating training data in self-play deep reinforcement learning, which is widely used in game AI such as AlphaGo Zero and AlphaZero. Such self-play learning generally does not use most of the search results generated during self-play, and few studies have tried to make use of them. The proposed method converts each search result into training data by estimating a final win/loss reward and a policy for it. Experiments with various training hyperparameters suggest that the proposed method helps the policy to be learned effectively and stabilizes training.
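The abstract does not specify how the reward and policy are estimated for a search result. As a rough illustration only, the sketch below assumes a tree search whose nodes store visit counts and accumulated values, takes the normalized child visit counts as the policy target, and takes the node's mean backed-up value as a stand-in for the final win/loss reward. The `Node` class, its field names, and the `min_visits` threshold are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical search-tree node; all field names are illustrative."""
    state: str                      # placeholder for an encoded game state
    visits: int = 0                 # number of times the search visited this node
    value_sum: float = 0.0          # sum of backed-up values (assumed: from the
                                    # perspective of the player to move here)
    children: Dict[int, "Node"] = field(default_factory=dict)  # action -> child


Example = Tuple[str, Dict[int, float], float]


def node_to_example(node: Node) -> Example:
    """Turn one searched node into a (state, policy, value) training example.

    Assumption: policy target = normalized child visit counts,
    value target = mean backed-up value of the node.
    """
    total = sum(child.visits for child in node.children.values())
    policy = {a: child.visits / total for a, child in node.children.items()}
    value = node.value_sum / node.visits
    return node.state, policy, value


def collect_examples(root: Node, min_visits: int = 10) -> List[Example]:
    """Walk the whole search tree and keep nodes that were searched enough."""
    examples: List[Example] = []
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children.values())
        child_visits = sum(c.visits for c in node.children.values())
        # Skip sparsely searched nodes and nodes with no searched children,
        # since their estimates would be too noisy or undefined.
        if node.visits >= min_visits and child_visits > 0:
            examples.append(node_to_example(node))
    return examples
```

In a conventional self-play pipeline only the root of each search contributes a training example; a function like `collect_examples` would additionally harvest interior nodes of the same tree. Whether the mean search value is an adequate estimate of the final reward is exactly the kind of question the proposed method has to address, so this sketch should not be read as the authors' estimator.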


Presentation information

[2D4-GS-2] Machine learning
