5:20 PM - 5:40 PM
[1N3-01] Excluding the Data with Exploration from Supervised Learning Improves Neural Fictitious Self-Play
Keywords: Imperfect information games, Reinforcement learning, Self-play, Nash equilibria
While methods developed in recent years, such as counterfactual regret minimization and DeepStack, require the state-transition rules of the game, NFSP works without them.
In this paper, we propose to exclude exploration data from the supervised-learning component of NFSP while keeping the probability of exploration, so that the agent can explore without corrupting its average strategy.
We show that this change significantly improves the performance of NFSP on a simplified poker game, Leduc Hold'em, and compare the results for different exploration probabilities.
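The proposed change can be illustrated with a minimal sketch of NFSP's action-selection step. This is an assumption-laden illustration, not the authors' implementation: the function names (`best_response`, `avg_policy`, `nfsp_step`) and the anticipatory-parameter value `eta` are hypothetical. The key point is the returned flag: exploratory (random) actions are marked so they are never added to the supervised-learning buffer, while the exploration probability `epsilon` itself is left unchanged.

```python
import random

def nfsp_step(best_response, avg_policy, actions,
              eta=0.1, epsilon=0.06, rng=random):
    """One hypothetical NFSP action-selection step.

    With probability eta the agent acts from its best-response
    network (epsilon-greedy); otherwise it follows its average
    policy. Under the proposed modification, only greedy
    best-response actions are stored for supervised learning,
    so the average strategy is not distorted by random exploration.
    Returns (action, store_in_sl_buffer).
    """
    if rng.random() < eta:                     # best-response mode
        if rng.random() < epsilon:             # exploratory action
            return rng.choice(actions), False  # excluded from SL data
        return best_response(), True           # greedy -> SL data
    return avg_policy(), False                 # average-policy mode
```

In vanilla NFSP, the exploratory branch would also return `True`, mixing random actions into the data that defines the average strategy; flipping that flag to `False` is the whole of the proposed change.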