5:20 PM - 5:40 PM
[1N3-01] Excluding the Data with Exploration from Supervised Learning Improves Neural Fictitious Self-Play
Keywords: Imperfect information games, Reinforcement learning, Self-play, Nash equilibria
While methods developed in recent years, such as counterfactual regret minimization and DeepStack, require the state-transition rules of the game, NFSP works without them.
In this paper, we propose to exclude exploration data from the supervised-learning component of NFSP while keeping the probability of exploration, so that the agent can explore without corrupting its average strategy.
We show that this change significantly improves the performance of NFSP on a simplified poker game, Leduc Hold'em, and compare the results for different exploration probabilities.
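The proposed change can be illustrated with a minimal sketch of NFSP's action-selection step. This is an assumption-laden illustration, not the authors' implementation: the function names (`best_response`, `avg_policy`, `nfsp_step`) and the anticipatory-parameter value `eta` are hypothetical. The key point is the returned flag: exploratory (random) actions are marked so they are never added to the supervised-learning buffer, while the exploration probability `epsilon` itself is left unchanged.

```python
import random

def nfsp_step(best_response, avg_policy, actions,
              eta=0.1, epsilon=0.06, rng=random):
    """One hypothetical NFSP action-selection step.

    With probability eta the agent acts from its best-response
    network (epsilon-greedy); otherwise it follows its average
    policy. Under the proposed modification, only greedy
    best-response actions are stored for supervised learning,
    so the average strategy is not distorted by random exploration.
    Returns (action, store_in_sl_buffer).
    """
    if rng.random() < eta:                     # best-response mode
        if rng.random() < epsilon:             # exploratory action
            return rng.choice(actions), False  # excluded from SL data
        return best_response(), True           # greedy -> SL data
    return avg_policy(), False                 # average-policy mode
```

In vanilla NFSP, the exploratory branch would also return `True`, mixing random actions into the data that defines the average strategy; flipping that flag to `False` is the whole of the proposed change.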