Q-Learning in Prisoner's Dilemma with Noisy Observations

Mitsuki Sakamoto

9:40 AM - 10:00 AM

[2I1-GS-5a-03] Q-Learning in Prisoner's Dilemma with Noisy Observations

〇Mitsuki Sakamoto¹, Atsushi Iwasaki¹ (1. The University of Electro-Communications)

Keywords: Game Theory, Reinforcement Learning, Multi-agent, Prisoner's Dilemma, Repeated Game

This paper examines how Q-learning acquires cooperative or non-cooperative behavior in a repeated prisoner's dilemma in which players can misperceive the opponent's actions. How people come to cooperate is a fundamental, interdisciplinary question spanning artificial intelligence, economics, biology, and other fields. Under such misperception, even the well-known tit-for-tat strategy (TFT) struggles to sustain cooperation, because a misperceived action triggers retaliation. On the other hand, a lesser-known but important strategy, Win-Stay, Lose-Shift (WSLS), has been shown to recover cooperation effectively even after misperception. The main question of this paper is whether simple Q-learning can learn resilient cooperative behavior of the kind WSLS exhibits. To this end, we first propose a Q-learning system for games with misperception, called Neural Replicator Dynamics with Mutation (NeuRD+M). We then observe that NeuRD+M outperforms two existing Q-learning systems in terms of rewards and cooperation rates, and that it learns the behavior of WSLS.
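
As a rough illustration of the contrast between TFT and WSLS described in the abstract, the following minimal Python sketch simulates each strategy in self-play under noisy observations. It is not the paper's NeuRD+M system and involves no learning. The observation model (each player independently misperceives the opponent's action with probability `noise`), the function names, and the parameter values are all assumptions made for this example.

```python
import random

C, D = 1, 0  # cooperate, defect


def tft(own_last, seen_last):
    """Tit-for-tat: repeat the opponent's last action as it was perceived."""
    return seen_last


def wsls(own_last, seen_last):
    """Win-Stay, Lose-Shift: cooperate iff both last actions appeared to match."""
    return C if own_last == seen_last else D


def cooperation_rate(strategy, noise, rounds=20000, seed=0):
    """Play `strategy` against itself and return the fraction of C actions.

    Assumed noise model: each player's view of the opponent's action is
    flipped independently with probability `noise`; own actions are known.
    """
    rng = random.Random(seed)
    a1 = a2 = C  # both players start by cooperating
    coop = 0
    for _ in range(rounds):
        coop += a1 + a2
        # Private, possibly wrong, observations of the opponent's action.
        seen_by_1 = a2 if rng.random() >= noise else 1 - a2
        seen_by_2 = a1 if rng.random() >= noise else 1 - a1
        a1, a2 = strategy(a1, seen_by_1), strategy(a2, seen_by_2)
    return coop / (2 * rounds)
```

Calling `cooperation_rate(tft, 0.05)` and `cooperation_rate(wsls, 0.05)` compares the two strategies at a 5% misperception rate. Under this assumed model, TFT should drift toward cooperating about half the time, since a misperceived action is echoed back and forth, while WSLS should stay well above that because mutual defection resets both players to cooperation.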

Presentation information

[2I1-GS-5a] Agents: Game Theory
