Delayed Reward Increases Complexity of LLM Agent Strategies

Yuto Miki; Hibiki Harano; Hirotaka Osawa

[1Win4-31] Delayed Reward Increases Complexity of LLM Agent Strategies

Measuring the Complexity of Strategy Sentences of LLM Agents Under IPD and AMPD Conditions in Game Theory

〇Yuto Miki¹, Hibiki Harano¹, Hirotaka Osawa¹ (1.Keio University)

Keywords: AI, LLM, IPD, AMPD, CoT

According to the social brain theory, the main reason the human brain grew so markedly during evolution is the need to infer the intentions of others in a social context. This study attempts to replicate the social brain theory using LLM agents and game theory. Under two conditions, the Iterated Prisoner's Dilemma (IPD) and the anti-max prisoner's dilemma (AMPD), LLM agents were tasked with devising game-theoretic strategies. Each agent then chose to cooperate or defect according to its strategy and revised that strategy based on the outcome of its decisions. The study analyzed how much the complexity of the agents' strategies differed between the two conditions. Unlike previous studies that measured complexity using genetic algorithms over automata, no significant difference was observed. For prompt engineering, Chain-of-Thought (CoT) and Zero-shot CoT methods were employed.
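
The devise, decide, revise cycle described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the payoff values are the conventional textbook IPD numbers (an assumption, since the abstract gives none), the AMPD payoff rule is not modeled because the abstract does not define it, and the names Agent, play_round, stub_decide, stub_revise, always_defect, and run are all hypothetical. The decide and revise callables stand in for the LLM prompts (CoT or Zero-shot CoT) that would read and rewrite the strategy sentence.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Conventional IPD payoffs (assumed; not stated in the abstract).
# Keys are (my_move, opponent_move); "C" = cooperate, "D" = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

History = List[Tuple[str, str]]  # list of (my_move, opponent_move)


@dataclass
class Agent:
    strategy: str                                # natural-language strategy sentence
    decide: Callable[[str, History], str]        # stand-in for an LLM decision call
    revise: Callable[[str, History, int], str]   # stand-in for an LLM revision call
    score: int = 0
    history: History = field(default_factory=list)


def play_round(a: Agent, b: Agent) -> None:
    """One round: both agents decide, receive payoffs, then revise strategies."""
    move_a = a.decide(a.strategy, a.history)
    move_b = b.decide(b.strategy, b.history)
    for agent, mine, theirs in ((a, move_a, move_b), (b, move_b, move_a)):
        reward = PAYOFF[(mine, theirs)]
        agent.score += reward
        agent.history.append((mine, theirs))
        agent.strategy = agent.revise(agent.strategy, agent.history, reward)


def stub_decide(strategy: str, history: History) -> str:
    # Hypothetical placeholder for an LLM call: copy the opponent's last move.
    return history[-1][1] if history else "C"


def always_defect(strategy: str, history: History) -> str:
    # Hypothetical placeholder for an LLM call: defect every round.
    return "D"


def stub_revise(strategy: str, history: History, reward: int) -> str:
    # Hypothetical placeholder: a real agent would rewrite its strategy here.
    return strategy


def run(rounds: int) -> Tuple[Agent, Agent]:
    a = Agent("Copy the opponent's previous move.", stub_decide, stub_revise)
    b = Agent("Always defect.", always_defect, stub_revise)
    for _ in range(rounds):
        play_round(a, b)
    return a, b
```

Calling run(3) plays three rounds between the two placeholder agents; in an actual experiment the stubs would be replaced by model calls, and the strategy sentences collected after each revision would be the objects whose complexity is measured.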


Presentation information

[1Win4] Poster session 1
