Policy Iteration for Stationary Stackelberg Equilibria in General-sum Stochastic Games

Mikoto Kudo

2:20 PM - 2:40 PM

[4D3-GS-2-02] Policy Iteration for Stationary Stackelberg Equilibria in General-sum Stochastic Games

Proposal of Pareto-optimal Policies in terms of Stackelberg Equilibria and Probable Convergence Guarantee of the Iterative Method by Policy Improvements

〇Mikoto Kudo^1,2, Yohei Akimoto^1,2 (1. Tsukuba University, 2. RIKEN Center for Advanced Intelligence Project)

Keywords: Stochastic game, Stackelberg Equilibrium, Multi-agent MDP, Multi-agent RL, Policy guidance

A stochastic game is a game model in which agents simultaneously maximize their cumulative rewards. A Stackelberg equilibrium is a pair of policies that maximizes the leader agent's return under the condition that the follower agent's policy is always a best response to the leader's policy. Stationary Stackelberg equilibria (SSE) do not always exist, and existing methods require strong assumptions to guarantee convergence and to guarantee that the limit coincides with an SSE. We propose an alternative solution concept, Pareto-optimal (PO) policies, and an algorithm for finding PO policies based on policy iteration. Our method monotonically approaches the Pareto front through iterative local policy improvements.
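The leader/follower definition above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a single-state, one-shot general-sum game (a degenerate stochastic game), pure actions only, and ties in the follower's best response broken in the leader's favor. The function name `pure_stackelberg` and the payoff tables are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

def pure_stackelberg(
    leader_payoff: List[List[float]],
    follower_payoff: List[List[float]],
) -> Tuple[int, int, float]:
    """Brute-force a pure Stackelberg solution of a one-shot bimatrix game.

    Rows index the leader's action, columns the follower's action.
    Assumption: follower ties are broken in the leader's favor.
    """
    best = None
    for a in range(len(leader_payoff)):
        # Follower best-responds to the leader's committed action a.
        top = max(follower_payoff[a])
        responses = [b for b, v in enumerate(follower_payoff[a]) if v == top]
        # Among the follower's best responses, pick the one the leader prefers.
        b = max(responses, key=lambda j: leader_payoff[a][j])
        candidate = (a, b, leader_payoff[a][b])
        # Leader keeps the commitment with the highest resulting payoff.
        if best is None or candidate[2] > best[2]:
            best = candidate
    return best

# Hypothetical payoffs: leader action 0 dominates in simultaneous play,
# yet committing to action 1 steers the follower and pays the leader more.
LEADER = [[2.0, 4.0], [1.0, 3.0]]
FOLLOWER = [[1.0, 0.0], [0.0, 2.0]]

result = pure_stackelberg(LEADER, FOLLOWER)
print(result)  # (1, 1, 3.0)
```

In this toy game the leader obtains 3.0 by committing to action 1, whereas playing its dominant action 0 would lead the follower to respond with action 0 and yield only 2.0. The stochastic-game setting of the abstract replaces single actions with stationary policies and one-shot payoffs with cumulative returns.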

Presentation information

[4D3-GS-2] Machine learning: Basics / Theory
