JSAI2025

Presentation information

Organized Session


[1B3-OS-41a] OS-41

Tue. May 27, 2025 1:40 PM - 3:20 PM Room B (Small hall)

Organizers: Masahiro Suzuki (The University of Tokyo), Yusuke Iwasawa (The University of Tokyo), Makoto Kawano (The University of Tokyo), Wataru Kumagai (OMRON SINIC X), Tatsuya Matsushima (The University of Tokyo), Paavo Parmas (The University of Tokyo), Shohei Taniguchi (The University of Tokyo)

3:00 PM - 3:20 PM

[1B3-OS-41a-05] Self and Opponent Modeling for Ensuring Markovian and Reward-Predictive Representations in Partially Observable Multi-Agent Environments

〇Kai Yamashita1, Masahiro Suzuki1, Yutaka Matsuo1 (1. The University of Tokyo)

Keywords: Multi-Agent System, Representation Learning, Reinforcement Learning

Recent advances in reinforcement learning for multi-agent environments have underscored the importance of opponent modeling, in which agents infer the internal states or strategies of their opponents. Recent studies have explored autoencoder-based latent representations for opponent modeling in partially observable environments, which restrict access to opponent information at execution time.
In reinforcement learning, the state input to the policy and value function in a Markov decision process (MDP) must satisfy the Markov property and serve as a sufficient statistic for predicting future rewards. Under partial observability, however, many opponent-modeling approaches focus solely on reconstructing opponent information in the latent representation, without ensuring that the representation retains these Markovian or reward-predictive properties.
To overcome this limitation, we propose a representation learning method that models not only the opponent but also the agent itself. We validated our method through experiments, demonstrating its effectiveness in improving reinforcement learning performance.
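The gap the abstract points to — a latent that only reconstructs opponent information need not be reward-predictive — can be illustrated with a toy example. The synthetic data, the linear reward head, and all variable names below are assumptions for illustration only, not the paper's actual method or environment:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for one partially observable step: the reward
# depends on both the opponent's hidden state and the agent's own features.
N = 2000
opp = rng.normal(size=(N, 3))   # opponent's hidden state (training-time only)
own = rng.normal(size=(N, 3))   # agent's own features
reward = opp[:, 0] + 2.0 * own[:, 0] + rng.normal(scale=0.05, size=N)

def reward_prediction_error(z):
    """Fit a linear reward head on latent z; return its mean squared error."""
    Z = np.hstack([z, np.ones((N, 1))])               # append bias column
    w, *_ = np.linalg.lstsq(Z, reward, rcond=None)    # least-squares fit
    return float(np.mean((Z @ w - reward) ** 2))

# Latent 1: reconstructs only the opponent (opponent modeling alone).
err_opp_only = reward_prediction_error(opp)
# Latent 2: models the opponent AND the agent itself.
err_self_and_opp = reward_prediction_error(np.hstack([opp, own]))

print(f"opponent-only latent MSE: {err_opp_only:.3f}")
print(f"self+opponent latent MSE: {err_self_and_opp:.3f}")
```

The opponent-only latent leaves the reward variance driven by the agent's own features unexplained, while the joint self-and-opponent latent predicts reward almost exactly, mirroring the motivation for modeling the agent itself.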
