Bi-level Reinforcement Learning with a Black-Box Max-Ent RL Follower

Mikoto Kudo; Youhei Akimoto

[2Win5-15] Bi-level Reinforcement Learning with a Black-Box Max-Ent RL Follower

〇Mikoto Kudo^1,2, Youhei Akimoto^1,2 (1.Tsukuba University, 2.RIKEN Center for Advanced Intelligence Project)

Keywords: Bi-level Reinforcement Learning, Black-Box Attack, Stackelberg Game

Bi-level reinforcement learning is a hierarchical problem in which the objective function of the leader's upper-level reinforcement learning depends on the outcome of the follower's lower-level reinforcement learning. This framework captures tasks such as poisoning or guiding the follower's learning, as well as reward alignment.
Many existing studies assume that the leader has access to information about the follower's reward function, policy, or learning algorithm. However, in practical scenarios, the follower is not always under the leader's complete control, and the leader may have only limited knowledge about the follower.
In this study, we consider such a black-box follower setting and propose a policy gradient method for obtaining the leader's optimal policy under the assumption that the follower follows an entropy-regularized optimal policy. Specifically, we analytically derive the policy gradient that accounts for the follower's response to updates in the leader's policy and propose a method to estimate this gradient using the follower's observed action sequences.
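The follower assumption above can be made concrete with a small sketch. The code below is not the authors' method and does not estimate the leader's policy gradient; it only illustrates what an entropy-regularized optimal policy looks like in a tabular MDP, computed with standard soft value iteration. The transition table `P`, reward table `R`, discount `gamma`, and temperature `tau` are hypothetical values chosen for illustration. In the black-box setting described in the abstract, the leader would not have access to these quantities and would only observe the follower's actions.

```python
import math


def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def soft_value_iteration(P, R, gamma=0.9, tau=1.0, iters=500):
    """Entropy-regularized (max-ent) optimal policy for a tabular MDP.

    P[s][a] is a list of (next_state, probability) pairs.
    R[s][a] is the follower's reward (hypothetical; in a bi-level
    setting it would depend on the leader's policy).
    Returns (pi, V) with pi[s][a] = exp((Q[s][a] - V[s]) / tau).
    """
    n_states = len(P)
    V = [0.0] * n_states
    Q = [[0.0] * len(P[s]) for s in range(n_states)]
    for _ in range(max(1, iters)):
        # Soft Bellman backup: Q = r + gamma * E[V(s')]
        Q = [[R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
              for a in range(len(P[s]))]
             for s in range(n_states)]
        # Soft maximum over actions: V = tau * log sum_a exp(Q / tau)
        V = [tau * log_sum_exp([q / tau for q in Q[s]])
             for s in range(n_states)]
    # Boltzmann policy induced by the last (Q, V) pair; rows sum to 1.
    pi = [[math.exp((q - V[s]) / tau) for q in Q[s]]
          for s in range(n_states)]
    return pi, V


# Hypothetical two-state example: action 0 stays, action 1 switches state.
P = [
    [[(0, 1.0)], [(1, 1.0)]],
    [[(1, 1.0)], [(0, 1.0)]],
]
# Staying in state 1 is the only rewarded choice.
R = [[0.0, 0.0], [1.0, 0.0]]

pi, V = soft_value_iteration(P, R, gamma=0.9, tau=0.5)
for s, row in enumerate(pi):
    print(s, [round(p, 3) for p in row])
```

Lowering `tau` makes the policy closer to deterministic, while raising it makes the policy closer to uniform. Because the policy is stochastic and depends smoothly on the rewards, the follower's response to changes on the leader's side is differentiable, which is the property a gradient-based leader relies on.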


Presentation information

[2Win5] Poster session 2
