JSAI2023

Presentation information

General Session

[1B5-GS-2] Machine learning

Tue. Jun 6, 2023 5:00 PM - 6:40 PM Room B (Civic hall B)

Chair: Ryuta Matsuno (NEC) [On-site]

6:20 PM - 6:40 PM

[1B5-GS-2-05] Hypervolume Maximization Q-learning

〇Takuma Shibahara 1,2, Kouki Takeshita 1 (1. Hitachi, Ltd., 2. Keio University)

Keywords: Reinforcement learning, Multi-objective optimization, Multi-objective reinforcement learning

Multi-Objective Reinforcement Learning (MORL) is a generalization of standard reinforcement learning that aims to balance multiple, possibly conflicting, objectives. A common challenge in MORL is to learn policies that correspond to any Pareto optimal solution, especially when the Pareto front is non-convex. In this paper, we propose a novel method that learns a single policy that directly optimizes the hypervolume metric, which measures the volume dominated by a set of points in the objective space. The main idea is to transform the multiple objective values into hypervolumes and apply Watkins' Q-learning algorithm to learn a policy that maximizes the hypervolume. Moreover, our method can adapt the policy to achieve any desired Pareto solution without retraining. We call our method hypervolume maximization Q-learning, and present two variants of it: a tabular version and a deep learning version. We evaluate our method on the Deep Sea Treasure benchmark, a non-convex MORL problem, and show that it can effectively learn policies that achieve all Pareto solutions.
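The abstract leaves the algorithmic details to the paper. As a rough illustration only, the sketch below shows one way the core idea could look in tabular form: scalarize each vector-valued reward through the hypervolume it dominates relative to a reference point, then run standard Watkins' Q-learning on that scalar. The environment class, the per-step scalarization, the reference-point handling, and all function names here are assumptions made for the sketch, not the authors' implementation, and the toy chain task is a hypothetical stand-in rather than the actual Deep Sea Treasure benchmark.

```python
import numpy as np

def point_hypervolume(objectives, ref_point):
    """Hypervolume dominated by a single objective vector relative to a
    reference point: the product of per-objective improvements.
    (Illustrative scalarization only; the paper's construction may differ.)"""
    return float(np.prod(np.maximum(np.asarray(objectives, float) - ref_point, 0.0)))

class ToyTwoObjectiveChain:
    """Hypothetical stand-in for a MORL benchmark (NOT Deep Sea Treasure):
    a 1-D chain where moving right costs one time step (objective 1) but
    stopping at a deeper state yields a larger treasure (objective 2)."""
    treasures = [1.0, 3.0, 8.0, 16.0]      # treasure collected when stopping at each depth
    n_states, n_actions = 5, 2             # depths 0..3 plus a terminal state; actions: stop / go deeper

    def reset(self):
        self.s = 0
        return self.s

    def step(self, action):
        if action == 0:                    # stop: collect treasure, episode ends
            return self.n_states - 1, (-1.0, self.treasures[self.s]), True
        self.s = min(self.s + 1, len(self.treasures) - 1)
        return self.s, (-1.0, 0.0), False  # go deeper: pay the time cost, no treasure yet

def hv_q_learning(env, ref_point, episodes=2000, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Watkins' Q-learning where each vector-valued reward is first
    scalarized through the hypervolume it dominates w.r.t. ref_point."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((env.n_states, env.n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = rng.integers(env.n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
            s_next, reward_vec, done = env.step(a)
            r = point_hypervolume(reward_vec, ref_point)   # scalar hypervolume reward
            target = r + (0.0 if done else gamma * np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q

if __name__ == "__main__":
    # The reference point weights the trade-off between time cost and treasure;
    # varying it steers the greedy policy toward different parts of the front.
    ref = np.array([-10.0, 0.0])           # worst tolerated time cost, zero treasure
    Q = hv_q_learning(ToyTwoObjectiveChain(), ref)
    print(np.round(Q, 2))
```

In this sketch the reference point is the only knob for selecting a trade-off; how the proposed method actually adapts the policy to an arbitrary desired Pareto solution without retraining is described in the paper itself.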
