Multi-objective Deep Reinforcement Learning for Crowd Guidance Policy Optimization

Ryo Nishida

2:50 PM - 3:10 PM

[3G4-OS-15b-01] Multi-objective Deep Reinforcement Learning for Crowd Guidance Policy Optimization

〇Ryo Nishida¹,², Yuki Tanigaki², Masaki Onishi², Koichi Hashimoto¹ (1. Tohoku University, 2. AIST)

Keywords: Deep Reinforcement Learning, Multi-objective Optimization, Crowd Movement Control

This study aims to improve multi-objective deep reinforcement learning (MODRL) for optimizing crowd guidance strategies. MODRL methods are generally classified into outer-loop and inner-loop methods. In the outer-loop method, multiple objective functions are combined into a single objective by a scalarization function, and the Pareto front, the set of optimal solutions, is obtained by repeatedly updating the weights of the scalarization function and solving the resulting single-objective problem. However, when each single-objective optimization is computationally expensive, the overall cost grows in proportion to the number of weight updates. The inner-loop method, by contrast, is designed to learn the Pareto front within a single learning process. In this study, we examine how different action selection methods in Pareto-DQN, a representative inner-loop method, affect the approximation of the Pareto solutions. In the experiments, we evaluate the proposed method on a benchmark problem and then discuss its application to the optimization of crowd guidance strategies.
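
As an illustration of the outer-loop idea described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a linear scalarization function and replaces the expensive single-objective reinforcement learning step with a simple argmax over a small, hypothetical table of two-objective returns. The names `pareto_front`, `outer_loop`, `returns`, and `weights` are introduced here for illustration only.

```python
import numpy as np

def pareto_front(points):
    """Return the non-dominated rows of `points` (every objective is maximized)."""
    points = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(points):
        # p is dominated if another point is >= in all objectives and > in at least one.
        dominated = np.any(np.all(points >= p, axis=1) & np.any(points > p, axis=1))
        if not dominated:
            keep.append(i)
    return points[keep]

def outer_loop(returns, weights):
    """One single-objective solve per weight vector (assumed linear scalarization).

    The argmax stands in for a full single-objective optimization, so the
    number of solves equals len(weights).
    """
    returns = np.asarray(returns, dtype=float)
    chosen = {int(np.argmax(returns @ w)) for w in weights}
    return returns[sorted(chosen)]

# Hypothetical returns of five candidate policies on two objectives.
returns = np.array([[0.0, 10.0], [5.0, 8.0], [3.0, 3.0], [8.0, 5.0], [10.0, 0.0]])
# Sweep 11 weight vectors (w, 1 - w); each one costs one single-objective solve.
weights = [np.array([w, 1.0 - w]) for w in np.linspace(0.0, 1.0, 11)]

front = pareto_front(returns)
found = outer_loop(returns, weights)
print(len(weights), "solves recovered", len(found), "of", len(front), "Pareto points")
```

In this toy setting the weight sweep recovers every non-dominated point, but it needs one solve per weight vector, which is the cost the abstract attributes to the outer-loop method.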


Presentation information

[3G4-OS-15b] Data Mining and Machine Learning for Movement Sequences (2/2)
