The 35th Annual Conference of the Japanese Society for Artificial Intelligence (JSAI 2021)

Presentation Information

International Session

International Session (Work in progress) » EW-2 Machine learning

[3N3-IS-2e] Machine learning (5/5)

Thursday, June 10, 2021, 15:20–17:00, Room N (IS Room)

Chair: Hisashi Kashima (Kyoto University)

15:40–16:00

[3N3-IS-2e-02] Improving Exploration and Convergence Speed with Multi-Actor Control DDPG

〇David John Lucien Felices1, Mitsuhiko Kimoto1, Shoya Matsumori1, Michita Imai1 (1. Keio University)

Keywords: Reinforcement Learning, DDPG, Multi-Actor, Deep Exploration, OpenAI Gym

In Reinforcement Learning, the Deep Deterministic Policy Gradient (DDPG) algorithm is considered a powerful tool for continuous control tasks. However, in complex environments, DDPG does not always show positive results due to its inefficient exploration mechanism. To address this issue, several studies have increased the number of actors, but without considering whether there is an actual optimal number of actors for an agent.
We propose MAC-DDPG, which consists of a DDPG architecture with a variable number of actor networks. We also compare the computational cost and learning curves of using different numbers of actor networks on various OpenAI Gym environments.
The main goal of this research is to keep the computational cost as low as possible while improving deep exploration so that increasing the number of actors is not detrimental in solving less complex environments fast.
Current results show a potential increase in scores on some environments (around +10%) compared with those obtained with classic DDPG, but also a large increase in the time necessary to run the same number of epochs (runtime increases linearly with the number of actors).
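The abstract does not give implementation details of MAC-DDPG. As an illustration only, the core multi-actor idea, several actor networks proposing actions and a shared critic selecting among them, can be sketched as follows. All names, the linear "actors", and the stand-in critic below are hypothetical simplifications, not the authors' architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM, N_ACTORS = 3, 1, 4

# Hypothetical linear actors: each maps a state to a continuous action.
# In a real MAC-DDPG these would be separate neural networks.
actors = [rng.normal(size=(ACTION_DIM, STATE_DIM)) for _ in range(N_ACTORS)]

def critic(state, action):
    """Stand-in for a trained critic Q(s, a); here a fixed quadratic."""
    return -np.sum((action - state[:ACTION_DIM]) ** 2)

def select_action(state):
    """Each actor proposes an action; the shared critic picks the best."""
    proposals = [np.tanh(W @ state) for W in actors]   # bounded actions
    q_values = [critic(state, a) for a in proposals]
    best = int(np.argmax(q_values))
    return proposals[best], best

state = rng.normal(size=STATE_DIM)
action, chosen_actor = select_action(state)
print(chosen_actor, action.shape)  # index of the selected actor, (1,)
```

This also makes the reported cost trade-off visible: each environment step evaluates all `N_ACTORS` actors (and the critic once per proposal), so per-epoch time grows roughly linearly with the number of actors, matching the abstract's observation.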
