Cautious Actor-Critic: Stable off-policy deep reinforcement learning for continuous control

Toshinori Kitamura

2:20 PM - 2:40 PM

[4G3-GS-2l-03] Cautious Actor-Critic: Stable off-policy deep reinforcement learning for continuous control

〇Toshinori Kitamura¹, Lingwei Zhu¹, Takamitsu Matsubara¹ (1. NARA Institute of Science and Technology)

Keywords: Reinforcement Learning

While recent off-policy actor-critic (AC) methods have demonstrated superior sample efficiency and performance on many challenging continuous control tasks, they often suffer from significant performance oscillation during learning due to the persistent errors induced by off-policy learning. In this paper, we propose a novel off-policy AC algorithm, cautious actor-critic (CAC), which achieves stable learning while maintaining the sample efficiency and performance of off-policy AC methods. The name "cautious" comes from the doubly conservative nature of the algorithm: a conservative actor that linearly interpolates two consecutive policies, and a conservative critic that prevents large changes between consecutive policies. We compare CAC to state-of-the-art AC methods on a set of challenging continuous control problems and demonstrate that CAC achieves comparable performance while significantly stabilizing learning.
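
The abstract does not spell out the actor's update rule, so the following is only a minimal sketch of what "linearly interpolating two consecutive policies" could look like. It assumes, purely for illustration, a discrete action distribution and a hypothetical mixing coefficient `mix`; the function name `interpolate_policies` is likewise made up here, and the paper itself targets continuous control, where the policies would be parameterized differently.

```python
import numpy as np


def interpolate_policies(pi_old, pi_new, mix):
    """Return the convex combination (1 - mix) * pi_old + mix * pi_new.

    pi_old, pi_new: action probabilities of two consecutive policies
                    (illustrative assumption: discrete actions).
    mix:            hypothetical mixing coefficient in [0, 1]; a small value
                    keeps the updated policy close to the previous one.
    """
    pi_old = np.asarray(pi_old, dtype=float)
    pi_new = np.asarray(pi_new, dtype=float)
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must lie in [0, 1]")
    return (1.0 - mix) * pi_old + mix * pi_new


# Two consecutive policies over two actions (made-up numbers).
pi_old = [0.5, 0.5]
pi_new = [0.9, 0.1]

# A cautious step moves only a quarter of the way toward the new policy.
pi_mixed = interpolate_policies(pi_old, pi_new, mix=0.25)
```

Because the result is a convex combination, it remains a valid probability distribution, and the coefficient bounds how far the policy can move in a single update.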

Presentation information

[4G3-GS-2l] Machine Learning: Learning Strategies (1/2)
