3:40 PM - 4:00 PM
[2J4-GS-8c-02] Asynchronous Parallel Learning for Model-Free and Model-Based Reinforcement Learning
Keywords: deep reinforcement learning, asynchronous control, model-free and model-based reinforcement learning
Reinforcement learning algorithms are categorized into model-based methods, which explicitly estimate an environment model and a reward function, and model-free methods, which directly learn a policy from real or generated experiences. We previously proposed a parallel reinforcement learning algorithm that trains multiple model-free and model-based reinforcement learners. Those experimental results showed that a simple algorithm can contribute to the learning of complex algorithms. However, because each learner's computation time was not considered, we could not fully demonstrate the advantage of using a simple model-free reinforcement learner.
This paper proposes an asynchronous parallel reinforcement learning method that accounts for differences in control frequencies among learners. The main contribution is to separate the replay buffers by the learner that collected them and to transform the experience replay buffer so that it absorbs the difference in control frequencies. The proposed method is applied to benchmark problems and compared with a variant that does not consider the difference in control frequencies. The results show that the proposed algorithm selected the simple model-based method with a short control period in the early stage of learning, the complex model-based method in the middle stage, and the model-free method in the late stage.
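The abstract does not specify how the replay buffer is transformed, so the following Python sketch is only one plausible reading, not the authors' method. It assumes every experience is logged at a common base time step, keeps one buffer per learner, and re-expresses base-rate transitions for a learner with a longer control period by merging consecutive steps that share the same action and discounting the rewards inside each merged span. The names `Transition`, `make_buffers`, and `aggregate` are hypothetical and introduced only for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Tuple


@dataclass(frozen=True)
class Transition:
    state: Tuple[float, ...]
    action: int
    reward: float
    next_state: Tuple[float, ...]
    done: bool
    duration: int = 1  # number of base time steps this transition spans


def make_buffers(learners: List[str], capacity: int) -> Dict[str, Deque[Transition]]:
    """One bounded replay buffer per learner (assumed layout)."""
    return {name: deque(maxlen=capacity) for name in learners}


def aggregate(transitions: List[Transition], period: int, gamma: float) -> List[Transition]:
    """Merge base-rate transitions into spans of at most `period` steps.

    A span ends early when the episode terminates or the action changes,
    so each merged transition corresponds to one action held constant.
    """
    merged: List[Transition] = []
    i = 0
    while i < len(transitions):
        chunk = [transitions[i]]
        while (
            len(chunk) < period
            and not chunk[-1].done
            and i + len(chunk) < len(transitions)
            and transitions[i + len(chunk)].action == chunk[0].action
        ):
            chunk.append(transitions[i + len(chunk)])
        # Discounted sum of the rewards collected inside the span.
        reward = sum((gamma ** k) * t.reward for k, t in enumerate(chunk))
        merged.append(
            Transition(
                state=chunk[0].state,
                action=chunk[0].action,
                reward=reward,
                next_state=chunk[-1].next_state,
                done=chunk[-1].done,
                duration=len(chunk),
            )
        )
        i += len(chunk)
    return merged
```

Under these assumptions, a learner with a longer control period would bootstrap with `gamma ** duration` from each merged transition, while a learner running at the base rate could sample the unmerged transitions directly.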