6:40 PM - 7:00 PM
[1S5-GS-2-04] Efficient and Low Bias Policy Gradient Estimation in Contact Rich Differentiable Simulation
Keywords: reinforcement learning, differentiable simulators, gradient estimation, policy optimization
In policy gradient reinforcement learning, access to a differentiable model enables first-order gradient estimation that accelerates learning compared to relying solely on derivative-free zeroth-order estimators. However, discontinuous dynamics introduce bias and undermine the effectiveness of first-order estimators. Prior work addressed this bias by constructing a confidence interval around the zeroth-order REINFORCE gradient estimator and using these bounds to detect discontinuities. However, the REINFORCE estimator is notoriously noisy, and we find that this method requires task-specific hyperparameter tuning and has low sample efficiency. We propose a novel method, Discontinuity Detection Composite Gradient (DDCG), which dynamically switches between gradient estimators via a statistical test for discontinuities based on smoothness assumptions. We evaluate our method on differentiable-simulation control tasks and find that it performs well with a fixed hyperparameter setting and yields effective gradient estimates even in the small-sample regime.
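To make the switching idea concrete, the following is a minimal NumPy sketch of a composite estimator that forms both a first-order (analytic) and a zeroth-order (score-function) gradient from the same perturbation samples, then falls back to the zeroth-order estimate when a smoothness-based test flags a discontinuity. The function names, the Taylor-residual test, and all thresholds here are illustrative assumptions, not the published DDCG statistic.

```python
import numpy as np

def composite_gradient(f, grad_f, theta, sigma=0.05, n_samples=32,
                       curvature_bound=10.0, rng=None):
    """Illustrative composite gradient estimator (not the published DDCG).

    Samples Gaussian perturbations around theta, forms both a first-order
    (analytic) and a zeroth-order (REINFORCE / evolution-strategies)
    gradient estimate, and falls back to the zeroth-order estimate when a
    smoothness-based test suggests a discontinuity in the sampling region.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta, dtype=float)
    eps = sigma * rng.standard_normal((n_samples, theta.size))

    f0 = f(theta)
    g_theta = np.asarray(grad_f(theta), dtype=float)
    values = np.array([f(theta + e) for e in eps])
    grads = np.array([grad_f(theta + e) for e in eps])

    # First-order estimate: average of per-sample analytic gradients.
    g1 = grads.mean(axis=0)

    # Zeroth-order estimate: score-function form with a mean baseline
    # to reduce variance.
    g0 = ((values - values.mean())[:, None] * eps).sum(axis=0) \
         / (n_samples * sigma ** 2)

    # Smoothness test (assumed form): if f has curvature bounded by
    # `curvature_bound`, the first-order Taylor expansion must hold to
    # within 0.5 * L * ||eps||^2; a larger residual indicates that a
    # discontinuity (or kink) was crossed by some sample.
    residuals = np.abs(values - f0 - eps @ g_theta)
    taylor_bound = 0.5 * curvature_bound * (eps ** 2).sum(axis=1)
    discontinuous = bool(np.any(residuals > taylor_bound))

    return (g0 if discontinuous else g1), discontinuous

# Example: a step discontinuity near theta. The analytic gradient is zero
# everywhere, so the Taylor residuals are large, the test triggers, and
# the zeroth-order estimate is returned.
g, flagged = composite_gradient(lambda x: float(x[0] > 0.0),
                                lambda x: np.zeros_like(x),
                                theta=np.array([0.01]))
```

The design intent of such a switch is that the estimator keeps the low variance of first-order gradients on smooth regions while avoiding their bias near discontinuities, where the noisier but unbiased zeroth-order estimate takes over.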