JSAI2024

Presentation information

General Session » GS-2 Machine learning

[4D3-GS-2] Machine learning: Basics / Theory

Fri. May 31, 2024 2:00 PM - 3:40 PM Room D (Temporary room 2)

Chair: 伊東 邦大 (NEC Corporation)

2:00 PM - 2:20 PM

[4D3-GS-2-01] ADOPT: an Adaptive Gradient Method with the Optimal Convergence Rate with Any Hyperparameters

〇Shohei Taniguchi1, Keno Harada1, Gouki Minegishi1, Yuta Oshima1, Seong Cheol Jeong1, Go Nagahara1, Tomoshi Iiyama1, Masahiro Suzuki1, Yusuke Iwasawa1, Yutaka Matsuo1 (1. The University of Tokyo)

Keywords: stochastic optimization, Adam

Adaptive gradient methods, such as Adam, are widely used for deep learning. However, they are known not to converge unless hyperparameters are chosen in a problem-dependent manner. There have been many attempts to fix their convergence (e.g., AMSGrad), but these require the impractical assumption that the gradient noise is uniformly bounded. In this paper, we propose a new adaptive gradient method named ADOPT, which achieves the optimal convergence rate of O(1/√T) for any choice of hyperparameters and without the bounded-noise assumption. ADOPT addresses the non-convergence issue of Adam by removing the current gradient from the second-moment estimate and by swapping the order of the momentum calculation and the scaling by the second-moment estimate. We also conduct extensive numerical experiments and verify that ADOPT achieves competitive or even better results than Adam and its variants across a wide range of tasks, including image classification, generative modeling, language modeling, and deep reinforcement learning.
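To make the two modifications described above concrete, the following is a minimal NumPy sketch of a single ADOPT-style update, reconstructed only from this abstract: the current gradient is excluded from the second-moment estimate used for scaling, and scaling is applied before the momentum update rather than after. The function name, the hyperparameter defaults, and the initialization of the second-moment estimate are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def adopt_step(theta, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-6):
    """One ADOPT-style update, sketched from the abstract (hyperparameters assumed)."""
    # Scale the raw gradient by the *previous* second-moment estimate,
    # so the current gradient does not influence its own scaling.
    scaled_grad = grad / np.maximum(np.sqrt(v), eps)
    # Momentum is computed on the already-scaled gradient
    # (order of momentum and scaling swapped relative to Adam).
    m = beta1 * m + (1 - beta1) * scaled_grad
    # Parameter update uses the momentum directly, with no further scaling.
    theta = theta - lr * m
    # The second moment is updated afterwards, so the current gradient
    # only affects the scaling of future steps.
    v = beta2 * v + (1 - beta2) * grad ** 2
    return theta, m, v

# Toy usage: minimize f(x) = 0.5 * ||x||^2 (illustrative only).
theta, m, v = np.ones(3), np.zeros(3), np.ones(3)  # v init is an assumption
for _ in range(1000):
    g = theta  # gradient of 0.5 * ||theta||^2
    theta, m, v = adopt_step(theta, g, m, v)
```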
