3:20 PM - 3:40 PM
[4B3-GS-1-05] Hessian spectral analysis for adaptive optimizers of neural networks
Keywords: deep learning, optimization, Hessian matrix, loss surface analysis
On the other hand, it has been pointed out that the parameters obtained by these adaptive methods do not generalize as well as those obtained by SGD.
The mechanism behind this difference is still not fully understood.
We analyzed the convergence points reached by adaptive and non-adaptive methods using the spectrum of the Hessian of the loss function with respect to the parameters.
Experiments showed that SGD tends to converge to flatter locations than adaptive optimizers do.
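The flatness comparison above rests on examining the Hessian spectrum at a convergence point: larger top eigenvalues indicate a sharper minimum. The following is a minimal illustrative sketch, not the authors' actual method; it estimates the Hessian of a toy quadratic loss by finite differences and reads off its eigenvalues (in practice, Hessian-vector products and Lanczos iteration would be used for neural networks).

```python
import numpy as np

def hessian_fd(loss, w, eps=1e-4):
    """Finite-difference estimate of the Hessian of a scalar loss at w."""
    n = w.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n); e_i[i] = eps
            e_j = np.zeros(n); e_j[j] = eps
            # Second-order central-style difference for d^2 loss / dw_i dw_j
            H[i, j] = (loss(w + e_i + e_j) - loss(w + e_i)
                       - loss(w + e_j) + loss(w)) / eps**2
    return H

# Toy quadratic loss whose Hessian eigenvalues are exactly 1.0 and 10.0;
# the larger top eigenvalue corresponds to a "sharper" direction.
def loss(w):
    return 0.5 * (1.0 * w[0]**2 + 10.0 * w[1]**2)

w_star = np.zeros(2)              # the minimum of the toy loss
H = hessian_fd(loss, w_star)
eigvals = np.linalg.eigvalsh(H)   # spectrum used as a flatness measure
print(eigvals)
```

In this toy setting the spectrum is recovered almost exactly; for a real network, the same idea is applied at the parameters reached by each optimizer, and a flatter minimum shows smaller eigenvalues.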