[3Xin2-79] Gradient "with" and "without" backpropagation
Keywords: Automatic differentiation, Forward gradient, Backpropagation, Optimization, Gradient
In recent years, the forward gradient, a new automatic differentiation method that produces unbiased gradient estimates, has attracted attention as an alternative to traditional backpropagation.
The forward gradient stands out for its memory efficiency, as it eliminates the need to retain activations during the forward pass.
However, its time efficiency suffers because multiple samples are needed to reduce the variance of its gradient estimate.
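To make the estimator concrete, here is a minimal sketch of a single-sample forward gradient written with JAX; the toy loss and the names loss and forward_gradient are illustrative assumptions, not taken from the paper. It draws a random tangent v with identity second moment, obtains the directional derivative of the loss along v in one forward-mode pass, and scales v by that value to form an unbiased gradient estimate.

```python
import jax
import jax.numpy as jnp

def loss(params, x):
    # Toy quadratic loss standing in for a network's scalar training loss.
    return jnp.sum((params * x - 1.0) ** 2)

def forward_gradient(loss_fn, params, x, key):
    # Random tangent with E[v v^T] = I (standard normal here).
    v = jax.random.normal(key, params.shape)
    # One forward-mode pass yields the directional derivative <grad(loss), v>
    # without storing activations for a backward pass.
    _, d = jax.jvp(lambda p: loss_fn(p, x), (params,), (v,))
    # (<grad(loss), v>) * v is an unbiased estimate of the true gradient.
    return d * v

key = jax.random.PRNGKey(0)
params = jnp.array([0.5, -0.3, 1.2])
x = jnp.array([1.0, 2.0, 3.0])
print(forward_gradient(loss, params, x, key))  # single-sample estimate
print(jax.grad(loss)(params, x))               # exact gradient, for comparison
```

Averaging such estimates over many independent tangents lowers the variance, which is exactly the extra sampling cost referred to above.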
To address this issue, we propose an automatic differentiation technique designed to mitigate variance without compromising time efficiency.
The proposed method, termed hybrid gradient, partitions a neural network into two distinct groups of layers: the input group and the output group.
It applies the forward gradient to the input group and backpropagation to the output group.
What distinguishes our technique is its ability to control the overall variance of the forward gradient estimate.
Specifically, the hybrid gradient scales the overall variance of the forward gradient by the square of the ratio of the number of layers in the input group to the total number of layers.
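The abstract does not spell out the implementation, so the following JAX sketch is only one plausible reading of the partitioned scheme: a tangent is sampled for the input-group parameters alone and pushed through the whole network in forward mode, while the output-group parameters receive an exact reverse-mode gradient that needs only the output group's activations. The two-layer toy network, the parameter shapes, and the names input_group, output_group, and hybrid_gradient are all assumptions made for illustration.

```python
import jax
import jax.numpy as jnp

def input_group(p_in, x):
    # Early layers (input group): a single tanh layer, for illustration.
    return jnp.tanh(x @ p_in)

def output_group(p_out, h):
    # Late layers (output group): linear readout into a scalar loss.
    return jnp.sum((h @ p_out) ** 2)

def hybrid_gradient(p_in, p_out, x, key):
    # Forward gradient for the input group: sample a tangent for p_in only
    # and push it through the entire network with forward-mode AD.
    v_in = jax.random.normal(key, p_in.shape)
    _, d = jax.jvp(lambda pi: output_group(p_out, input_group(pi, x)),
                   (p_in,), (v_in,))
    g_in = d * v_in  # unbiased estimate of the gradient w.r.t. p_in
    # Backpropagation for the output group: exact reverse-mode gradient
    # w.r.t. p_out; only the output group's activations are required.
    h = input_group(p_in, x)
    g_out = jax.grad(output_group)(p_out, h)
    return g_in, g_out

key = jax.random.PRNGKey(0)
x = jnp.ones((4, 3))
p_in = 0.1 * jnp.ones((3, 5))
p_out = 0.2 * jnp.ones((5, 1))
g_in, g_out = hybrid_gradient(p_in, p_out, x, key)
```

Under this reading, randomness enters only through the input-group estimate, which is consistent with the claim that the overall variance shrinks with the fraction of layers assigned to the input group.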