1:40 PM - 2:00 PM
[2H3-J-2-02] Adaptive Learning Rate Adjustment with Short-Term Pre-Training in Data-Parallel Deep Learning
Keywords: deep learning, learning rate, data-parallel, hyperparameter
This paper describes a short-term pre-training (STPT) algorithm that adaptively selects an optimal learning rate (LR). The proposed STPT algorithm is beneficial for quick model prototyping in data-parallel deep learning. It adaptively finds an appropriate LR from a set of candidate LRs, where the candidates are evaluated within the first few iterations of an epoch. STPT shortcuts the LR tuning process required as hyperparameter tuning in conventional training procedures, even when unknown models are considered. The proposed STPT therefore reduces computational time and increases throughput in finding the best LR for network training. The algorithm reduces computational time by 87.5% compared with the conventional method when eight candidate LRs are evaluated using eight parallel workers. We verified an accuracy improvement of 4.8% compared with the conventional method with a reference LR of 0.1, and no accuracy deterioration was observed. The algorithm also exhibits better training convergence and an advantage in training time, especially for unknown models, compared with other settings such as a fixed LR.
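The abstract does not give implementation details, but the core idea — probing each candidate LR for a few iterations at the start of an epoch and keeping the one with the lowest probe loss — can be sketched as follows. This is a minimal, hypothetical illustration on a toy linear model with squared-error loss; the function names (`train_steps`, `short_term_pretraining`), the probe-step count, and the candidate LR set are assumptions, and the candidates are evaluated sequentially here rather than on parallel workers as in the paper.

```python
import random

def train_steps(params, lr, data, n_steps):
    """Run a few SGD steps on a toy squared-error loss; return params and final loss.
    (Stand-in for a real model's forward/backward pass.)"""
    loss = 0.0
    for step in range(n_steps):
        x, y = data[step % len(data)]
        err = params["w"] * x + params["b"] - y
        loss = err * err
        # gradients of (wx + b - y)^2 with respect to w and b
        params["w"] -= lr * 2 * err * x
        params["b"] -= lr * 2 * err
    return params, loss

def short_term_pretraining(candidate_lrs, data, n_probe_steps=5):
    """Probe each candidate LR for a few iterations ("short-term pre-training")
    and return the LR with the lowest probe loss, plus its trained parameters.
    In the data-parallel setting each candidate would run on its own worker."""
    results = []
    for lr in candidate_lrs:
        params = {"w": 0.0, "b": 0.0}  # every probe starts from the same init
        params, loss = train_steps(params, lr, data, n_probe_steps)
        results.append((loss, lr, params))
    _, best_lr, best_params = min(results, key=lambda r: r[0])
    return best_lr, best_params

# Toy dataset: y = 3x + 1 with small noise (hypothetical stand-in for real data).
random.seed(0)
data = [(0.1 * i, 3 * (0.1 * i) + 1 + random.gauss(0, 0.01)) for i in range(10)]
candidates = [1.0, 0.5, 0.1, 0.05, 0.01, 0.005, 0.001, 0.0005]  # eight LRs, as in the paper
best_lr, params = short_term_pretraining(candidates, data)
print("selected LR:", best_lr)
```

After the probe phase selects `best_lr`, training would continue for the remaining iterations of the epoch with that LR only, which is how the method avoids running a full training job per candidate.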