3:00 PM - 3:20 PM
[4I3-GS-11-04] Analysis of Dynamic Sparse Training Through the Lens of Weight Interdependence
Keywords: Lottery Tickets, Pruning
Dynamic Sparse Training (DST) is a promising technique for improving neural network efficiency by dynamically adjusting which weights are inactive (the sparsity mask) during training. However, its theoretical underpinnings, particularly the role of weight interdependence in mask swapping, remain underexplored. This paper presents a novel framework that uses Hessian-vector products (HVPs) to incorporate second-order information into DST's mask-swapping mechanism. Our analysis shows that accounting for weight interdependence significantly improves training-loss reduction, especially in high mask-swapping regimes. Experiments on CIFAR-10 with three-layer wide MLPs validate our approach, showing improved robustness and performance compared with baseline methods such as RigL.
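As a rough illustration of why second-order information matters for a mask swap, the following minimal sketch compares a first-order and a second-order estimate of the loss change caused by pruning one weight and growing another. It is not the paper's method: the quadratic toy loss, the finite-difference HVP, and all names (`grad`, `hvp`, `swap_loss_change`, `A`, `b`, `delta`) are assumptions made for this example. The second-order term `0.5 * delta^T H delta` contains the cross terms between the swapped weights, which is one way to read "weight interdependence".

```python
# Toy loss (assumed for illustration): L(w) = 0.5 * w^T A w - b^T w
A = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 2.0]]   # symmetric; off-diagonals couple the weights
b = [1.0, 0.0, 1.0]


def loss(w):
    """Evaluate the toy quadratic loss at w."""
    n = len(w)
    quad = 0.5 * sum(w[i] * A[i][j] * w[j] for i in range(n) for j in range(n))
    return quad - sum(b[i] * w[i] for i in range(n))


def grad(w):
    """Gradient of the toy loss: A w - b."""
    n = len(w)
    return [sum(A[i][j] * w[j] for j in range(n)) - b[i] for i in range(n)]


def hvp(grad_fn, w, v, eps=1e-4):
    """Approximate H(w) v by central differences of the gradient.

    The Hessian itself is never formed; only two gradient calls are needed.
    """
    w_plus = [wi + eps * vi for wi, vi in zip(w, v)]
    w_minus = [wi - eps * vi for wi, vi in zip(w, v)]
    g_plus, g_minus = grad_fn(w_plus), grad_fn(w_minus)
    return [(gp - gm) / (2.0 * eps) for gp, gm in zip(g_plus, g_minus)]


def swap_loss_change(grad_fn, w, delta):
    """Return (first-order, second-order) estimates of L(w + delta) - L(w)."""
    g = grad_fn(w)
    first = sum(gi * di for gi, di in zip(g, delta))
    h_delta = hvp(grad_fn, w, delta)
    second = first + 0.5 * sum(di * hi for di, hi in zip(delta, h_delta))
    return first, second


# Weight 1 is currently masked out (zero); weight 2 is active.
w = [0.5, 0.0, -0.3]
# Hypothetical swap: grow weight 1 to 0.2 and prune weight 2 (move it to 0).
delta = [0.0, 0.2, 0.3]

first_order, second_order = swap_loss_change(grad, w, delta)
w_new = [wi + di for wi, di in zip(w, delta)]
true_change = loss(w_new) - loss(w)
print(first_order, second_order, true_change)
```

For a quadratic loss the second-order estimate matches the true loss change up to rounding, while the first-order estimate ignores the coupling between the two swapped weights. In a real network, an automatic-differentiation framework would typically supply the HVP instead of finite differences.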