JSAI2024

Presentation information

General Session


[1B4-GS-2] Machine learning: Expression learning

Tue. May 28, 2024 3:00 PM - 4:40 PM Room B (Concert hall)

Chair: Masahiko Osawa (Nihon University)

3:40 PM - 4:00 PM

[1B4-GS-2-03] Understanding Grokking Through the Lens of Lottery Ticket Hypothesis

〇Gouki Minegishi¹, Yusuke Iwasawa¹, Yutaka Matsuo¹ (1. University of Tokyo)

Keywords: Grokking, Generalization, Lottery ticket hypothesis, Representation learning

Grokking is the intriguing phenomenon of delayed generalization: a network first reaches a memorization solution with perfect training accuracy but limited generalization; with further training, it eventually attains a generalization solution.
This paper challenges the previous view that weight norm reduction explains grokking, demonstrating through experiments that identifying optimal subnetworks plays a crucial role in achieving generalization. Drawing on the lottery ticket hypothesis, we argue that finding these 'lottery tickets' is key to the transition from memorization to generalization. We present empirical evidence that (1) with the proper subnetworks, delayed generalization does not occur; (2) with a similar weight norm, dense networks still require substantially longer training to achieve full generalization; and (3) with structure optimization alone (without updating the weight values), the memorization solution can be converted into a generalization solution. These results emphasize the importance of subnetwork identification, rather than weight norm reduction, in explaining the delayed generalization seen in grokking.
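
The comparison in point (2) between a sparse subnetwork and a dense network of similar weight norm can be pictured with a small sketch. This is only an illustration under stated assumptions: the abstract does not say how subnetworks are identified, so magnitude pruning is used here as a stand-in, and the names `magnitude_mask`, `dense`, `subnetwork`, and `rescaled_dense` are hypothetical.

```python
import numpy as np


def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask keeping the largest-magnitude entries.

    Assumption: magnitude pruning stands in for whatever subnetwork
    selection the paper actually uses.
    """
    k = int(round(weights.size * (1.0 - sparsity)))  # number of weights to keep
    if k == 0:
        return np.zeros_like(weights)
    # The k-th largest absolute value serves as the keep threshold.
    threshold = np.sort(np.abs(weights), axis=None)[-k]
    return (np.abs(weights) >= threshold).astype(weights.dtype)


rng = np.random.default_rng(0)
dense = rng.normal(size=(8, 8))  # toy weight matrix, not a trained model

# Sparse subnetwork: same weight values, most entries masked out.
mask = magnitude_mask(dense, sparsity=0.75)
subnetwork = dense * mask

# Dense control: all weights kept, rescaled to the subnetwork's L2 norm.
rescaled_dense = dense * (np.linalg.norm(subnetwork) / np.linalg.norm(dense))

print("kept weights:", int(mask.sum()), "of", mask.size)
print("subnetwork norm:", np.linalg.norm(subnetwork))
print("rescaled dense norm:", np.linalg.norm(rescaled_dense))
```

The two matrices end up with the same L2 norm but different structure, which is the kind of control that separates a weight-norm explanation from a subnetwork explanation.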
