JSAI2022

Presentation information

Interactive Session


[4Yin2] Interactive session 2

Fri. Jun 17, 2022 12:00 PM - 1:40 PM Room Y (Event Hall)

[4Yin2-16] Gradient Based Teacher Model Selection in Knowledge Distillation

〇Taiga Kume1, Hirono Kawashima2, Hiroo Bekku1, Jin Nakazawa1 (1.Keio University Faculty of Environment and Information Studies, 2.Keio University Graduate School of Media and Governance)

Keywords: Knowledge Distillation

Knowledge distillation is a model compression method that reduces the size of a model while retaining its original performance. The effectiveness of knowledge distillation is affected by many factors, such as model size and model architecture, making it difficult to quantitatively select the teacher model that maximizes the performance of the student model. In particular, the performance gap between the teacher model and the student model is said to have a significant impact on the effectiveness of distillation, but since the performance of the student model changes over the course of the distillation process, the ideal teacher model at any given moment also changes accordingly. In this paper, we propose a novel method that continuously adjusts the influence of multiple teacher models over the course of the distillation process. We assign a weight parameter to each of several candidate teacher models and perform knowledge distillation while optimizing the weights using gradient descent. Evaluation experiments show that the accuracy of student models distilled with the proposed method outperforms that of the conventional method on the CIFAR-10 image classification task.
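The abstract describes weighting several candidate teachers and updating those weights by gradient descent during distillation. The sketch below is a minimal, hypothetical illustration of that idea in PyTorch, not the authors' implementation: it assumes the teacher weights are softmax-normalized into a convex combination of teacher logits, that the weights are updated by the same distillation objective as the student (the paper's actual objective for the weight update, temperature, and loss balance may differ), and names such as `distill_step`, `temperature`, and `alpha` are illustrative.

```python
import torch
import torch.nn.functional as F


def distill_step(student, teachers, teacher_weights, x, y,
                 student_opt, weight_opt, temperature=4.0, alpha=0.5):
    """One joint training step: distill the student from a weighted
    mixture of teacher outputs while also updating the teacher weights
    by gradient descent (a sketch of the idea, not the paper's code).

    teacher_weights: learnable tensor of shape (num_teachers,);
    a softmax turns it into a convex combination over teachers.
    """
    student_logits = student(x)

    # Teachers are frozen; only their outputs are needed.
    with torch.no_grad():
        teacher_logits = torch.stack([t(x) for t in teachers])  # (T, B, C)

    # Convex combination of teacher logits via the learnable weights.
    w = F.softmax(teacher_weights, dim=0)                        # (T,)
    mixed_logits = torch.einsum("t,tbc->bc", w, teacher_logits)  # (B, C)

    # Standard soft-target distillation loss plus hard-label loss
    # (assumed loss form; the paper may combine them differently).
    soft_targets = F.softmax(mixed_logits / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    kd_loss = F.kl_div(log_probs, soft_targets,
                       reduction="batchmean") * temperature ** 2
    ce_loss = F.cross_entropy(student_logits, y)
    loss = alpha * kd_loss + (1.0 - alpha) * ce_loss

    # Gradient descent updates both the student parameters and the
    # per-teacher weights, so the teachers' influence shifts over training.
    student_opt.zero_grad()
    weight_opt.zero_grad()
    loss.backward()
    student_opt.step()
    weight_opt.step()
    return loss.item(), w.detach()


# Hypothetical usage: the per-teacher weights are a separate leaf tensor
# with its own optimizer, so their update is decoupled from the student's.
# teacher_weights = torch.zeros(len(teachers), requires_grad=True)
# weight_opt = torch.optim.SGD([teacher_weights], lr=1e-2)
# student_opt = torch.optim.SGD(student.parameters(), lr=1e-1, momentum=0.9)
```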
