Model Reduction Effect of NAS during Finetuning of ViT

Xinyu Zhang

4:10 PM - 4:30 PM

[3J4-OS-3b-03] Model Reduction Effect of NAS during Finetuning of ViT

〇Xinyu Zhang¹, Sora Takashima¹, Rio Yokota¹ (1. Tokyo Institute of Technology)

Keywords: Vision Transformer, Finetuning, Neural Architecture Search

In image recognition, Vision Transformers (ViT) have achieved state-of-the-art results in image classification on ImageNet. However, the models are becoming so large that they cannot even fit on a single GPU, which limits their usefulness at inference time. To reduce the size of such large vision transformer models, we use AutoFormer, proposed by Chen et al. In the original work on AutoFormer, the supernet is trained from scratch. In this work, we propose a method that trains the AutoFormer supernet starting from a pre-trained vision transformer, followed by an architecture search during fine-tuning. We find that, for the same number of parameters, the resulting models achieve higher classification accuracy than models trained from scratch.
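The idea of starting a weight-sharing supernet from pre-trained weights and then drawing smaller sub-networks from it can be illustrated with a toy sketch. This is not the authors' implementation: the search space, the helper names (`make_pretrained_weight`, `sample_config`, `slice_subnet_weight`, `count_params`), and the rule that a sub-network takes the top-left block of the supernet weight are all assumptions made for illustration, and random numbers stand in for real pre-trained ViT weights.

```python
import random

# Hypothetical search space: candidate embedding dims and MLP hidden dims.
SEARCH_SPACE = {"embed_dim": [4, 6, 8], "mlp_dim": [8, 12, 16]}


def make_pretrained_weight(rows, cols, seed=0):
    """Stand-in for a pre-trained weight matrix (random values, not real ViT weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def sample_config(rng):
    """Pick one candidate value for each searchable dimension."""
    return {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}


def slice_subnet_weight(super_weight, out_dim, in_dim):
    """Weight sharing (assumed scheme): the subnet reuses the top-left block."""
    return [row[:in_dim] for row in super_weight[:out_dim]]


def count_params(weight):
    """Number of scalar entries in a weight matrix stored as nested lists."""
    return sum(len(row) for row in weight)


# The supernet layer is sized to the largest choices and starts from
# "pre-trained" values instead of a fresh random initialization.
max_embed = max(SEARCH_SPACE["embed_dim"])
max_mlp = max(SEARCH_SPACE["mlp_dim"])
super_weight = make_pretrained_weight(max_mlp, max_embed)

# Sample one candidate architecture and extract its shared weights.
rng = random.Random(42)
config = sample_config(rng)
sub_weight = slice_subnet_weight(super_weight, config["mlp_dim"], config["embed_dim"])
print(config, count_params(sub_weight))
```

In a real search, each sampled sub-network would be fine-tuned and evaluated, and candidates would be compared at a matching parameter count; the sketch only shows how a sub-network can inherit its weights from the supernet.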


Presentation information

[3J4-OS-3b] AutoML (Automated Machine Learning) (2/2)