4:10 PM - 4:30 PM
[3J4-OS-3b-03] Model Reduction Effect of NAS during Finetuning of ViT
Keywords: Vision Transformer, Finetuning, Neural Architecture Search
In image recognition, Vision Transformers (ViT) have achieved state-of-the-art results in image classification on ImageNet. However, these models have grown so large that they may not even fit on a single GPU, which limits their usefulness at inference time. To reduce the size of such large Vision Transformer models, we utilize AutoFormer, proposed by Chen et al. In the original work on AutoFormer, the supernet is trained from scratch. In this work, we propose a method that initializes the AutoFormer supernet from a pre-trained Vision Transformer and then performs an architecture search during fine-tuning. We find that, for the same number of parameters, the resulting models achieve higher classification accuracy than models trained from scratch.
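The core idea above can be illustrated with a toy sketch. This is not the authors' implementation or the real AutoFormer API; all names, dimensions, and the parameter-budget criterion are hypothetical, standing in for the weight-entanglement scheme in which every candidate sub-network reuses a slice of one shared (here, pre-trained) weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "pre-trained" full-width projection. In AutoFormer-style
# weight entanglement, smaller candidate sub-networks reuse the leading
# slice of this shared matrix instead of holding separate weights.
pretrained_W = rng.standard_normal((768, 768))

def subnet_weights(shared_W, embed_dim):
    """Return the sub-network weights for a candidate embedding dim by
    slicing the shared matrix (a stand-in for supernet weight sharing)."""
    return shared_W[:embed_dim, :embed_dim]

# Toy architecture search: among candidate embedding dims, keep the
# largest sub-network that fits a parameter budget. A real search would
# rank candidates by validation accuracy during fine-tuning instead.
candidates = [192, 384, 576, 768]  # illustrative candidate dims
budget = 400_000                   # illustrative max parameter count
best = None
for dim in candidates:             # ascending, so the last fit wins
    if subnet_weights(pretrained_W, dim).size <= budget:
        best = dim

print(best)  # prints 576: 576*576 fits the budget, 768*768 does not
```

Because every sub-network is a view into the shared pre-trained weights, the search itself adds no new parameters; only the selected slice is kept after fine-tuning.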