Presentation information

[3J4-OS-3b] AutoML (Automated Machine Learning) (2/2)

Thu. Jun 16, 2022 3:30 PM - 5:10 PM Room J

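Organizers: 大西 正輝 (National Institute of Advanced Industrial Science and Technology) [on-site], 日野 英逸 (The Institute of Statistical Mathematics / RIKEN)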

4:10 PM - 4:30 PM

[3J4-OS-3b-03] Model Reduction Effect of NAS during Finetuning of ViT

〇Xinyu Zhang1, Sora Takashima1, Rio Yokota1 (1. Tokyo Institute of Technology)

Keywords: Vision Transformer, Finetuning, Neural Architecture Search

Vision Transformers (ViT) have achieved state-of-the-art image classification accuracy on ImageNet. However, the models are becoming so large that they cannot even fit on a single GPU, which limits their usefulness at inference time. To reduce the size of such large vision transformer models, we use AutoFormer, proposed by Chen et al. In the original work on AutoFormer, the supernet is trained from scratch. In this work, we propose a method that trains the AutoFormer supernet starting from a pre-trained vision transformer, followed by an architecture search during fine-tuning. We find that, for the same number of parameters, the classification accuracy is superior to that of models trained from scratch.
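As a rough illustration of the idea in the abstract, the sketch below shows a supernet weight matrix that starts from pre-trained values, with smaller candidate architectures taken as leading slices of it and filtered by a parameter budget. It is not the authors' implementation: the function names, the toy 8x8 matrix, the random stand-in for pre-trained weights, and the random (rather than accuracy-driven) candidate selection are all assumptions made for illustration. In a real framework the slice would be a view into the supernet tensor so that fine-tuning updates reach the shared weights; here it is a plain copy.

```python
import random


def make_pretrained_weight(out_dim, in_dim, seed=0):
    """Hypothetical stand-in for one pre-trained ViT weight matrix."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]


def slice_subnet_weight(super_weight, out_dim, in_dim):
    """Return the leading out_dim x in_dim block of the supernet weight."""
    return [row[:in_dim] for row in super_weight[:out_dim]]


def count_params(weight):
    """Number of scalar entries in a weight matrix."""
    return sum(len(row) for row in weight)


def sample_under_budget(super_weight, dim_choices, max_params, rng):
    """Randomly pick a square subnet whose parameter count fits the budget."""
    feasible = [d for d in dim_choices if d * d <= max_params]
    dim = rng.choice(feasible)
    return dim, slice_subnet_weight(super_weight, dim, dim)


# The supernet starts from pre-trained values instead of a random init.
super_w = make_pretrained_weight(out_dim=8, in_dim=8)

# Candidate widths are assumed; only those within the budget are eligible.
dim, sub_w = sample_under_budget(
    super_w, dim_choices=[4, 6, 8], max_params=40, rng=random.Random(1)
)
print(dim, count_params(sub_w))
```

An actual search would score each sampled subnet on validation data and keep the most accurate one per parameter budget; the random choice above only stands in for that step.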

