4:50 PM - 5:10 PM
[2M5-OS-24-05] Model compression of BERT with One-Shot NAS
Keywords: NAS, BERT, Local feature, Model compression
In recent years, language models have been scaled up to improve performance, but pre-training such large models requires a great deal of time. To address this, model compression has been studied as a way to reduce model size while maintaining performance. Separately, incorporating architectures that can efficiently learn local features has been shown to improve language model performance. In this study, we therefore conducted a one-shot neural architecture search (NAS) over architectures that can efficiently learn local features, in order to find model structures that reduce model size while maintaining performance.
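The abstract does not specify the exact search space or search procedure, so the following is only a minimal sketch of the general one-shot (weight-sharing) NAS idea it refers to: a supernet whose layers hold candidate operations, including a convolutional path as a stand-in for "efficiently learning local features". All class names, the layer choices, the toy loss, and the parameter-free ranking step are hypothetical illustrations, not the authors' method.

```python
# Minimal one-shot NAS sketch (assumed setup, not the paper's implementation).
import random
import torch
import torch.nn as nn

class ChoiceBlock(nn.Module):
    """One supernet layer holding shape-preserving candidate operations.
    'attn' models global context, 'conv' targets local features,
    and 'skip' lets the search drop the layer to shrink the model."""
    def __init__(self, dim, heads=4, kernel_size=3):
        super().__init__()
        self.ops = nn.ModuleDict({
            "attn": nn.MultiheadAttention(dim, heads, batch_first=True),
            "conv": nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2),
            "skip": nn.Identity(),
        })
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, choice):
        if choice == "attn":
            out, _ = self.ops["attn"](x, x, x)
        elif choice == "conv":
            out = self.ops["conv"](x.transpose(1, 2)).transpose(1, 2)
        else:
            out = self.ops["skip"](x)
        return self.norm(x + out)

class SuperNet(nn.Module):
    def __init__(self, num_layers=6, dim=128):
        super().__init__()
        self.layers = nn.ModuleList(ChoiceBlock(dim) for _ in range(num_layers))

    def forward(self, x, arch):
        for layer, choice in zip(self.layers, arch):
            x = layer(x, choice)
        return x

def sample_arch(num_layers, choices=("attn", "conv", "skip")):
    return [random.choice(choices) for _ in range(num_layers)]

# Phase 1: train the shared supernet weights with uniformly sampled sub-architectures.
model = SuperNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for step in range(10):                   # toy loop; real training would use MLM data
    x = torch.randn(2, 16, 128)          # (batch, sequence, hidden)
    arch = sample_arch(len(model.layers))
    loss = model(x, arch).pow(2).mean()  # placeholder loss for illustration only
    opt.zero_grad(); loss.backward(); opt.step()

# Phase 2: rank candidate sub-architectures with the shared weights (no retraining),
# then the selected compact architecture would be fine-tuned on downstream tasks.
candidates = [sample_arch(len(model.layers)) for _ in range(5)]
scores = [model(torch.randn(2, 16, 128), a).pow(2).mean().item() for a in candidates]
best = candidates[scores.index(min(scores))]
print("selected architecture:", best)
```

In a weight-sharing search like this, the expensive supernet training is paid once, and candidate sub-architectures are compared cheaply by reusing the shared weights, which is what makes a one-shot search over BERT-scale models tractable.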
We evaluated the resulting models on the GLUE benchmark. Compared with the BERT-base model, we reduced the number of parameters by 46.1% while improving the average score by 0.5 points.