[3Yin2-32] Determining the Target Block for Pruning in Natural Language Processing Model Compression
Keywords: model compression, natural language model, pruning
Machine learning in natural language processing has been dominated by large pre-trained Transformer models, and it is known that model size has a significant impact on performance. As a result, BERT and other large models are out of reach for many users without large-memory GPUs. Pruning addresses this problem by removing unnecessary parameters from the network. Poor Man's BERT is an existing pruning method that reduces the number of encoder blocks. It achieves higher performance than DistilBERT, but its pruning strategies are determined manually. We aim to improve the performance of Poor Man's BERT by determining the target blocks for pruning automatically. In this research, we introduce an importance score for each layer based on the change in the loss. Experiments confirmed that performance degradation was reduced compared to the conventional method.
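To illustrate the kind of loss-based layer scoring described above, the sketch below measures, for each encoder block, how much the validation loss increases when that block is temporarily skipped; blocks with the smallest increase would be candidates for pruning. This is a minimal sketch, not the paper's actual implementation: the TinyEncoderClassifier toy model, the dummy data, and the layer-dropping via nn.Identity are all illustrative assumptions.

import torch
import torch.nn as nn

class TinyEncoderClassifier(nn.Module):
    """Toy stand-in for a pre-trained Transformer classifier (illustrative only)."""
    def __init__(self, vocab=1000, d_model=64, n_layers=6, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(n_layers)
        )
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, input_ids):
        x = self.embed(input_ids)
        for layer in self.layers:
            x = layer(x)
        return self.head(x.mean(dim=1))  # mean-pool over tokens, then classify

def layer_importance_scores(model, dataloader, loss_fn):
    """Score each encoder block by the increase in loss when that block is skipped."""
    model.eval()

    def average_loss():
        total, batches = 0.0, 0
        with torch.no_grad():
            for input_ids, labels in dataloader:
                total += loss_fn(model(input_ids), labels).item()
                batches += 1
        return total / max(batches, 1)

    base = average_loss()
    scores = []
    for i in range(len(model.layers)):
        kept = model.layers[i]
        model.layers[i] = nn.Identity()   # temporarily drop block i
        scores.append(average_loss() - base)  # loss change = importance of block i
        model.layers[i] = kept            # restore the original block
    return scores

if __name__ == "__main__":
    torch.manual_seed(0)
    model = TinyEncoderClassifier()
    # Dummy validation batches of (input_ids, labels), purely for demonstration.
    data = [(torch.randint(0, 1000, (8, 16)), torch.randint(0, 2, (8,)))
            for _ in range(4)]
    scores = layer_importance_scores(model, data, nn.CrossEntropyLoss())
    print([round(s, 4) for s in scores])

In this sketch the blocks with the lowest scores are the ones whose removal changes the loss the least, so they would be selected as pruning targets instead of choosing them by hand.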