JSAI2022

Presentation information

Interactive Session

[3Yin2] Interactive session 1

Thu. Jun 16, 2022 11:30 AM - 1:10 PM Room Y (Event Hall)

[3Yin2-32] Determining the Target Block for Pruning in Natural Language Processing Model Compression

〇Akito Tokumasa1, Michifumi Yoshioka1, Katsufumi Inoue1 (1. Graduate School of Engineering, Osaka Prefecture University)

Keywords: model compression, natural language model, pruning

Machine learning in natural language processing has been dominated by large pre-trained Transformer models, and model size is known to have a significant impact on performance. As a result, BERT and other large models are out of reach for many users who lack large-memory GPUs. Pruning addresses this problem by removing unnecessary parameters from the network. Poor Man's BERT is an existing pruning method that reduces model size by dropping encoder blocks. It achieves higher performance than DistilBERT, but its pruning strategies are determined manually. We aim to improve the performance of Poor Man's BERT by determining the target blocks for pruning automatically. In this research, we introduce an importance score for each layer based on the change in the loss. Experiments confirmed that performance degradation was reduced compared to the conventional method.
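
The abstract does not give the exact formulation of the importance score, so the following is a minimal sketch of one plausible reading: a block's importance is the change in loss on a held-out batch when that block is skipped, and the least important blocks are pruned. The TinyEncoder model, the skip argument, and the scoring rule are illustrative assumptions, not the authors' implementation.

import copy
import torch
import torch.nn as nn

# Toy stand-in for a Transformer encoder: a stack of blocks in a ModuleList.
# In practice this would be a pre-trained model such as BERT.
class TinyEncoder(nn.Module):
    def __init__(self, dim=32, num_blocks=6, num_classes=2):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
            for _ in range(num_blocks)
        )
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x, skip=()):
        # `skip` lets us evaluate the model with selected blocks removed.
        for i, block in enumerate(self.blocks):
            if i not in skip:
                x = block(x)
        return self.head(x.mean(dim=1))  # mean-pool, then classify

@torch.no_grad()
def block_importance(model, x, y, criterion):
    """Importance of block i = change in loss when block i is skipped (assumed scoring rule)."""
    base_loss = criterion(model(x), y).item()
    scores = []
    for i in range(len(model.blocks)):
        loss_without_i = criterion(model(x, skip={i}), y).item()
        scores.append(loss_without_i - base_loss)  # larger change => more important
    return scores

def prune_least_important(model, scores, k):
    """Drop the k blocks whose removal changes the loss the least."""
    keep = sorted(sorted(range(len(scores)), key=lambda i: scores[i])[k:])
    pruned = copy.deepcopy(model)
    pruned.blocks = nn.ModuleList(pruned.blocks[i] for i in keep)
    return pruned

if __name__ == "__main__":
    torch.manual_seed(0)
    model = TinyEncoder().eval()
    x = torch.randn(8, 16, 32)          # (batch, seq_len, dim) validation batch
    y = torch.randint(0, 2, (8,))
    criterion = nn.CrossEntropyLoss()

    scores = block_importance(model, x, y, criterion)
    pruned = prune_least_important(model, scores, k=2)
    print("importance scores:", [round(s, 4) for s in scores])
    print("blocks kept:", len(pruned.blocks))

Scoring each block against held-out data in this way replaces the hand-chosen strategies of Poor Man's BERT (e.g., dropping a fixed set of top layers) with a selection driven by the loss itself, which is the automation the abstract proposes.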
