3:20 PM - 3:40 PM
[3J4-GS-6c-01] Weight and Activation Ternarization in BERT
Keywords: deep learning, language model, quantization
Quantization techniques, which approximate floating-point values with a small number of bits, have been attracting attention as a way to reduce the model size and speed up inference of pre-trained language models such as BERT. However, activations (the inputs to each layer) are usually quantized to 8 bits, and it is empirically known that maintaining accuracy with fewer than 8 bits is difficult.
In this study, we identify outliers in the intermediate representations of BERT as the problem and propose a ternarization method that can handle outliers in the activations of each layer of pre-trained BERT. Experimental results show that the model with ternarized weights and activations outperforms the previous method in language modeling and downstream tasks.
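For illustration only, the following is a minimal PyTorch sketch of threshold-based ternarization with a simple percentile clip to tame activation outliers. The function name, the 0.7 * mean|x| threshold (taken from Ternary Weight Networks, Li & Liu 2016), and the clipping percentile are assumptions for this sketch, not the method proposed in the paper.

import torch

def ternarize(x: torch.Tensor, clip_pct: float = 0.999) -> torch.Tensor:
    # Map a tensor to {-alpha, 0, +alpha}.
    # clip_pct caps outliers at a high percentile before thresholding, so that
    # a few extreme activation values do not inflate the threshold and scale.
    limit = torch.quantile(x.abs().flatten(), clip_pct)
    x_c = x.clamp(-limit, limit)

    # TWN-style threshold and scale (assumed here, not the paper's exact rule).
    delta = 0.7 * x_c.abs().mean()
    mask = (x_c.abs() > delta).float()
    alpha = (x_c.abs() * mask).sum() / mask.sum().clamp(min=1.0)

    return alpha * torch.sign(x_c) * mask

In practice, weight ternarization can precompute the threshold and scale offline, whereas activation ternarization must estimate them from data-dependent ranges at run time, which is why activation outliers are the harder case.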