[4Rin1-23] How do Masked Language Models perform when the input sequence length changes?
Keywords: Natural Language Processing, Machine Learning
BERT, one of the most famous Masked Language Models (MLMs), has succeeded in various natural language processing tasks.
However, BERT cannot accept documents longer than the maximum sequence length fixed at pretraining time.
In this paper, we study how BERT depends on the input sequence length by comparing MLM accuracy across different sequence lengths for each part of speech and each named-entity class.
Our results show that long sequences are necessary for predicting proper nouns, especially persons' names.
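As a rough illustration of the kind of comparison the abstract describes, the following is a minimal sketch (not the authors' code) that masks a word in a document, truncates the input to different lengths, and checks whether BERT recovers it. It assumes the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint; the helper `masked_accuracy` and the toy document are hypothetical.

```python
# Hypothetical sketch: how masked-token prediction could be compared
# across input sequence lengths with a pretrained BERT model.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def masked_accuracy(text: str, target_word: str, max_length: int) -> bool:
    """Mask one occurrence of `target_word`, truncate the input to
    `max_length` tokens, and check whether BERT recovers the word."""
    tokens = tokenizer.tokenize(text)[:max_length - 2]  # leave room for [CLS]/[SEP]
    try:
        idx = tokens.index(target_word)
    except ValueError:
        return False  # target not present after truncation
    tokens[idx] = tokenizer.mask_token
    ids = tokenizer.convert_tokens_to_ids(
        [tokenizer.cls_token] + tokens + [tokenizer.sep_token])
    input_ids = torch.tensor([ids])
    with torch.no_grad():
        logits = model(input_ids).logits
    predicted_id = logits[0, idx + 1].argmax().item()  # +1 offset for [CLS]
    return tokenizer.convert_ids_to_tokens(predicted_id) == target_word

# Compare short vs. long context for the same masked word (toy example).
doc = "alice lives in paris . alice works at a museum . " * 20
for length in (32, 128, 512):
    print(length, masked_accuracy(doc, "alice", length))
```

Grouping such checks by part of speech or named-entity class, as the paper does, would then show which word types benefit most from longer context.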