[4Rin1-23] How do Masked Language Models perform when the input sequence length changes?
Keywords: Natural Language Processing, Machine Learning
BERT, one of the most famous Masked Language Models (MLMs), has succeeded in various natural language processing tasks.
However, BERT cannot accept documents longer than the maximum sequence length fixed at pretraining time.
In this paper, we study how BERT depends on the input sequence length by comparing MLM accuracy across different sequence lengths for each part of speech and each named-entity class.
Our results show that long sequences are necessary for predicting proper nouns, especially persons' names.
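As a rough illustration of the kind of comparison the abstract describes, the following is a minimal sketch (not the authors' code) that masks a word in a document, truncates the input to different lengths, and checks whether BERT recovers it. It assumes the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint; the helper `masked_accuracy` and the toy document are hypothetical.

```python
# Hypothetical sketch: how masked-token prediction could be compared
# across input sequence lengths with a pretrained BERT model.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def masked_accuracy(text: str, target_word: str, max_length: int) -> bool:
    """Mask one occurrence of `target_word`, truncate the input to
    `max_length` tokens, and check whether BERT recovers the word."""
    tokens = tokenizer.tokenize(text)[:max_length - 2]  # leave room for [CLS]/[SEP]
    try:
        idx = tokens.index(target_word)
    except ValueError:
        return False  # target not present after truncation
    tokens[idx] = tokenizer.mask_token
    ids = tokenizer.convert_tokens_to_ids(
        [tokenizer.cls_token] + tokens + [tokenizer.sep_token])
    input_ids = torch.tensor([ids])
    with torch.no_grad():
        logits = model(input_ids).logits
    predicted_id = logits[0, idx + 1].argmax().item()  # +1 offset for [CLS]
    return tokenizer.convert_ids_to_tokens(predicted_id) == target_word

# Compare short vs. long context for the same masked word (toy example).
doc = "alice lives in paris . alice works at a museum . " * 20
for length in (32, 128, 512):
    print(length, masked_accuracy(doc, "alice", length))
```

Grouping such checks by part of speech or named-entity class, as the paper does, would then show which word types benefit most from longer context.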