Analyzing Effects of Architectures and Pretraining Objective on Unidirectional and Bidirectional Pretrained Language Models

Hiroyoshi Nagao; Takumi Goto; Yuta Koreeda

[4Xin2-37] Analyzing Effects of Architectures and Pretraining Objective on Unidirectional and Bidirectional Pretrained Language Models

〇Hiroyoshi Nagao¹, Takumi Goto¹,², Yuta Koreeda¹ (1. Research & Development Group, Hitachi, Ltd., 2. Nara Institute of Science and Technology)

Keywords: Large Language Models, Natural Language Processing, Text Classification, Analysis, Transformer

Recently, language models have grown in scale, allowing a single model to address a broad range of tasks that previously required task-specific development.
Unidirectional pretrained language models (UniPLMs) such as GPT have become extremely large, with tens of billions of parameters, while bidirectional pretrained language models (BiPLMs) such as BERT have stayed at a few hundred million parameters at most.
However, previous research has shown that BiPLMs with a relatively small number of parameters are more effective for classical non-generative tasks.
The purpose of this study is to determine whether the model architecture or the pretraining objective brings about the difference.
We trained BiPLMs and UniPLMs under controlled conditions and confirmed that the difference in their GLUE scores is larger after pretraining than before it.
Since the only difference between the two models before pretraining is the model architecture, the results imply that the pretraining objective has a more dominant influence than the model architecture.
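
The two factors compared in this abstract can be illustrated with a minimal sketch. It assumes the usual GPT-style and BERT-style conventions: the architectural difference is taken to be the attention mask (causal versus full), and the objective difference is taken to be next-token prediction versus masked-token prediction. The abstract does not describe the actual training setup, so the function names, the toy sentence, and the simplified masking rule below are illustrative assumptions, not the authors' implementation.

```python
import random

def attention_mask(seq_len, bidirectional):
    """mask[i][j] is True when position i may attend to position j."""
    return [[bidirectional or j <= i for j in range(seq_len)]
            for i in range(seq_len)]

def causal_lm_targets(tokens):
    """Next-token prediction: pair each token with the token that follows it."""
    return list(zip(tokens[:-1], tokens[1:]))

def masked_lm_example(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Masked-token prediction: hide random tokens and predict them back.

    Simplified assumption: every selected token is replaced by mask_token.
    """
    rng = random.Random(seed)
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(mask_token)
            targets.append(tok)    # loss is computed at this position
        else:
            inputs.append(tok)
            targets.append(None)   # position is ignored by the loss
    return inputs, targets

# Hypothetical toy input, used only to show the shapes involved.
tokens = ["the", "model", "reads", "text"]
uni_mask = attention_mask(len(tokens), bidirectional=False)  # UniPLM-style
bi_mask = attention_mask(len(tokens), bidirectional=True)    # BiPLM-style
clm_pairs = causal_lm_targets(tokens)
mlm_inputs, mlm_targets = masked_lm_example(tokens, mask_prob=0.5)
```

Under these assumptions, two untrained models that differ only in `attention_mask` differ only in architecture, which is the condition the abstract treats as the "before pretraining" baseline.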

Presentation information

[4Xin2] Poster session 2
