3:00 PM - 3:20 PM
[4N3-GS-6-04] Robust Pre-Training on Low-Quality Texts via Bregman Divergence
Keywords: natural language processing, additional pre-training, robust statistics, language model, deep learning
Amid the rapid development of Large Language Models (LLMs), there is an ongoing trend toward enlarging training corpora to train high-performance models. However, not all texts in such large-scale corpora are of high quality, and the presence of low-quality texts in these extensively collected corpora can hinder improvements in model performance. This study proposes a robust learning method that mitigates the impact of noise when pre-training language models on corpora containing the low-quality texts found in real-world data sources. Specifically, we focus on the broad class of Bregman divergences, employing the β-divergence and γ-divergence, which belong to this class and are effective in robust statistics. In our experiments, we conducted fine-tuning and additional pre-training of BERT, demonstrating that the proposed method trains robustly on noisy training texts and labels compared with the conventional training approach based on the KL divergence.
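For illustration of the divergences named in the abstract, the sketch below computes a β-divergence loss between a target distribution and a model's softmax output; as β → 0 it reduces to the KL divergence used in conventional training. This is not the authors' implementation: the function name, the default β value, and the use of PyTorch are assumptions made for this example.

import torch
import torch.nn.functional as F

def beta_divergence_loss(logits, target_probs, beta=0.5, eps=1e-12):
    # Illustrative sketch, not the paper's code.
    # D_beta(p || q) = 1/(beta*(beta+1)) * sum_j [ p_j^(beta+1)
    #                  - (beta+1) * p_j * q_j^beta + beta * q_j^(beta+1) ]
    q = F.softmax(logits, dim=-1).clamp_min(eps)  # model distribution
    p = target_probs.clamp_min(eps)               # target (one-hot or soft labels)
    if beta == 0.0:
        # Limit beta -> 0 recovers the standard KL divergence KL(p || q).
        return (p * (p.log() - q.log())).sum(dim=-1).mean()
    term = p.pow(beta + 1) - (beta + 1) * p * q.pow(beta) + beta * q.pow(beta + 1)
    return term.sum(dim=-1).mean() / (beta * (beta + 1))

Because the model's probabilities enter the loss through q^β, examples that the model fits poorly contribute less to the gradient than under KL divergence, which is the usual mechanism by which such divergences reduce sensitivity to noisy texts and labels.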