Analyzing the Effects of Data Augmentation on Performance Improvement in Natural Language Processing

Itsuki Okimura

11:00 AM - 11:20 AM

[1K1-GS-6-04] Analyzing the Effects of Data Augmentation on Performance Improvement in Natural Language Processing

〇Itsuki Okimura¹, Makoto Kawano¹, Machel Reid¹, Yutaka Matsuo¹ (1. The University of Tokyo)

Keywords: NLP, data augmentation

In machine learning, if the amount of training data is insufficient relative to the number of model parameters, the model may overfit the data, which reduces generalization performance. One regularization strategy for avoiding overfitting is data augmentation, which artificially increases the amount of data used for training. While data augmentation is widely used in image recognition, its use in natural language processing is limited. The reason is that the effects of data augmentation methods on performance have not been evaluated uniformly in natural language processing, so it is unclear which methods are effective for each task.
In this study, we examined the effects of conventional data augmentation methods on natural language processing performance across multiple datasets. The results show that, when a pre-trained model is used, data augmentation is effective on some datasets when training with a small amount of data. We also defined a metric for the strength of data augmentation and evaluated the correlation between this metric and performance after training.
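The abstract does not name the augmentation methods that were studied. As a minimal illustration of what token-level data augmentation for text can look like, the sketch below applies random deletion and random swap to a sentence. The function names, parameters, and the choice of these two operations are assumptions made for illustration only, not the authors' method.

```python
import random


def random_deletion(tokens, p, rng):
    """Drop each token independently with probability p, keeping at least one token."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() >= p]
    # If every token was dropped, fall back to a single random token.
    return kept if kept else [rng.choice(tokens)]


def random_swap(tokens, n_swaps, rng):
    """Swap two randomly chosen positions, repeated n_swaps times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def augment(sentence, n_copies=2, p_delete=0.1, n_swaps=1, seed=0):
    """Return n_copies perturbed variants of sentence (hypothetical helper).

    Even-numbered copies use random deletion, odd-numbered copies use random swap.
    Whitespace tokenization is a simplifying assumption.
    """
    rng = random.Random(seed)
    tokens = sentence.split()
    augmented = []
    for k in range(n_copies):
        if k % 2 == 0:
            new_tokens = random_deletion(tokens, p_delete, rng)
        else:
            new_tokens = random_swap(tokens, n_swaps, rng)
        augmented.append(" ".join(new_tokens))
    return augmented


if __name__ == "__main__":
    for variant in augment("the movie was surprisingly good", n_copies=4):
        print(variant)
```

In this sketch, larger values of `p_delete` or `n_swaps` perturb the sentence more strongly. That is one informal way to think about augmentation strength, although the metric defined in the paper is not specified in this abstract.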


Presentation information

[1K1-GS-6] Language media processing: evaluation / analysis
