JSAI2023

Presentation information

[2E5-GS-6] Language media processing

Wed. Jun 7, 2023 3:30 PM - 5:10 PM Room E (A2)

Chair: Shota Motoura (本浦 庄太), NEC [on-site]

4:30 PM - 4:50 PM

[2E5-GS-6-04] Accelerated text data augmentation using a paraphrase generation model with round-trip translation as a supervisor

〇Shintaro Tanaka¹, Hitoshi Iima¹ (¹Kyoto Institute of Technology)

Keywords: Machine Learning, Natural Language Processing, Data Augmentation

In machine learning, large amounts of data are needed to improve model performance. However, collecting such data is costly, so a technique called data augmentation is used to generate new data from existing data. In natural language processing, one text data augmentation technique is round-trip translation, which translates text into another language and then back into the original language to generate a paraphrase of the original text. However, round-trip translation is computationally expensive and time-consuming because it requires two translations for each text. In this paper, we propose a faster text augmentation method that uses a model trained to reproduce the output of round-trip translation. The training dataset consists of original texts paired with the results of their round-trip translation. Experimental results show that the proposed method, using the Text-to-Text Transfer Transformer (T5), can augment data up to about 1.6 times faster than round-trip translation. Furthermore, T5 can generate paraphrases not included in the training data, based on the knowledge it acquired through pretraining.
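As a rough illustration of the pipeline in the abstract, the Python sketch below first builds (original, round-trip translation) training pairs and then augments texts with a single paraphrase model. It is not the authors' implementation. The word-lookup "translators", the `CountingModel` wrapper, and the memorizing `paraphraser` are hypothetical stand-ins for real machine translation models and a fine-tuned T5. They were chosen so the example runs without downloading any model.

```python
from typing import Callable, Dict, List, Tuple

# Any function mapping one text to another text (translator or paraphraser).
TextModel = Callable[[str], str]


class CountingModel:
    """Wraps a text-to-text function and counts how often it is called."""

    def __init__(self, fn: TextModel) -> None:
        self.fn = fn
        self.calls = 0

    def __call__(self, text: str) -> str:
        self.calls += 1
        return self.fn(text)


def word_translator(table: Dict[str, str]) -> TextModel:
    """Hypothetical toy 'translator': word-by-word lookup, unknown words kept."""
    def translate(text: str) -> str:
        return " ".join(table.get(w, w) for w in text.split())
    return translate


def round_trip(text: str, forward: TextModel, backward: TextModel) -> str:
    """Round-trip translation: original language -> pivot -> original language."""
    return backward(forward(text))


def build_training_pairs(texts: List[str], forward: TextModel,
                         backward: TextModel) -> List[Tuple[str, str]]:
    """(original, round-trip output) pairs used to supervise the paraphraser."""
    return [(t, round_trip(t, forward, backward)) for t in texts]


def augment(texts: List[str], paraphraser: TextModel) -> List[str]:
    """Generate one paraphrase per text with a single model call each."""
    return [paraphraser(t) for t in texts]


# Toy pivot language: 'backward' maps pivot words to different English
# words than the originals, so the round trip yields a paraphrase.
forward = CountingModel(word_translator({"good": "bon", "movie": "film"}))
backward = CountingModel(word_translator({"bon": "great"}))

texts = ["a good movie", "the movie was good"]
pairs = build_training_pairs(texts, forward, backward)
rtt_calls = forward.calls + backward.calls  # two translations per text

# Hypothetical stand-in for a fine-tuned T5: it only memorizes the pairs,
# whereas a real trained model can also generalize to unseen texts.
memory = dict(pairs)
paraphraser = CountingModel(lambda t: memory.get(t, t))
augmented = augment(texts, paraphraser)

print(pairs)
print(rtt_calls, paraphraser.calls)
```

Round-trip augmentation needs two model calls per text, while the trained paraphraser needs one. The round-trip cost is still paid once, offline, to build the training set. Halving the number of calls does not translate directly into halved wall-clock time, because each model's decoding cost differs. The abstract reports a speedup of up to about 1.6 times.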