JSAI2023

Presentation information

Poster Session

[4Xin1] Poster session 2

Fri. Jun 9, 2023 9:00 AM - 10:40 AM Room X (Exhibition hall B)

[4Xin1-13] JParaBank: A Large-Scale Sentence Pairs of Japanese Paraphrase via Machine Translation

〇Sora Tarumoto1, Hyuga Koretaka1, Tomoyuki Kajiwara1, Takashi Ninomiya1 (1.Ehime University)

Keywords: Paraphrase, Data Augmentation, Machine Translation

Data augmentation is well known to be effective against the low-resource problem in machine learning tasks, including natural language processing.
In recent natural language processing, data augmentation based on paraphrase generation has been used successfully in many applications.
However, unlike English and Chinese, there is no large-scale corpus for training paraphrase generation models in Japanese, and thus data augmentation based on paraphrase generation is not available for Japanese.
We release JParaBank, a large-scale Japanese paraphrase corpus of 21 million sentence pairs.
Experimental results on the JGLUE benchmark show that data augmentation by paraphrase generation using JParaBank improves performance on many tasks.
