JSAI2024

Presentation information

Poster Session


[3Xin2] Poster session 1

Thu. May 30, 2024 11:00 AM - 12:40 PM Room X (Event hall 1)

[3Xin2-60] Task-Driven Unsupervised Parallel Corpus Filtering

Koki Hatagaki1, 〇Sora Tarumoto1, Tomoyuki Kajiwara1, Takashi Ninomiya1 (1.Ehime University)

Keywords: Parallel Corpus Filtering, Text Simplification, Machine Translation, Grammatical Error Correction

To improve the performance of sequence-to-sequence models, we propose an unsupervised parallel corpus filtering method that can be applied to any natural language processing task.
Parallel corpus filtering has mainly been studied for machine translation, where large-scale corpora are available.
However, due to advances in techniques such as transfer learning and in-context learning, corpus quality has become more important than corpus size in recent years.
We therefore propose parallel corpus filtering methods that apply to any sequence-to-sequence task and require only a pre-trained sequence-to-sequence model and the evaluation metric or loss function of the target task.
We confirmed the effectiveness of the proposed method on three tasks: machine translation, text simplification, and grammatical error correction.
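
The abstract does not spell out the filtering procedure, so the following is only a minimal sketch of one way such a filter could look, not the authors' implementation. It assumes that each sentence pair is scored by comparing a model's output for the source side against the target side with a task metric, and that the highest-scoring pairs are kept. The names `filter_corpus`, `token_f1`, `copy_model`, and `keep_ratio` are hypothetical, and the copy model and token-overlap F1 are toy stand-ins for a real pre-trained sequence-to-sequence model and a real evaluation metric.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def token_f1(hypothesis: str, reference: str) -> float:
    """Toy stand-in for a task metric: F1 over unique whitespace tokens."""
    hyp, ref = set(hypothesis.split()), set(reference.split())
    overlap = len(hyp & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def copy_model(source: str) -> str:
    """Toy stand-in for a pre-trained seq2seq model: returns the input unchanged."""
    return source


def filter_corpus(
    pairs: List[Pair],
    generate: Callable[[str], str],
    metric: Callable[[str, str], float],
    keep_ratio: float,
) -> List[Pair]:
    """Keep the top `keep_ratio` fraction of pairs, ranked by metric score.

    Assumption: a pair is scored by how well the model output for its source
    matches its target. Ties are broken by original position, and the kept
    pairs are returned in their original corpus order.
    """
    scored = [(metric(generate(src), tgt), i) for i, (src, tgt) in enumerate(pairs)]
    scored.sort(key=lambda item: (-item[0], item[1]))
    n_keep = max(1, int(len(pairs) * keep_ratio))
    kept_indices = sorted(i for _, i in scored[:n_keep])
    return [pairs[i] for i in kept_indices]


# Toy corpus in the style of grammatical error correction: two plausible
# pairs and two pairs whose target is unrelated to the source.
toy_pairs = [
    ("he go to school", "he goes to school"),
    ("she like apples", "bananas are yellow today"),
    ("i has a pen", "i have a pen"),
    ("they is happy", "completely unrelated target sentence"),
]

kept = filter_corpus(toy_pairs, copy_model, token_f1, keep_ratio=0.5)
```

In this toy setting the two pairs with unrelated targets score zero and are dropped. For a loss-based variant, `metric(generate(src), tgt)` could be replaced by the negated model loss on the pair, again as an assumption rather than a description of the paper.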
