JSAI2020

Presentation information

Interactive Session

[3Rin4] Interactive 1

Thu. Jun 11, 2020 1:40 PM - 3:20 PM Room R01 (jsai2020online-2-33)

[3Rin4-26] Systematic Analysis of Linguistic Phenomena for Better Understanding Translation on User-Generated Contents

〇Ryo Fujii1, Masato Mita2,1, Kaori Abe1,2, Kazuaki Hanawa2,1, Makoto Morishita3, Jun Suzuki1,2, Kentaro Inui1,2 (1.Tohoku University, 2.RIKEN, 3.NTT Communication Science Laboratories)

Keywords: Natural Language Processing, Machine Translation, Social Media

Neural Machine Translation (NMT) has shown drastic improvement in quality when translating clean input. However, it still struggles with noisy input such as User-Generated Content (UGC) on the Internet. For NMT systems to be truly useful in promoting cross-cultural communication, one of the most promising directions is to handle such input correctly. Despite this need, what causes the large performance gap between translating clean input and translating UGC remains an open question. In this paper, we conducted a systematic analysis of an existing dataset focused on UGC and identified which linguistic phenomena greatly affected translation performance.
