[3Rin4-26] Systematic Analysis of Linguistic Phenomena for Better Understanding Translation on User-Generated Contents
Keywords: Natural Language Processing, Machine Translation, Social Media
Neural Machine Translation (NMT) has improved dramatically in quality when translating clean input. However, it still struggles with noisy input such as User-Generated Content (UGC) on the Internet. For NMT systems to be genuinely useful in promoting cross-cultural communication, one of the most promising directions is to handle such input correctly. Despite this need, it remains an open question what causes the large performance gap between translating clean input and translating UGC. In this paper, we conducted a systematic analysis of an existing dataset focusing on UGC and clarified which linguistic phenomena most strongly affect translation performance.