2:40 PM - 3:00 PM
[4N3-GS-6-03] Multi-Source Text Classification for Multilingual Language Models with Machine Translation
Keywords: Natural Language Processing, Multilingual Language Model, Multi-Source, Text Classification
Pre-trained multilingual sentence encoders are a promising way for developers of natural language processing applications to avoid the cost of training a separate model for each language.
However, because the training corpora for such encoders contain only a small amount of text in languages other than English, the encoders perform worse on non-English languages.
To improve the performance of pre-trained multilingual sentence encoders on non-English languages, we propose machine-translating each source sentence into English and feeding the translation to the encoder together with the original sentence as a multi-source input.
Experiments on sentiment analysis and topic classification tasks in Japanese show that the proposed method is effective.
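The multi-source input described above might be assembled roughly as follows. This is only a minimal sketch: the abstract does not state how the source sentence and its English translation are combined, so the BERT-style sentence-pair layout with `[CLS]` and `[SEP]` markers is an assumption, and `build_multi_source_input` and `toy_translate` are hypothetical names, with the latter standing in for a real machine translation system.

```python
from typing import Callable


def build_multi_source_input(
    source: str,
    translate: Callable[[str], str],
    cls_token: str = "[CLS]",
    sep_token: str = "[SEP]",
) -> str:
    """Pair a source sentence with its English translation.

    Assumption: the two texts are joined as one sentence-pair sequence.
    The abstract does not specify the exact input layout.
    """
    english = translate(source)
    return f"{cls_token} {source} {sep_token} {english} {sep_token}"


def toy_translate(sentence: str) -> str:
    """Hypothetical stand-in for a machine translation system."""
    lookup = {"この映画は素晴らしい": "This movie is wonderful"}
    # Fall back to the untranslated sentence when no entry exists.
    return lookup.get(sentence, sentence)


paired = build_multi_source_input("この映画は素晴らしい", toy_translate)
print(paired)
```

In practice the paired text would be tokenized and passed to a multilingual encoder with a classification head; that step is omitted here because the abstract names no specific model or translation system.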