JSAI2024

Presentation information

General Session

[4N3-GS-6] Language media processing

Fri. May 31, 2024 2:00 PM - 3:40 PM Room N (Room 54)

Chair: Ryota Tanaka (NTT Human Informatics Laboratories)

2:40 PM - 3:00 PM

[4N3-GS-6-03] Multi-Source Text Classification for Multilingual Language Models with Machine Translation

〇Reon Kajikawa¹, Keiichiro Yamada², Tomoyuki Kajiwara¹, Takashi Ninomiya¹ (1. Ehime University, 2. Tokyo Metropolitan College of Industrial Technology)

Keywords: Natural Language Processing, Multilingual Language Model, Multi-Source, Text Classification

Pre-trained multilingual sentence encoders are a promising way to spare developers of natural language processing applications the cost of training a separate model for each language.
However, because the training corpora for these encoders contain only a small amount of text in languages other than English, their performance degrades on non-English languages.
To improve the performance of pre-trained multilingual sentence encoders on non-English languages, we propose machine-translating the source sentence into English and feeding the translation to the encoder together with the source sentence in a multi-source manner.
Experiments on Japanese sentiment analysis and topic classification tasks demonstrated the effectiveness of the proposed method.
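
As a rough illustration of the multi-source input described above, the sketch below pairs a source sentence with its English translation before it would be passed to an encoder. The abstract does not say how the two sentences are combined, so the single-string concatenation, the separator token `</s>`, and the names `build_multi_source_input`, `toy_translate`, and `TOY_MT` are all assumptions made for this example; the lookup table merely stands in for a real machine translation system.

```python
from typing import Callable

# Assumed separator token; the actual token depends on the encoder's tokenizer.
SEP = "</s>"

# Hypothetical stand-in for a machine translation system (Japanese -> English).
TOY_MT = {
    "この映画は最高だった。": "This movie was the best.",
}


def toy_translate(sentence: str) -> str:
    """Return a canned translation, or the input unchanged if none is known."""
    return TOY_MT.get(sentence, sentence)


def build_multi_source_input(
    source: str,
    translate: Callable[[str], str],
    sep: str = SEP,
) -> str:
    """Join the source sentence and its English translation into one input string.

    This is one possible reading of "multi-source" input, not necessarily
    the combination scheme used by the authors.
    """
    english = translate(source)
    return f"{source} {sep} {english}"


if __name__ == "__main__":
    print(build_multi_source_input("この映画は最高だった。", toy_translate))
```

In practice the two sentences would more likely be handed to the encoder's tokenizer as a sentence pair, so that it inserts its own special tokens, and the resulting representation would feed a classification layer.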
