JSAI2022

Presentation information

General Session

General Session » GS-5 Language media processing

[1K1-GS-6] Language media processing: evaluation / analysis

Tue. Jun 14, 2022 10:00 AM - 11:40 AM Room K

Chair: 大杉 康仁 (NTT) [remote]

10:40 AM - 11:00 AM

[1K1-GS-6-03] Detailed Evaluation on a dataset construction method for word reading disambiguation

〇Hideharu Nakajima¹ (1. NTT Communication Science Labs., NTT Corporation)

Keywords: Data Augmentation, word reading disambiguation

Some Japanese words written in Chinese characters have multiple readings. Accurate reading classification is important in applications such as speech synthesis, and data collection is necessary both for humans to write the classification rules and for computers to learn them. We previously proposed an efficient method for this data collection, but its evaluation was limited to collection efficiency. For a more detailed evaluation, we assessed the usefulness of the data collected by our method in terms of reading classification accuracy. The results confirm that the collected data improves classification accuracy even for a relatively recent classification method using BERT. Although the proposed method includes a human judgment of whether each collected sentence is appropriate, we also confirmed that there is almost no degradation in accuracy when this judgment is omitted.
