JSAI2024

Presentation information

General Session: GS-7 Vision, speech media processing

[1D3-GS-7] Language media processing:

Tue. May 28, 2024 1:00 PM - 2:40 PM Room D (Temporary room 2)

Chair: 田崎豪 (Meijo University)

2:00 PM - 2:20 PM

[1D3-GS-7-04] Detection of mispronunciations by speech recognition and mispronunciation candidates

〇Arata Saito1, Takuya Matuzaki1 (1. Tokyo University of Science)

Keywords: AI, Speech recognition

We have developed a method for detecting reading errors in Japanese speech data. First, speech recognition transcribes the speech into a phoneme sequence, which is then checked for reading errors. To distinguish speech recognition errors from actual reading errors, we create a list of candidate misreadings for each morpheme. Among the correct reading and the candidate misreadings, we select the one with the smallest edit distance to the speech recognition result, and report a reading error if the selected reading differs from the correct one. We conducted experiments on speech data from the LaboroTVspeech corpus and the Japanese Spoken Language Corpus, as well as on synthetic speech. The results confirmed that the method is effective when the speech actually contains reading errors, although in many cases reading errors were falsely detected even when the word was read correctly. In particular, in experiments with synthesized speech, the method accurately identified the misreading, including how the word was mispronounced, in 80.0% of cases and detected 98.6% of mispronounced morphemes.
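The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the representation of phonemes as lists of strings, the plain Levenshtein distance with unit costs, and the rule that ties are resolved in favor of the correct reading are all assumptions made for illustration. The example readings (the word 月極, correct reading "tsukigime", common misreading "gekkyoku") are likewise chosen here and do not come from the paper.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences, with unit costs (assumed)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def detect_misreading(
    recognized: List[str],
    correct: List[str],
    candidates: List[List[str]],
) -> Tuple[bool, List[str]]:
    """Pick the reading closest to the recognizer output.

    Returns (is_misreading, selected_reading). Ties favor the correct
    reading; this tie-break is an assumption, not taken from the paper.
    """
    best, best_dist = correct, edit_distance(recognized, correct)
    for cand in candidates:
        d = edit_distance(recognized, cand)
        if d < best_dist:
            best, best_dist = cand, d
    return best != correct, best


# Hypothetical morpheme entry: correct reading plus one candidate misreading.
correct = "ts u k i g i m e".split()
candidates = ["g e k k y o k u".split()]

# Recognizer output with one dropped phoneme relative to the misreading.
flagged, reading = detect_misreading("g e k y o k u".split(), correct, candidates)

# Recognizer output with one substituted phoneme relative to the correct reading.
clean_flagged, clean_reading = detect_misreading(
    "ts u k i g i m i".split(), correct, candidates
)
```

In the first call the recognizer output is closest to the candidate misreading, so the morpheme is flagged even though the transcription is imperfect. In the second call the output is closest to the correct reading, so the deviation is attributed to a recognition error and nothing is flagged.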
