JSAI2025

Presentation information

General Session

[4N1-GS-7] Vision, speech media processing

Fri. May 30, 2025 9:00 AM - 10:40 AM Room N (Room 1009)

Chair: Daichi Hayakawa (早川 大智), Toshiba

10:20 AM - 10:40 AM

[4N1-GS-7-05] Correction of Speech Recognition Errors using Word Pronunciation Information to Improve Speech Recognition Accuracy in Medical

〇Tasuku Kitade1, May Phyo Khaing2, Masanori Tsujikawa1, Koji Okabe1, Ryo Ishii3, Hitoshi Yamamoto1, Masahiro Kubo1, Atsuhiro Nakagawa3, Yukio Katori3 (1. NEC Corporation, 2. Human Resocia Co.,Ltd., 3. Tohoku University Hospital)

Keywords: Correction of Speech Recognition Errors, Word Pronunciation, Medical

We are investigating a medical documentation assistant system that aims to make physicians' record and report writing more efficient by automatically generating medical documents from speech recognition results. For this system, it is essential that medical terminology is recognized with high accuracy. Because medical data is difficult to obtain, we propose a method that corrects speech recognition errors using word pronunciation (reading) information, without relying on such data. Specifically, we use a Large Language Model (LLM) to detect speech recognition errors in the recognition results, and obtain the readings of the words identified as errors through morphological analysis. We then extract words whose readings are similar to those readings, and finally have the LLM select the appropriate word from these candidates to correct each error. Evaluation experiments using simulated medical speech recognition results confirmed that the proposed method reduced errors in medical terms by 12.9%.
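The four steps described in the abstract (error detection, reading lookup, similar-reading candidate extraction, candidate selection) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of Levenshtein distance as the similarity measure, the `max_dist` threshold, the romanized readings, and the toy words are all assumptions made here. The LLM and the morphological analyzer are replaced by hypothetical stand-in callables.

```python
from typing import Callable, Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two reading strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similar_words(reading: str, lexicon: Dict[str, str], max_dist: int = 1) -> List[str]:
    """Return lexicon words whose reading is within max_dist edits, closest first.

    Edit distance is an assumed similarity measure; the abstract does not
    say which one is used.
    """
    scored = sorted((edit_distance(reading, r), w) for w, r in lexicon.items())
    return [w for d, w in scored if d <= max_dist]


def correct(
    words: List[str],
    detect_errors: Callable[[List[str]], List[int]],      # stand-in for the LLM detector
    get_reading: Callable[[str], str],                     # stand-in for morphological analysis
    select_word: Callable[[List[str], int, List[str]], str],  # stand-in for the LLM selector
    lexicon: Dict[str, str],
    max_dist: int = 1,
) -> List[str]:
    """Replace each flagged word with a candidate chosen among similar-reading words."""
    flagged = set(detect_errors(words))
    out = []
    for i, w in enumerate(words):
        if i in flagged:
            candidates = similar_words(get_reading(w), lexicon, max_dist)
            if candidates:  # keep the original word when nothing is close enough
                w = select_word(words, i, candidates)
        out.append(w)
    return out


# Toy, made-up data for illustration only (not taken from the paper).
lexicon = {"心不全": "shinfuzen", "心房細動": "shinbousaidou"}
readings = {"新婦全": "shinpuzen"}

result = correct(
    words=["患者", "新婦全", "疑い"],
    detect_errors=lambda ws: [1],                 # pretend the LLM flagged index 1
    get_reading=lambda w: readings[w],            # pretend analyzer output
    select_word=lambda ws, i, cands: cands[0],    # pretend the LLM picked the closest
    lexicon=lexicon,
)
print(result)
```

In a real system the three callables would wrap an actual LLM and a morphological analyzer, and the lexicon would be a list of medical terms with their readings. Passing them in as parameters keeps the candidate-extraction logic testable on its own.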
