2:40 PM - 3:00 PM
[4G3-GS-6-03] Evaluating the Effectiveness of Contextual Information for Speech Recognition Error Correction Using Large Language Models
Keywords: Speech Recognition Error Correction, Large Language Model
We propose a method for zero-shot correction of errors in Japanese speech recognition results. Specifically, we feed the text to be corrected, together with its surrounding context, to a large language model. By providing context to a model with strong natural language generation capabilities and a rich vocabulary, we aim to achieve appropriate, context-aware corrections even for recognition outputs that contain numerous errors and are difficult to fix. In our accuracy evaluation, BLEU and BERTScore were comparable to or better than the no-context condition across all models and experimental settings tested. In particular, for sentences whose Word Error Rate (WER) exceeded 20%, every model reduced WER by at least three points relative to the original Whisper output, and some models achieved reductions of more than ten points. On the other hand, for sentences with low WER, we observed a noticeable number of cases where the WER worsened relative to the pre-correction recognition results. We attribute this to over-correction by the large language model and to unnecessary rewrites when the input text is already highly accurate.
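The abstract gives no implementation details, but the core step it describes (passing the erroneous recognition output together with surrounding context to an LLM in a single zero-shot prompt) might look roughly like the Python sketch below. The prompt wording, the `gpt-4o` model name, and the `correct_with_context` helper are illustrative assumptions, not the authors' actual setup.

```python
# Minimal sketch of context-augmented zero-shot ASR error correction.
# The prompt text and model choice are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def correct_with_context(asr_text: str, context: str,
                         model: str = "gpt-4o") -> str:
    """Ask an LLM to fix recognition errors in asr_text, given nearby context."""
    prompt = (
        "The following is a Japanese speech recognition result that may "
        "contain errors. Using the surrounding context, output only the "
        "corrected text, with no explanation.\n\n"
        f"Context:\n{context}\n\n"
        f"Recognition result:\n{asr_text}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # low temperature to discourage free-form rewriting
    )
    return response.choices[0].message.content.strip()
```

An evaluation along the lines reported above could then compare WER before and after correction (e.g., with a library such as jiwer), which would also surface the over-correction cases on low-WER sentences that the abstract notes.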