Efficient Data Annotation Methods for Speech Recognition Models through Active Learning

Yosuke Yamano; Hideaki Tamori; Kaori Sugino; Yuka Kuroda

[4Xin2-22] Efficient Data Annotation Methods for Speech Recognition Models through Active Learning

〇Yosuke Yamano¹, Hideaki Tamori¹, Kaori Sugino¹, Yuka Kuroda² (1. The Asahi Shimbun Company, 2. Mitsubishi UFJ Research and Consulting Co., Ltd.)

Keywords: Automated Speech Recognition, Active Learning, Human-in-the-Loop

End-to-end speech recognition models are known to perform well when trained on high-quality data. However, creating such data typically incurs substantial human labor and management costs. This study proposes an active-learning method for selecting which data to annotate, so that high-quality training data for speech recognition models can be produced efficiently. Using a Character Error Rate (CER) prediction model built on features computed from speech waveforms, we identified the data in the unlabeled pool that should be annotated first. Furthermore, a speech recognition model trained on data selected with the proposed method outperformed models trained on randomly selected annotated data, demonstrating that the method contributes to the efficient creation of training data. Additionally, we found that labeling that is efficient in terms of label quality had a positive psychological effect on annotators, which led to cost savings and improved accuracy of the speech recognition model.
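The abstract does not specify the waveform features, the regressor, or the exact selection rule, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes: CER is the character-level edit distance divided by the reference length; three hypothetical utterance-level features (duration, RMS energy, zero-crossing rate); ordinary least squares as a stand-in for the CER prediction model; and a hypothetical rule that pool items with the highest predicted CER are annotated first. All function names are illustrative.

```python
import numpy as np


def cer(reference: str, hypothesis: str) -> float:
    """CER = (substitutions + deletions + insertions) / len(reference),
    computed as character-level Levenshtein distance over reference length."""
    if not reference:
        raise ValueError("reference transcript must be non-empty")
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, h in enumerate(hypothesis, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (0 if characters match)
            )
        prev = curr
    return prev[-1] / len(reference)


def waveform_features(wave: np.ndarray, sample_rate: int) -> np.ndarray:
    """Hypothetical features: duration (s), RMS energy, zero-crossing rate."""
    duration = len(wave) / sample_rate
    rms = float(np.sqrt(np.mean(wave ** 2)))
    # Fraction of adjacent sample pairs whose sign changes.
    zcr = float(np.mean(np.diff(np.sign(wave)) != 0))
    return np.array([duration, rms, zcr])


def _with_bias(X: np.ndarray) -> np.ndarray:
    """Append a constant column so the linear model has an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def fit_cer_predictor(X: np.ndarray, observed_cer: np.ndarray) -> np.ndarray:
    """Stand-in CER predictor: ordinary least squares on the features."""
    w, *_ = np.linalg.lstsq(_with_bias(X), observed_cer, rcond=None)
    return w


def predict_cer(w: np.ndarray, X: np.ndarray) -> np.ndarray:
    return _with_bias(X) @ w


def select_for_annotation(w: np.ndarray, pool_X: np.ndarray, k: int) -> np.ndarray:
    """Assumed rule: indices of the k pool items with the highest predicted CER."""
    scores = predict_cer(w, pool_X)
    return np.argsort(-scores, kind="stable")[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 16_000
    # Toy stand-ins (random noise "utterances", random CER labels), illustration only.
    labeled = [rng.normal(size=rng.integers(sr, 3 * sr)) for _ in range(20)]
    X_lab = np.stack([waveform_features(wav, sr) for wav in labeled])
    y_lab = rng.uniform(0.0, 0.5, size=len(labeled))
    pool = [rng.normal(size=rng.integers(sr, 3 * sr)) for _ in range(100)]
    X_pool = np.stack([waveform_features(wav, sr) for wav in pool])
    weights = fit_cer_predictor(X_lab, y_lab)
    print("annotate first:", select_for_annotation(weights, X_pool, k=10))
```

In a real loop, the observed CER labels would come from comparing the current model's transcripts with human references for already-annotated utterances; after each annotation round the predictor would be refit and the pool re-ranked.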

Presentation information

[4Xin2] Poster session 2
