[4Xin2-22] Efficient Data Annotation Methods for Speech Recognition Models through Active Learning
Keywords: Automated Speech Recognition, Active Learning, Human-in-the-Loop
End-to-end speech recognition models are known to perform well when trained on high-quality data. However, creating such data typically incurs significant human and management costs. This study proposes a data selection method based on active learning for efficiently annotating high-quality training data for speech recognition models. Using a Character Error Rate (CER) prediction model built on features calculated from speech waveforms, we identified the data in the pool that should be annotated preferentially. The speech recognition model developed with the proposed method outperformed models trained on randomly selected annotated data, showing that the method contributes to the efficient creation of training data. We also found that labeling made more efficient with respect to label quality has a positive psychological effect on annotators, which leads to cost savings and improved accuracy of the speech recognition model.
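The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three waveform features (RMS energy, zero-crossing rate, duration), the linear least-squares regressor, and all function names are assumptions chosen for illustration. The abstract also does not state the selection criterion, so the sketch assumes that utterances with the highest predicted CER are annotated first.

```python
import numpy as np


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i]
        for j, h in enumerate(hypothesis, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1] / max(len(reference), 1)


def waveform_features(wave: np.ndarray, sample_rate: int = 16000) -> np.ndarray:
    """Hypothetical feature set; the abstract does not list the actual features."""
    wave = np.asarray(wave, dtype=np.float64)
    rms = np.sqrt(np.mean(wave ** 2))                   # overall energy
    zcr = np.mean(np.abs(np.diff(np.sign(wave))) > 0)   # zero-crossing rate
    duration = len(wave) / sample_rate                  # length in seconds
    return np.array([rms, zcr, duration])


def _with_bias(features: np.ndarray) -> np.ndarray:
    """Append a constant column so the linear model has an intercept."""
    features = np.asarray(features, dtype=np.float64)
    return np.column_stack([features, np.ones(len(features))])


def fit_cer_predictor(features: np.ndarray, cers: np.ndarray) -> np.ndarray:
    """Fit a linear CER predictor on a small labelled seed set (assumed model)."""
    weights, *_ = np.linalg.lstsq(_with_bias(features), cers, rcond=None)
    return weights


def predict_cer(weights: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Predict CER for utterances that have no transcript yet."""
    return _with_bias(features) @ weights


def select_for_annotation(weights: np.ndarray, pool_features: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k pool items with the highest predicted CER.

    Assumption: high predicted CER means high annotation priority.
    """
    predicted = predict_cer(weights, pool_features)
    return np.argsort(-predicted)[:k]
```

In an active learning loop, the CER targets for the seed set would come from comparing the current recogniser's output with the seed transcripts via `cer`. The selected pool items would then be annotated, added to the training data, and the recogniser and the predictor refit.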