JSAI2024

Presentation information

Poster Session

[4Xin2] Poster session 2

Fri. May 31, 2024 12:00 PM - 1:40 PM Room X (Event hall 1)

[4Xin2-22] Efficient Data Annotation Methods for Speech Recognition Models through Active Learning

〇Yosuke Yamano1, Hideaki Tamori1, Kaori Sugino1, Yuka Kuroda2 (1. The Asahi Shimbun Company, 2. Mitsubishi UFJ Research and Consulting Co., Ltd.)

Keywords: Automated Speech Recognition, Active Learning, Human-in-the-Loop

End-to-end speech recognition models are known to perform well when trained on high-quality data. However, creating such data typically incurs significant human and management costs. This study proposes a data selection method that uses active learning to annotate high-quality training data for speech recognition models efficiently. Using a Character Error Rate (CER) prediction model built on features calculated from speech waveforms, we identified the data in the pool that should be annotated preferentially. The speech recognition model developed with the proposed method outperformed models trained on randomly selected annotated data, demonstrating that the method contributes to the efficient creation of training data. Our research also found that labeling that is efficient in terms of label quality has a positive effect on the psychological state of annotators, leading to cost savings and improved accuracy of the speech recognition model.
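The selection step described in the abstract can be illustrated with a minimal Python sketch. It shows the standard definition of CER (character-level edit distance divided by reference length) and a ranking of pool utterances by a predicted CER score. The abstract does not specify the waveform features, the prediction model, or whether high or low predicted CER is prioritized, so the function names, the `budget` parameter, and the `highest_first` switch are all hypothetical, and the predicted scores are assumed to come from some separately trained model.

```python
from typing import List, Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between reference and hypothesis."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def select_for_annotation(pool_ids: Sequence[str],
                          predicted_cer: Sequence[float],
                          budget: int,
                          highest_first: bool = True) -> List[str]:
    """Rank pool items by predicted CER and return the first `budget` ids.

    Assumption: the ranking direction is not stated in the abstract,
    so it is exposed here as a parameter rather than fixed.
    """
    ranked = sorted(zip(pool_ids, predicted_cer),
                    key=lambda pair: pair[1],
                    reverse=highest_first)
    return [item_id for item_id, _ in ranked[:budget]]
```

For example, `select_for_annotation(["a", "b", "c"], [0.1, 0.5, 0.3], budget=2)` returns the two utterances with the highest predicted CER; passing `highest_first=False` reverses the priority.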
