JSAI2024

Presentation information

General Session » GS-7 Vision, speech media processing

[2C1-GS-7] Language media processing

Wed. May 29, 2024 9:00 AM - 10:40 AM Room C (Temporary room 1)

Chair: 西澤直樹 (Toshiba Corporation)

9:00 AM - 9:20 AM

[2C1-GS-7-01] Toyama Dialect Recognition and Conversion to Standard Japanese via Deep Learning

〇Yuka Horimoto1, Itsugun Cho1, Hiroaki Saito1 (1. Keio University)

[Online]

Keywords: Toyama dialect, Dialectal recognition, Deep learning

Motivated by a deep affection for Toyama, this study focuses on speech recognition of the Toyama dialect. Although the dialect has a distinctive and appealing style, it can hinder communication with people from other regions. This study therefore aims to develop a system that recognizes Toyama dialect speech and converts it into standard Japanese, making communication easier for visitors from other areas. We employed wav2vec 2.0 as the speech recognition model and two GPT-2 models as the standard Japanese conversion model. We created a Toyama dialect corpus and improved its quality through meticulous transcription. All speech data were smoothed via RMS, and masking was applied as data augmentation during training. In the experiments, we used CER and WER for automatic evaluation, while human evaluation focused on semantic equivalence and grammaticality. Empirical results show that our model outperformed the baseline, and the discussion verifies the effectiveness of our approach.
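
For readers unfamiliar with the automatic metrics named in the abstract, the following is a minimal sketch of how CER (character error rate) and WER (word error rate) are commonly computed from edit distance. It is not the authors' evaluation code. The function names `edit_distance`, `cer`, and `wer` are hypothetical, and the sketch assumes that references and hypotheses are plain strings, that spaces are ignored for CER, and that words are already separated by spaces for WER (Japanese text would first need a word segmenter, which is not shown).

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum number of insertions, deletions,
    and substitutions (each costing 1) needed to turn ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances for the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference tokens to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length.
    Assumption: spaces are dropped before comparison."""
    ref = list(reference.replace(" ", ""))
    hyp = list(hypothesis.replace(" ", ""))
    if not ref:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref, hyp) / len(ref)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words.
    Assumption: both strings are already segmented into space-separated words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)
```

Both metrics are normalized by the reference length, so a value can exceed 1.0 when the hypothesis contains many insertions.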
