StarGAN-VC+ASR: unsupervised voice conversion exploiting speech recognition results for regularization

Shoki Sakamoto

11:00 AM - 11:20 AM

[4I2-GS-7c-01] StarGAN-VC+ASR: unsupervised voice conversion exploiting speech recognition results for regularization

〇Shoki Sakamoto¹, Akira Taniguchi¹, Tadahiro Taniguchi¹, Hirokazu Kameoka² (1. Ritsumeikan University, 2. NTT Communication Science Laboratories)

Keywords: unsupervised voice conversion, StarGAN-VC, Automatic Speech Recognition, linguistic information, Regularization

Star generative adversarial network for voice conversion (StarGAN-VC) is a method that enables non-parallel many-to-many voice conversion. Although retaining linguistic information is essential in voice conversion, speech converted by StarGAN-VC sometimes loses its linguistic content. This is because StarGAN-VC uses no linguistic information when learning the conversion and attends only to non-symbolic acoustic features. This paper proposes a method that exploits speech recognition results estimated by automatic speech recognition (ASR) when training the StarGAN-VC generator. The experiment shows that the proposed method enables StarGAN-VC to retain more linguistic information than vanilla StarGAN-VC.
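The abstract does not state what form the ASR-based regularizer takes, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes a hypothetical ASR model that returns per-frame scores over a fixed symbol inventory, and it penalizes the generator when the scores for the converted speech diverge from those for the source speech. The function names, the choice of KL divergence, and the `weight` parameter are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Turn a list of raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def asr_consistency_penalty(source_logits, converted_logits, eps=1e-12):
    """Mean per-frame KL divergence KL(p_source || p_converted).

    Each argument is a list of frames; each frame is a list of scores from a
    hypothetical ASR model over the same symbol inventory (an assumption).
    """
    if not source_logits or len(source_logits) != len(converted_logits):
        raise ValueError("inputs must be non-empty and have the same number of frames")
    total = 0.0
    for src, conv in zip(source_logits, converted_logits):
        p = softmax(src)
        q = softmax(conv)
        total += sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
    return total / len(source_logits)


def generator_loss(base_loss, source_logits, converted_logits, weight=1.0):
    """Add the weighted ASR penalty to an already computed generator loss.

    base_loss stands in for the usual StarGAN-VC generator objective;
    weight is a hypothetical trade-off hyperparameter.
    """
    return base_loss + weight * asr_consistency_penalty(source_logits, converted_logits)


# Toy example: two frames, three symbols.
source = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
preserved = [[3.5, 0.2, 0.1], [0.1, 3.8, 0.0]]   # same symbols stay dominant
collapsed = [[0.0, 0.0, 4.0], [4.0, 0.0, 0.0]]   # different symbols dominate

print(generator_loss(1.0, source, preserved))
print(generator_loss(1.0, source, collapsed))
```

Under these assumptions, a conversion that keeps the same symbols dominant in each frame adds a smaller penalty than one whose recognized content changes, which is the kind of pressure toward retaining linguistic information that the abstract describes.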


Presentation information

[4I2-GS-7c] Image and Speech Media Processing: Speech Recognition and Instruction Understanding
