Keywords: unsupervised voice conversion, StarGAN-VC, Automatic Speech Recognition, linguistic information, Regularization
Star generative adversarial network for voice conversion (StarGAN-VC) is a method that enables non-parallel many-to-many voice conversion. Although retaining linguistic information is essential in voice conversion, speech converted by StarGAN-VC sometimes loses its linguistic content. This is because StarGAN-VC uses no linguistic information while learning the conversion and focuses only on non-symbolic acoustic features. This paper proposes a method that exploits recognition results estimated by automatic speech recognition (ASR) when training the StarGAN-VC generator. Experiments show that the proposed method enables StarGAN-VC to retain more linguistic information than vanilla StarGAN-VC.
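The abstract does not spell out how the ASR results enter training, so the following is only an illustrative sketch of one plausible reading: an ASR-consistency term added as a regularizer to the generator loss. The function names, the cross-entropy form of the regularizer, the loss terms `adv_loss` and `cycle_loss`, and the weight `lambda_asr` are all assumptions made for illustration, not details taken from the paper.

```python
import math


def asr_consistency(source_posteriors, converted_posteriors, eps=1e-12):
    """Mean per-frame cross-entropy between ASR posteriors.

    Hypothetical regularizer: each argument is a list of frames, and each
    frame is a probability distribution over linguistic units (for example
    phonemes) produced by an ASR model for the source and converted speech.
    """
    total = 0.0
    for src, conv in zip(source_posteriors, converted_posteriors):
        total += -sum(s * math.log(c + eps) for s, c in zip(src, conv))
    return total / len(source_posteriors)


def generator_loss(adv_loss, cycle_loss, source_posteriors,
                   converted_posteriors, lambda_asr=1.0):
    """Assumed generator objective: usual GAN terms plus the ASR regularizer."""
    reg = asr_consistency(source_posteriors, converted_posteriors)
    return adv_loss + cycle_loss + lambda_asr * reg


if __name__ == "__main__":
    source = [[0.9, 0.1], [0.2, 0.8]]      # ASR posteriors for source speech
    preserved = [[0.9, 0.1], [0.2, 0.8]]   # conversion kept the content
    collapsed = [[0.1, 0.9], [0.8, 0.2]]   # conversion changed the content
    print(generator_loss(1.0, 2.0, source, preserved, lambda_asr=0.5))
    print(generator_loss(1.0, 2.0, source, collapsed, lambda_asr=0.5))
```

Under this sketch, a conversion whose ASR posteriors drift away from those of the source speech incurs a larger penalty, which pushes the generator toward outputs that a recognizer still reads as the same content.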