General Session » GS-7 Vision, speech media processing

[4I2-GS-7c] Vision, speech media processing: Speech recognition and instruction understanding

Fri. Jun 11, 2021 11:00 AM - 12:40 PM Room I (GS room 4)

Chair: Taiki Miyanishi (Advanced Telecommunications Research Institute International)

11:00 AM - 11:20 AM

[4I2-GS-7c-01] StarGAN-VC+ASR: unsupervised voice conversion exploiting speech recognition results for regularization

〇Shoki Sakamoto1, Akira Taniguchi1, Tadahiro Taniguchi1, Hirokazu Kameoka2 (1. Ritsumeikan University, 2. NTT Communication Science Laboratories)

Keywords: unsupervised voice conversion, StarGAN-VC, Automatic Speech Recognition, linguistic information, Regularization

Star generative adversarial network for voice conversion (StarGAN-VC) is a method that enables non-parallel many-to-many voice conversion. Although retaining linguistic information is essential in voice conversion, speech converted by StarGAN-VC sometimes loses its linguistic content. This is because StarGAN-VC uses no linguistic information while learning the conversion and focuses only on non-symbolic acoustic features. This paper proposes a method that uses speech recognition results estimated by automatic speech recognition (ASR) when training StarGAN-VC's generator. The experiment shows that the proposed method enables StarGAN-VC to retain more linguistic information than the vanilla StarGAN-VC.
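The abstract does not state the exact form of the ASR-based regularizer, so the following is only a minimal sketch under stated assumptions of how recognition results could enter the generator objective. It assumes a frozen ASR model that outputs per-frame label posteriors of shape (T, V), that conversion preserves the number of frames T, and that the regularizer is a frame-level cross-entropy between the source utterance's recognized labels (used as pseudo-labels) and the ASR posteriors of the converted utterance. The names `asr_consistency_loss`, `generator_loss` and `lambda_asr` are hypothetical, and NumPy is used only to show the arithmetic; actual training would compute this in a differentiable framework.

```python
import numpy as np


def asr_consistency_loss(src_post: np.ndarray, conv_post: np.ndarray, eps: float = 1e-8) -> float:
    """Hypothetical regularizer: cross-entropy of converted-speech ASR posteriors
    against the labels recognized from the source speech.

    Assumption: both arrays are (T, V) per-frame posteriors from the same frozen
    ASR model, and conversion keeps the frame count T unchanged.
    """
    if src_post.shape != conv_post.shape:
        raise ValueError("source and converted posteriors must share the same (T, V) shape")
    # Hard recognition result of the source utterance, one label per frame.
    labels = src_post.argmax(axis=1)
    # Probability that the converted speech is recognized with the same label.
    picked = conv_post[np.arange(labels.shape[0]), labels]
    # Mean negative log-likelihood over frames; eps avoids log(0).
    return float(-np.log(picked + eps).mean())


def generator_loss(stargan_loss: float, src_post: np.ndarray, conv_post: np.ndarray,
                   lambda_asr: float = 1.0) -> float:
    """Hypothetical total objective: the usual StarGAN-VC generator loss plus a
    weighted ASR term that penalizes changes in the recognized content."""
    return stargan_loss + lambda_asr * asr_consistency_loss(src_post, conv_post)


if __name__ == "__main__":
    src = np.array([[0.9, 0.1], [0.2, 0.8]])
    same = src.copy()                              # content preserved
    swapped = np.array([[0.1, 0.9], [0.8, 0.2]])   # content collapsed
    print(asr_consistency_loss(src, same), asr_consistency_loss(src, swapped))
```

In this sketch, a conversion that keeps the recognized labels yields a small penalty, while one whose ASR output flips the labels yields a large one. That is the kind of pressure toward retaining linguistic information that the abstract describes, though the paper's actual loss may differ.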