General Session » GS-7 Vision, speech media processing

[4I2-GS-7c] Vision, speech media processing: Speech recognition and instruction understanding

Fri. Jun 11, 2021 11:00 AM - 12:40 PM Room I (GS room 4)

Chair: Taiki Miyanishi (Advanced Telecommunications Research Institute International)

11:20 AM - 11:40 AM

[4I2-GS-7c-02] Speaker-independent acoustic features extraction using StarGAN-VC and its applications for double articulation analysis

〇Soichiro Komura1, Kaede Hayashi1, Akira Taniguchi1, Tadahiro Taniguchi1, Hirokazu Kameoka2 (1. Ritsumeikan University, 2. NTT Communication Science Laboratories)

Keywords: NPB-DAA, StarGAN-VC, Neuro-SERKET, Unsupervised learning

The nonparametric Bayesian double articulation analyzer (NPB-DAA) is a method for discovering words and phoneme units from continuous speech signals in an unsupervised manner. However, acoustic features are speaker-dependent, which prevents NPB-DAA from discovering words and phoneme units from multi-speaker utterances. This paper proposes using a star generative adversarial network for voice conversion (StarGAN-VC) to extract speaker-independent acoustic features, and optimizing NPB-DAA and StarGAN-VC simultaneously through mutual learning based on the Neuro-SERKET framework. The effect of mutual learning is demonstrated experimentally.
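The "double articulation" that NPB-DAA exploits is a two-level structure: continuous acoustic frames are grouped into phoneme-like units, and sequences of those units are grouped into words. The toy Python sketch below illustrates only that data structure. It assumes the frame labels and word boundaries are already known, whereas NPB-DAA infers both jointly with nonparametric Bayesian inference; the labels, boundaries, and function names are hypothetical and are not taken from the paper.

```python
from itertools import groupby

def collapse_frames(frame_labels):
    """Merge runs of identical frame labels into single phoneme-like units.

    Toy simplification: a genuinely repeated phoneme would also be merged.
    """
    return [label for label, _ in groupby(frame_labels)]

def segment_words(units, boundaries):
    """Split a unit sequence into words at the given (exclusive) end indices."""
    words, start = [], 0
    for end in boundaries:
        words.append(tuple(units[start:end]))
        start = end
    return words

# Hypothetical frame-level latent labels for one utterance.
frames = ["a", "a", "a", "o", "o", "i", "i", "i", "k", "a", "a"]
units = collapse_frames(frames)        # first level: frames -> units
words = segment_words(units, [3, 5])   # second level: units -> words
print(units)
print(words)
```

If a second speaker's frames for the same words received different latent labels, the collapsed units, and therefore the discovered words, would differ. This is the speaker dependency that the StarGAN-VC-based features are intended to remove.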
