JSAI2023

Presentation information

Poster Session

[4Xin1] Poster session 2

Fri. Jun 9, 2023 9:00 AM - 10:40 AM Room X (Exhibition hall B)

[4Xin1-26] Frequency Analysis in Voice Conversion Using Generative Adversarial Networks

〇Fuya WADA1, Yoshiaki KUROSAWA1, Kazuya MERA1, Toshiyuki TAKEZAWA1 (1. Hiroshima City University)

Keywords: Voice Conversion, GAN, Generative Adversarial Networks

In recent years, deep learning has enabled high-quality speech synthesis and voice quality conversion. Conventional methods perform voice conversion with a GAN (Generative Adversarial Network). However, the generated speech sounds slightly muffled compared with actual speech, and the generated 2D features also have some shortcomings. In this study, we therefore divide the generated spectrogram into several frequency bands and compute the Mel-Cepstrum Distortion (MCD) of each band to investigate which frequency bands are generated well. The analysis showed that the low-frequency band of the generated spectrograms was generated well, whereas the mid- and high-frequency bands were not. In addition, we found that although the linguistic information was reproduced, the reproduction of speaker characteristics was insufficient.
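
The band-wise comparison described in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the bands are chosen or how the cepstrum is obtained per band, so the sketch assumes time-aligned log-mel spectrograms, an equal split of the mel bins into `n_bands` bands, and an unnormalised DCT-II of each band as a stand-in for mel-cepstral coefficients. The names `dct2` and `band_mcd` are hypothetical.

```python
import math


def dct2(x):
    """Unnormalised DCT-II of a 1-D sequence (pure Python, for illustration)."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
        for k in range(n)
    ]


def band_mcd(ref, gen, n_bands=3):
    """Return one MCD-style distortion value (in dB) per frequency band.

    ref, gen: time-aligned log-mel spectrograms given as lists of frames,
    each frame being a list of log-magnitudes (one per mel bin).
    Assumption: bands are equal-width slices of the mel-bin axis.
    """
    n_bins = len(ref[0])
    # Band edges along the frequency axis, e.g. [0, 2, 4, 6] for 6 bins.
    edges = [b * n_bins // n_bands for b in range(n_bands + 1)]
    # Usual MCD scaling constant: (10 / ln 10) * sqrt(2).
    const = 10.0 / math.log(10.0) * math.sqrt(2.0)

    result = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        total = 0.0
        for r, g in zip(ref, gen):
            # Drop the 0th coefficient, which only reflects overall level.
            cr = dct2(r[lo:hi])[1:]
            cg = dct2(g[lo:hi])[1:]
            total += const * math.sqrt(sum((a - b) ** 2 for a, b in zip(cr, cg)))
        result.append(total / len(ref))  # average over frames
    return result
```

With this sketch, a larger value for a band means the generated spectrogram deviates more from the reference in that band, so comparing the returned values across bands indicates which frequency ranges are reproduced well.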
