JSAI2022

Presentation information

General Session » Interactive Session

[3Yin2] Interactive session 1

Thu. Jun 16, 2022 11:30 AM - 1:10 PM Room Y (Event Hall)

[3Yin2-37] speak like a dog!

dog speech synthesis using non-parallel voice conversion with deep learning

〇Kohei Suzuki¹, Shoki Sakamoto¹, Tadahiro Taniguchi¹, Hirokazu Kameoka² (1. Ritsumeikan University, 2. NTT Communication Science Laboratories)

Keywords: Voice Conversion

In this study, we propose a method to convert human speech into dog-like speech while retaining the linguistic information.
Table Talk Role-Playing Games (TRPGs), a type of tabletop game, feature a wide variety of imaginary creatures such as goblins and zombies.
Voice conversion (VC) may be used to represent the voices of such imaginary creatures.
To achieve this goal, we conducted comparison experiments across two audio features (mel-cepstral coefficients and mel-spectrograms), two non-parallel VC methods (variational autoencoder based and generative adversarial network based), and five kernel sizes.
Although we were able to convert human voices into dog-like voices in a fragmentary manner, preserving the linguistic information remains difficult, and further improvements are needed.
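
To make the scale of the comparison concrete, the following is a minimal sketch that enumerates the conditions, assuming the experiment was run as a full factorial design over the three factors (the abstract does not say so). The kernel size labels `k1` to `k5` are hypothetical placeholders, since the actual values are not given here, and the function name `experiment_grid` is likewise illustrative.

```python
from itertools import product

# Factors named in the abstract.
FEATURES = ("mel-cepstral coefficients", "mel-spectrogram")
VC_METHODS = ("VAE-based", "GAN-based")
# Hypothetical placeholders: the abstract says five kernel sizes were
# compared but does not list their values.
KERNEL_SIZES = ("k1", "k2", "k3", "k4", "k5")


def experiment_grid():
    """Return every (feature, method, kernel size) combination."""
    return list(product(FEATURES, VC_METHODS, KERNEL_SIZES))


grid = experiment_grid()
# 2 features x 2 methods x 5 kernel sizes
print(len(grid))
```

Under the full factorial assumption this yields 20 conditions; if only a subset of combinations was evaluated, the count would be smaller.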
