2:30 PM - 2:50 PM
[2K4-ES-2-03] Many-to-many Voice Conversion based on a CycleGAN using a Radial Loss
Keywords: voice conversion, deep learning, CycleGAN, GANs, parallel-data-free
Voice conversion (VC) is a technique that allows a person to speak with the voice of another person. It is an application of voice processing that relies on both signal processing and machine learning. In this paper we propose a many-to-many voice conversion method based on a CycleGAN, which we call the Radial CycleGAN. In this method, the generators consist of a general encoder (ENC-0) and a general decoder (DEC-0) for a standard voice sample (a TTS voice), plus one encoder and decoder pair for each new voice sample. In addition to the commonly used cycle and identity losses, we define a radial loss between encoders and decoders to train the generators and discriminators. For each new user, training fits a new (encoder, decoder) pair against the standard TTS pair, which makes it possible to convert voices directly with the trained encoder and decoder pairs. This method contributes to building real-time systems that convert robustly among the voices of pretrained speakers and that make it easy to add new users by collecting a small dataset from each of them.
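The hub-style layout described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `pairs`, `convert`, `cycle_loss` and `identity_loss` are hypothetical, the "networks" are toy scaling functions rather than trained models, and composing a source encoder with a target decoder is only one plausible reading of the abstract. The cycle and identity losses follow the standard CycleGAN definitions (mean absolute error). The radial loss is not defined in the abstract, so it is deliberately left out.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Fn = Callable[[Vector], Vector]


def scale(k: float) -> Fn:
    """Toy stand-in for a neural network: multiply every feature by k."""
    return lambda v: [k * a for a in v]


def l1(a: Vector, b: Vector) -> float:
    """Mean absolute error between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Speaker id -> (encoder, decoder). "tts" plays the role of ENC-0 / DEC-0.
# The other speakers and all scaling factors are made up for illustration.
pairs: Dict[str, Tuple[Fn, Fn]] = {
    "tts": (scale(1.0), scale(1.0)),
    "alice": (scale(0.5), scale(2.0)),
    "bob": (scale(0.25), scale(4.0)),
}


def convert(x: Vector, src: str, dst: str) -> Vector:
    """Assumed generator: encode with the source pair, decode with the target pair."""
    encoder, _ = pairs[src]
    _, decoder = pairs[dst]
    return decoder(encoder(x))


def cycle_loss(x: Vector, src: str, dst: str) -> float:
    """Standard cycle loss: src -> dst -> src should reproduce the input."""
    return l1(convert(convert(x, src, dst), dst, src), x)


def identity_loss(y: Vector, src: str, dst: str) -> float:
    """Standard identity loss: a sample already in the target domain
    should pass through the src -> dst generator unchanged."""
    return l1(convert(y, src, dst), y)
```

Under this reading, adding a speaker only means adding one entry to `pairs` and training it against the `"tts"` entry; a call such as `convert(x, "alice", "bob")` then needs no model trained specifically for that speaker pair.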