JSAI2022

Presentation information

General Session

[2O1-GS-7] Vision, speech media processing: generation

Wed. Jun 15, 2022 9:00 AM - 10:40 AM Room O (Room 510)

Chair: Shuhei Kurita (栗田 修平), RIKEN [on-site]

10:00 AM - 10:20 AM

[2O1-GS-7-04] Emotionally Conditioned TTS with Facial Expression

〇Jun Enoki1, Takayuki Suzuki1 (1. Opus Inc.)

Keywords: text to speech, Arts and entertainment applications

Speech generated by modern TTS systems can be indistinguishable from real human speech, and emotionally conditioned TTS has itself been widely studied.
Here we explore another way to control emotion in speech synthesis: conditioning on facial expression data so that the control is more intuitive.
In this paper, we share our experimental results and discuss the details.
