Generation of Facial Animation from Voice using End-to-End Learning

Hirofumi Omichi; Kazuya Mera; Yoshiaki Kurosawa; Toshiyuki Takezawa

[3Rin4-04] Generation of Facial Animation from Voice using End-to-End Learning

〇Hirofumi Omichi¹, Kazuya Mera¹, Yoshiaki Kurosawa¹, Toshiyuki Takezawa¹ (1. Graduate School of Information Sciences, Hiroshima City University)

Keywords: Deep Learning, Generation of Facial Animation, Entertainment, Voice

Expressive facial animation plays an important role in communication. Some avatars convey facial expressions through face tracking, a typical method for synchronizing facial expressions, but this approach cannot create facial expressions from previously recorded or synthetic speech that has no accompanying facial expressions. In this study, we propose a method that generates facial animation from voice alone. Specifically, we design a learning model that takes the acoustic features of the uttered speech as input and uses Action Unit (AU) parameters, analyzed from facial expression video, as teacher data. The experimental results indicated that the loss value of the proposed method was lower than that of the existing method. In addition, the AU activations produced by the proposed method fluctuated more smoothly than those of the existing method, which is expected to be perceived as more natural facial expression.
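
The two comparison criteria mentioned in the abstract, a loss value and the smoothness of AU fluctuation, can be illustrated with a minimal sketch. The abstract does not state which loss function or smoothness measure was used, so mean squared error and the mean absolute frame-to-frame change are assumptions made here for illustration only. The function names `mse_loss` and `roughness` are hypothetical, and the sequences are made-up toy values, not data from the study.

```python
from typing import Sequence


def mse_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two AU intensity sequences (assumed loss)."""
    if len(predicted) != len(target) or len(predicted) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def roughness(series: Sequence[float]) -> float:
    """Mean absolute frame-to-frame change; a lower value means a smoother curve.

    This is an assumed smoothness measure, not one taken from the paper.
    """
    if len(series) < 2:
        return 0.0
    steps = [abs(b - a) for a, b in zip(series, series[1:])]
    return sum(steps) / len(steps)


# Toy AU intensity per video frame (illustrative values only).
target = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0]        # teacher data from video analysis
smooth_pred = [0.0, 1.0, 2.0, 2.0, 2.0, 1.0]   # hypothetical smoother output
jittery_pred = [1.0, 0.0, 3.0, 2.0, 3.0, 0.0]  # hypothetical jittery output

for name, pred in (("smooth", smooth_pred), ("jittery", jittery_pred)):
    print(name, "loss:", mse_loss(pred, target), "roughness:", roughness(pred))
```

In this toy case the smoother sequence has both the lower loss and the lower roughness, which mirrors the kind of comparison the abstract reports without reproducing its actual numbers.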

Presentation information

[3Rin4] Interactive 1
