Proposing system for generating audio influenced by audience evaluation using interactive GA

Maho Taniguchi

1:20 PM - 1:40 PM

[1G3-ES-5-01] Proposing system for generating audio influenced by audience evaluation using interactive GA

〇Maho Taniguchi¹, Kense Todo¹, Shoya Yasuda¹, Masayuki Yamamura¹ (1. Tokyo Institute of Technology, School of Computing)

Keywords: interactive GA, audio, SpecGAN, Human interaction

When generating or selecting music or sound effects, one must search large audio databases to find audio appropriate for a scene in an animation or other video clip. However, sound effects or background music produced by individual human experts may not always make the audience feel that they match the scene well. An approach that generates audio while taking listeners' preferences into account is therefore required. In this work, we propose a way to generate suitable audio for a scene using feedback from an audience. Specifically, we use SpecGAN, a type of GAN that generates a wide variety of audio from a latent space, and interactive GA, an optimization algorithm that uses human preferences for evaluation. The following steps are repeated: SpecGAN generates audio from latent variables, a group of listeners ranks the audio, and the best-ranked latent variables are crossed over to create the next latent variables. As a result, we succeeded in controlling the direction of audio generation for individual scenes. We hope that audio generated by our method will be as meaningful as audio created by human experts.
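The loop described in the abstract (generate, rank, cross over) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the latent dimension, population size, number of parents, and the choice of uniform crossover are all assumptions, and `stand_in_generator` and `stand_in_ranker` are hypothetical placeholders for a trained SpecGAN generator and for the ranking supplied by the listener group.

```python
import random

LATENT_DIM = 100  # assumed latent size; the abstract does not state it


def random_latent(rng, dim=LATENT_DIM):
    """Sample one latent vector uniformly from [-1, 1] (an assumed prior)."""
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each component is copied from one of the two parents."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


def next_generation(population, ranking, n_parents, rng):
    """Build a new population from the best-ranked latent vectors.

    `ranking` lists population indices, best first, as supplied by listeners.
    `n_parents` must be at least 2 so that two distinct parents can be drawn.
    """
    parents = [population[i] for i in ranking[:n_parents]]
    children = []
    while len(children) < len(population):
        a, b = rng.sample(parents, 2)
        children.append(crossover(a, b, rng))
    return children


def run_iga(generate_audio, rank_audio, pop_size=8, n_parents=3,
            generations=5, seed=0):
    """Repeat: generate audio from latents, have it ranked, cross over the best."""
    rng = random.Random(seed)
    population = [random_latent(rng) for _ in range(pop_size)]
    for _ in range(generations):
        clips = [generate_audio(z) for z in population]
        ranking = rank_audio(clips)
        population = next_generation(population, ranking, n_parents, rng)
    return population


def stand_in_generator(z):
    """Hypothetical placeholder for a SpecGAN generator: returns the latent itself."""
    return z


def stand_in_ranker(clips):
    """Hypothetical placeholder for listeners: prefers clips with a larger sum."""
    return sorted(range(len(clips)), key=lambda i: -sum(clips[i]))


final_population = run_iga(stand_in_generator, stand_in_ranker)
```

In a real session, `rank_audio` would play the clips to the audience and collect their ordering. The sketch uses crossover only, because that is the one operator the abstract names.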


Presentation information

[1G3-ES-5] Human interface, education aid: Generate contents
