JSAI2020


International Session


[1G3-ES-5] Human interface, education aid: Generate contents

Tue. Jun 9, 2020 1:20 PM - 2:40 PM Room G (jsai2020online-7)

Chair: Naohiro Matsumura (Osaka University)

1:20 PM - 1:40 PM

[1G3-ES-5-01] Proposing system for generating audio influenced by audience evaluation using interactive GA

〇Maho Taniguchi¹, Kense Todo¹, Shoya Yasuda¹, Masayuki Yamamura¹ (1. Tokyo Institute of Technology, School of Computing)

Keywords: interactive GA, audio, SpecGAN, human interaction

When generating or selecting music or sound effects, it is necessary to search large audio databases to find audio appropriate for a scene in an animation or other video clip. However, sound effects or background music produced by an individual human expert do not always strike the audience as a good match for the scene. Therefore, an approach that generates audio while taking listeners' preferences into account is needed. In this work, we propose a way to generate audio suited to a scene using feedback from the audience. Specifically, we used SpecGAN, a GAN variant that generates a wide variety of audio from a latent space, together with an interactive GA, an optimization algorithm that uses human preferences as its evaluation. The following steps were repeated: SpecGAN generated audio from latent variables, a group of human listeners ranked the audio, and the latent variables in the best-ranked group were crossed over to create the next generation of latent variables. As a result, we succeeded in controlling the direction of audio generation for individual scenes. We hope that audio generated by our method can be as meaningful as audio created by human experts.
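The loop described in the abstract (generate audio from latents, let listeners rank it, cross over the best latents) can be sketched as a small genetic algorithm over latent vectors. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the latent dimension, the crossover operator, the number of latents kept, or whether mutation is used, so the 100-dimensional uniform [-1, 1] latent, uniform crossover, Gaussian mutation, and elitism below are all assumptions. The SpecGAN generator itself is omitted; the hypothetical `rank` callback stands in for the human group, which in the real system would listen to the audio generated from each latent. All function names here are hypothetical.

```python
import random
from typing import Callable, List

Latent = List[float]

LATENT_DIM = 100  # assumption: a 100-dim latent, as is common for SpecGAN/WaveGAN


def random_latent(dim: int, rng: random.Random) -> Latent:
    # Assumed prior: each coordinate sampled uniformly from [-1, 1].
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def uniform_crossover(a: Latent, b: Latent, rng: random.Random) -> Latent:
    # Assumed operator: each child coordinate is copied from one parent at random.
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(z: Latent, rate: float, sigma: float, rng: random.Random) -> Latent:
    # Assumed operator: perturb a fraction of coordinates with Gaussian noise,
    # clipping back into the [-1, 1] latent range.
    return [max(-1.0, min(1.0, v + rng.gauss(0.0, sigma))) if rng.random() < rate else v
            for v in z]


def interactive_ga_step(population: List[Latent],
                        rank: Callable[[List[Latent]], List[int]],
                        n_elite: int,
                        rng: random.Random,
                        mutation_rate: float = 0.05,
                        sigma: float = 0.1) -> List[Latent]:
    # `rank` plays the role of the listener group: it returns population
    # indices ordered best-first. n_elite must be at least 2.
    order = rank(population)
    elite = [population[i] for i in order[:n_elite]]
    next_pop = [list(z) for z in elite]  # keep the best-ranked latents unchanged
    while len(next_pop) < len(population):
        a, b = rng.sample(elite, 2)  # pick two distinct elite parents
        child = uniform_crossover(a, b, rng)
        next_pop.append(mutate(child, mutation_rate, sigma, rng))
    return next_pop


def simulated_listener(target: Latent) -> Callable[[List[Latent]], List[int]]:
    # Hypothetical stand-in for human ranking, for offline testing only:
    # prefers latents with smaller squared distance to a fixed target vector.
    def rank(pop: List[Latent]) -> List[int]:
        dist = [sum((v - t) ** 2 for v, t in zip(z, target)) for z in pop]
        return sorted(range(len(pop)), key=lambda i: dist[i])
    return rank


if __name__ == "__main__":
    rng = random.Random(0)
    rank = simulated_listener([0.5] * LATENT_DIM)
    pop = [random_latent(LATENT_DIM, rng) for _ in range(12)]
    for _ in range(20):
        pop = interactive_ga_step(pop, rank, n_elite=4, rng=rng)
```

Keeping the elite latents unchanged means a listener's favorite sound is never lost between rounds, which matters in an interactive GA because each human evaluation is expensive; in the actual system each latent would be passed through the SpecGAN generator and the resulting spectrogram converted to audio before ranking.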
