JSAI2022

Presentation information

General Session

[2O4-GS-7] Vision, speech media processing

Wed. Jun 15, 2022 1:20 PM - 3:00 PM Room O (Room 510)

Chair: 岡部 浩司 (NEC) [Remote]

2:40 PM - 3:00 PM

[2O4-GS-7-05] Data Augmentation Using Spectral Structure for Supervised Monaural Source Separation of Frog Choruses

〇Tatsumi Ikushima1, Ryu Takeda1, Ikkyu Aihara2, Kazunori Komatani1 (1. Osaka University, 2. University of Tsukuba)

[Online]

Keywords: Monaural source separation, Deep learning, Data augmentation, Frog, Frequency spectrum

Sound source separation, which extracts the individual sounds from a mixture, is necessary for analyzing interactions between individuals in a frog chorus. Supervised monaural source separation is promising for frogs because they gather in dense groups, and their positions relative to the microphone are fixed during a chorus but unknown beforehand. Training the separation model requires a large amount of sound data, but such data are difficult to collect because many frogs must be captured and their choruses recorded. We propose data augmentation that focuses on the spectral structure of frog calls. Based on an analysis of this structure, we modulate and stretch calls to increase the variety of call patterns in the training data. We conducted a sound source separation experiment for two frogs using the augmented data and confirmed the effectiveness of the data augmentation in terms of the signal-to-distortion ratio.
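The abstract does not specify how calls are modulated or stretched, or which SDR variant is used. The following is only a minimal sketch under stated assumptions: stretching is approximated by linear-interpolation resampling (which also shifts pitch), modulation is taken to be a sinusoidal amplitude modulation, and SDR is the plain energy ratio between the reference and the residual. The function names and parameters (`stretch`, `amplitude_modulate`, `sdr_db`, `rate`, `mod_hz`, `depth`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def stretch(call, rate):
    """Resample a call so it plays `rate` times faster (assumed stand-in for stretching).

    Linear-interpolation resampling changes duration and pitch together.
    """
    n_out = int(round(len(call) / rate))
    src_pos = np.linspace(0, len(call) - 1, n_out)
    return np.interp(src_pos, np.arange(len(call)), call)


def amplitude_modulate(call, sr, mod_hz, depth):
    """Apply sinusoidal amplitude modulation (assumed form of 'modulate')."""
    t = np.arange(len(call)) / sr
    return call * (1.0 + depth * np.sin(2 * np.pi * mod_hz * t))


def sdr_db(reference, estimate):
    """Plain signal-to-distortion ratio in dB: 10*log10(|s|^2 / |s - s_hat|^2)."""
    residual = reference - estimate
    return 10 * np.log10(np.sum(reference**2) / (np.sum(residual**2) + 1e-12))


# Toy "calls": two sine bursts standing in for two frogs (synthetic, not real data).
sr = 8000
t = np.arange(1000) / sr
call_a = np.sin(2 * np.pi * 900 * t)
call_b = np.sin(2 * np.pi * 1400 * t)

# Augment one call, then build a two-source training mixture of equal length.
aug_a = amplitude_modulate(stretch(call_a, rate=2.0), sr, mod_hz=20.0, depth=0.3)
mixture = aug_a + call_b[: len(aug_a)]
```

In a supervised setup, `mixture` would be the model input and the individual (augmented) calls the training targets; `sdr_db` would then compare each separated output against its target.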
