Data Augmentation Using Spectral Structure for Supervised Monaural Source Separation of Frog Choruses

Tatsumi Ikushima

2:40 PM - 3:00 PM

[2O4-GS-7-05] Data Augmentation Using Spectral Structure for Supervised Monaural Source Separation of Frog Choruses

〇Tatsumi Ikushima¹, Ryu Takeda¹, Ikkyu Aihara², Kazunori Komatani¹ (1. Osaka University, 2. University of Tsukuba)

[Online]

Keywords: Monaural source separation, Deep learning, Data augmentation, Frog, Frequency spectrum

Sound source separation, which extracts the individual sounds from a mixture, is necessary for analyzing interactions between individuals in a frog chorus. Supervised monaural source separation is promising for frogs because they gather in dense groups and their positions relative to the microphone are fixed during a chorus but unknown before it. Training a separation model requires a large amount of sound data, but such data are difficult to collect: many frogs must be captured and their choruses recorded. We propose a data augmentation method that focuses on the spectral characteristics of frog calls. Based on an analysis of these characteristics, we modulate and stretch calls to increase the variety of call patterns in the training data. We conducted a sound source separation experiment for two frogs using the augmented data and confirmed the effectiveness of the augmentation in terms of the signal-to-distortion ratio.
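The augmentation and evaluation steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how calls are modulated or stretched, so the resampling-based `stretch` function below is only a stand-in (it changes duration and pitch together). The sample rate `SR`, the sine-wave "calls", the stretch rates, and the plain energy-ratio form of the signal-to-distortion ratio in `sdr_db` are all assumptions made for illustration.

```python
import numpy as np

SR = 16000  # assumed sample rate in Hz (not given in the abstract)

def stretch(call, rate):
    """Resample a call by linear interpolation.

    rate > 1 shortens the call and raises its pitch; rate < 1 does the
    opposite. This is an assumed stand-in for the paper's transforms.
    """
    n_out = max(1, int(round(len(call) / rate)))
    old_t = np.arange(len(call))
    new_t = np.linspace(0, len(call) - 1, n_out)
    return np.interp(new_t, old_t, call)

def sdr_db(reference, estimate):
    """Plain SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2)."""
    distortion = reference - estimate
    return 10 * np.log10(np.sum(reference**2) / (np.sum(distortion**2) + 1e-12))

# Two synthetic stand-ins for isolated calls from two frogs (hypothetical data).
t = np.arange(SR) / SR
call_a = np.sin(2 * np.pi * 900 * t)
call_b = np.sin(2 * np.pi * 2400 * t)

# Augment frog A's call with several stretch rates (values chosen arbitrarily).
rates = [0.9, 1.0, 1.1]
variants_a = [stretch(call_a, r) for r in rates]

# Build one (mixture, target) training pair, trimmed to a common length.
src_a = variants_a[0]
n = min(len(src_a), len(call_b))
target_a = src_a[:n]
mixture = target_a + call_b[:n]

# Baseline: SDR when the unprocessed mixture is used as the estimate of frog A.
baseline = sdr_db(target_a, mixture)
print(f"baseline SDR: {baseline:.2f} dB")
```

A trained separation model would be scored by passing its output as `estimate`; improvement over the mixture baseline is one common way to report the effect of augmentation.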

Presentation information

[2O4-GS-7] Vision, speech media processing
