JSAI2024

Presentation information

Organized Session

[1O4-OS-29a] OS-29

Tue. May 28, 2024 3:00 PM - 4:40 PM Room O (Music studio hall)

Organizers: Tetsuro Kitahara (Nihon University), Eita Nakamura (Kyoto University), Masatoshi Hamanaka (RIKEN)

3:00 PM - 3:20 PM

[1O4-OS-29a-01] Disentangled Representation Learning for Multi-Viewpoint Music Retrieval

〇Yuka Hashizume1, Atsushi Miyashita1, Li Li1, Tomoki Toda1 (1. Nagoya University)

Keywords: Music Information Retrieval, Deep Learning, Music Recommendation, Representation Learning

To achieve a flexible music information retrieval (MIR) system, it is desirable to calculate music similarity by focusing on individual elements of musical pieces and letting users select the element they want to focus on. Our previous study proposed calculating music similarity from each instrumental sound signal with a separate instrument-dependent network, but using individual instrument signals as queries in a search system is impractical. In this paper, we propose a method that computes similarities focusing on each instrument with a single network that takes mixed sounds as input. We design a single similarity embedding space with disentangled dimensions for each instrument, extracted by Conditional Similarity Networks trained with a triplet loss using masks. Experimental results show that (1) each sub-embedding space can hold the characteristics of the corresponding instrument, and (2) the musical pieces selected by the proposed method agree with human judgments under limited conditions.
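
As a rough illustration of the masked triplet loss idea mentioned in the abstract, the following minimal sketch splits one embedding vector into per-instrument blocks and measures distances only inside the block selected by a mask. It is not the authors' implementation: the instrument list, the block size, the margin value, and the function names `instrument_mask` and `masked_triplet_loss` are all assumptions made for this example, and fixed binary masks are used here for simplicity.

```python
import numpy as np

# Hypothetical layout (not taken from the paper): four instruments,
# each assigned a contiguous block of 8 embedding dimensions.
INSTRUMENTS = ["drums", "bass", "piano", "guitar"]
DIMS_PER_INSTRUMENT = 8
EMBED_DIM = len(INSTRUMENTS) * DIMS_PER_INSTRUMENT


def instrument_mask(instrument: str) -> np.ndarray:
    """Return a binary mask that keeps only one instrument's block."""
    idx = INSTRUMENTS.index(instrument)
    mask = np.zeros(EMBED_DIM)
    start = idx * DIMS_PER_INSTRUMENT
    mask[start:start + DIMS_PER_INSTRUMENT] = 1.0
    return mask


def masked_triplet_loss(anchor, positive, negative, mask, margin=0.2):
    """Triplet loss computed only on the dimensions selected by the mask.

    The margin of 0.2 is an arbitrary illustrative value.
    """
    d_pos = np.linalg.norm((anchor - positive) * mask)
    d_neg = np.linalg.norm((anchor - negative) * mask)
    return float(max(0.0, d_pos - d_neg + margin))


# Toy embeddings: the positive differs from the anchor only in the
# drums block, the negative differs only in the bass block.
anchor = np.zeros(EMBED_DIM)
positive = anchor.copy()
positive[0:8] = 5.0
negative = anchor.copy()
negative[8:16] = 1.0

loss_bass = masked_triplet_loss(anchor, positive, negative, instrument_mask("bass"))
loss_drums = masked_triplet_loss(anchor, positive, negative, instrument_mask("drums"))
print(loss_bass, loss_drums)
```

In this toy setup the same triplet is already satisfied when viewed through the bass mask but is violated when viewed through the drums mask, which is the sense in which one embedding space can support several instrument-specific notions of similarity.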
