3:00 PM - 3:20 PM
[1O4-OS-29a-01] Disentangled Representation Learning for Multi-Viewpoint Music Retrieval
Keywords: Music Information Retrieval, Deep Learning, Music Recommendation, Representation Learning
A flexible music information retrieval (MIR) system should calculate music similarity by focusing on individual elements of musical pieces, such as particular instruments, and should let users select the element they want to focus on. Our previous study calculated music similarity from each instrument's separate sound signal using a dedicated instrument-dependent network. However, requiring separate instrument signals as queries is impractical in search systems. In this paper, we propose a method that computes instrument-focused similarities with a single network that takes mixed audio as input. We design a single similarity embedding space with disentangled dimensions for each instrument. These dimensions are extracted by Conditional Similarity Networks trained with a triplet loss using masks. Experimental results show that (1) each sub-embedding space captures the characteristics of the corresponding instrument, and (2) the musical pieces selected by the proposed method agree with human judgments under limited conditions.
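The abstract does not specify the network, mask, or loss details. The following is only a minimal sketch of the general idea of a masked triplet loss over a disentangled embedding. It assumes fixed, disjoint, equal-sized binary masks per instrument, Euclidean distance, and a hinge margin of 0.2, and it operates on embedding vectors that are given directly. None of these are the authors' stated settings. In Conditional Similarity Networks, masks may also be learned rather than fixed. The function names and the "drums"/"bass" instruments are hypothetical.

```python
import math
from typing import List, Sequence


def make_masks(embed_dim: int, num_instruments: int) -> List[List[float]]:
    """Assumption: split the embedding into equal, disjoint blocks, one binary mask per instrument."""
    if embed_dim % num_instruments != 0:
        raise ValueError("embed_dim must be divisible by num_instruments")
    block = embed_dim // num_instruments
    return [
        [1.0 if k * block <= i < (k + 1) * block else 0.0 for i in range(embed_dim)]
        for k in range(num_instruments)
    ]


def masked_distance(x: Sequence[float], y: Sequence[float], mask: Sequence[float]) -> float:
    """Euclidean distance computed only over the dimensions selected by the mask."""
    return math.sqrt(sum((m * (a - b)) ** 2 for a, b, m in zip(x, y, mask)))


def masked_triplet_loss(anchor, positive, negative, mask, margin: float = 0.2) -> float:
    """Hinge triplet loss restricted to one instrument's sub-embedding (margin is an assumed value)."""
    d_pos = masked_distance(anchor, positive, mask)
    d_neg = masked_distance(anchor, negative, mask)
    return max(0.0, d_pos - d_neg + margin)


def rank_by_instrument(query, catalog, mask) -> List[int]:
    """Rank catalog items from most to least similar, using only the selected instrument's dimensions."""
    return sorted(range(len(catalog)), key=lambda i: masked_distance(query, catalog[i], mask))


if __name__ == "__main__":
    # Hypothetical 4-dim embedding: dims 0-1 for "drums", dims 2-3 for "bass".
    drums_mask, bass_mask = make_masks(4, 2)
    anchor = [0.0, 0.0, 0.0, 0.0]
    positive = [0.1, 0.0, 5.0, 5.0]
    negative = [1.0, 0.0, 0.0, 0.0]
    print(masked_triplet_loss(anchor, positive, negative, drums_mask))
    print(masked_triplet_loss(anchor, positive, negative, bass_mask))
    print(rank_by_instrument(anchor, [positive, negative], drums_mask))
    print(rank_by_instrument(anchor, [positive, negative], bass_mask))
```

Because each mask zeroes out the other instruments' dimensions, the same triplet can satisfy the margin in one sub-space while violating it in another. The same item can also rank first for one selected instrument and last for another. This illustrates how a single embedding could support user-selectable, instrument-focused similarity.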