JSAI2025

Presentation information

Poster Session

[2Win5] Poster session 2

Wed. May 28, 2025 3:30 PM - 5:30 PM Room W (Event hall D-E)

[2Win5-54] M2D-X: Towards a Universal Audio Pre-training Framework

〇Daisuke Niizumi¹, Daiki Takeuchi¹, Yasunori Ohishi¹, Noboru Harada¹, Kunio Kashino¹ (¹NTT Corporation)

Keywords: audio representation, representation learning

General-purpose audio representations are useful building blocks for various audio applications; however, specialized representations learned from the data of the target application task can be more useful. This study proposes M2D-X, a general framework for learning such application-specific audio representations. Experimental results show that M2D-X learns effective representations that achieve top-level performance on the highly competitive AudioSet and speech domains, on a small-data medical task, and on a zero-shot classification task.