[2Win5-54] M2D-X: Towards a Universal Audio Pre-training Framework
Keywords: audio representation, representation learning
General-purpose audio representations are useful building blocks for a variety of audio applications; however, representations specialized to an application by learning from its task data can be more useful. This study proposes M2D-X, a general framework for learning application-specific audio representations. Experimental results show that M2D-X learns effective representations, achieving top-level performance in the highly competitive AudioSet and speech domains, on a small-data medical task, and on a zero-shot classification task.