JSAI2025

Presentation information


General Session » GS-5 Language media processing

[3G5-GS-6] Language media processing:

Thu. May 29, 2025 3:40 PM - 5:20 PM Room G (Room 1002)

Chair: Masayuki Hashimoto (Toyo University)

5:00 PM - 5:20 PM

[3G5-GS-6-05] Proposal of a Multimodal Emotion Recognition Model Based on the Fusion of Audio and Text

〇Yue Tan1, Jiazheng Zhou1, Kazuyuki Matsumoto2, Xin Kang2, Minoru Yoshida2 (1. Division of Science and Technology, Graduate School of Science and Technology for Innovation, Tokushima University, 2. Graduate School of Technology, Industrial and Social Sciences, Tokushima University)

Keywords: Multimodal Emotion Recognition, Transformer, Self-Attention

Multimodal emotion recognition is a technology that integrates multiple modalities, such as audio, text, and images, to identify and analyze human emotions more comprehensively and accurately. In the field of AI-driven dialogue systems, it has become indispensable for facilitating smooth interactions. By fusing data from different modalities, such as audio and text, it is possible to account for inter-modal interactions and correlations that single-modal emotion analysis cannot capture, thereby improving both the generalizability and accuracy of emotion recognition.

In this study, we constructed a multimodal emotion analysis model based on the Transformer architecture that takes audio and text as inputs. By concatenating the outputs of the respective Transformer encoders for audio and text and then applying a self-attention mechanism to the concatenated representation, the model fuses the two modalities while preserving their cross-modal relationships. In this paper, we conduct comparative evaluation experiments against multiple existing methods on CMU-MOSEI, a standard dataset for emotion recognition, to validate the performance of the proposed model and confirm the advantages of multimodal fusion.
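The abstract does not include an implementation, but the described pipeline (a Transformer encoder per modality, concatenation of their outputs, then self-attention over the joint representation) can be sketched in a few lines of PyTorch. The sketch below is an illustration, not the authors' code: all module names, layer counts, the pooling and classification head, and the feature dimensions are assumptions (the 74-dim audio and 300-dim text inputs mirror the COVAREP and GloVe features commonly distributed with CMU-MOSEI), and positional encodings are omitted for brevity.

```python
import torch
import torch.nn as nn

class AudioTextFusionModel(nn.Module):
    """Hypothetical sketch of the described fusion: per-modality Transformer
    encoders, concatenation, then self-attention over the joint sequence."""

    def __init__(self, audio_dim=74, text_dim=300, d_model=128,
                 n_heads=4, n_layers=2, n_emotions=6):
        super().__init__()
        # Project each modality's features into a shared model dimension.
        self.audio_proj = nn.Linear(audio_dim, d_model)
        self.text_proj = nn.Linear(text_dim, d_model)
        # One Transformer encoder per modality (TransformerEncoder deep-copies
        # the layer, so the two encoders have independent weights).
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                               batch_first=True)
        self.audio_enc = nn.TransformerEncoder(enc_layer, n_layers)
        self.text_enc = nn.TransformerEncoder(enc_layer, n_layers)
        # Self-attention over the concatenated representation, so audio
        # positions can attend to text positions and vice versa.
        self.fusion_attn = nn.MultiheadAttention(d_model, n_heads,
                                                 batch_first=True)
        self.classifier = nn.Linear(d_model, n_emotions)

    def forward(self, audio, text):
        # audio: (batch, T_a, audio_dim); text: (batch, T_t, text_dim)
        a = self.audio_enc(self.audio_proj(audio))
        t = self.text_enc(self.text_proj(text))
        joint = torch.cat([a, t], dim=1)    # concatenate along the time axis
        fused, _ = self.fusion_attn(joint, joint, joint)
        pooled = fused.mean(dim=1)          # simple mean pooling (assumed)
        return self.classifier(pooled)

# Example forward pass with random tensors standing in for real features.
model = AudioTextFusionModel()
audio = torch.randn(8, 50, 74)    # batch of 8, 50 audio frames
text = torch.randn(8, 20, 300)    # batch of 8, 20 token embeddings
logits = model(audio, text)       # (8, 6) emotion logits
```

Concatenating along the sequence (time) axis rather than the feature axis is one natural reading of the abstract, since it lets a single self-attention pass model token-level cross-modal relationships; the paper itself should be consulted for the exact fusion scheme.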
