12:40 PM - 1:00 PM
[4R2-OS-22a-03] A Simple but Effective Method to Incorporate Multimodal Information for Utterance Relationship Comprehension
[[Online]]
Keywords: multimodal, group interaction, argument mining
Multimodal information such as audio and video can be effective for comprehending relationships between utterances in meetings. To combine long audio and video sequences with short text sequences, approaches based on periodic averaging or sampling of the audio and video sequences have been proposed. However, these approaches tend to include less meaningful audio and video features within each sampling window. We introduce a method that resamples audio and video embeddings based on attention between the embeddings and a small number of latent features. In particular, these few fixed-length latent features can effectively capture the information in variable-length audio and video sequences. Experiments on the AMI multimodal meeting corpus showed that our multimodal method was comparable with a text-only method in comprehending supportive relationships between utterances.
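The core idea described in the abstract, attending from a small set of fixed latent queries over a long audio or video embedding sequence to obtain a fixed-length representation, can be sketched roughly as follows. This is a minimal, hypothetical PyTorch illustration, not the authors' implementation; the class, parameter names, and dimensions are assumptions.

```python
# Hypothetical sketch: cross-attention resampling of a variable-length
# audio/video embedding sequence into a few fixed latent features.
import torch
import torch.nn as nn

class LatentResampler(nn.Module):
    def __init__(self, dim: int = 256, num_latents: int = 8, num_heads: int = 4):
        super().__init__()
        # A small number of learned, fixed-length latent queries.
        self.latents = nn.Parameter(torch.randn(num_latents, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, modality_emb: torch.Tensor) -> torch.Tensor:
        # modality_emb: (batch, seq_len, dim); seq_len may vary per segment.
        batch = modality_emb.size(0)
        queries = self.latents.unsqueeze(0).expand(batch, -1, -1)
        # Each latent attends over the whole audio/video sequence, so the
        # output length is fixed regardless of the input sequence length.
        resampled, _ = self.attn(queries, modality_emb, modality_emb)
        return resampled  # (batch, num_latents, dim)

# Usage: compress 500 audio frames into 8 latent features that could then be
# combined with the (short) text representation of an utterance pair.
resampler = LatentResampler()
audio = torch.randn(2, 500, 256)
print(resampler(audio).shape)  # torch.Size([2, 8, 256])
```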