JSAI2023

Presentation information

Organized Session

[4R2-OS-22a] Group Interaction and AI

Fri. Jun 9, 2023 12:00 PM - 1:40 PM Room R (602)

Organizers: 酒造 正樹, 湯浅 将英, 岡田 将吾, 近藤 一晃, 中野 有紀子

12:40 PM - 1:00 PM

[4R2-OS-22a-03] A Simple but Effective Method to Incorporate Multimodal Information for Utterance Relationship Comprehension

〇Yasuhito Ohsugi, Yuka Ozeki, Shuhei Tateishi, Yoshihisa Kanou, Makoto Nakatsuji (NTT Resonant Incorporated)

[Online]

Keywords: multimodal, group interaction, argument mining

Multimodal information such as audio and video can help in comprehending relationships between utterances in meetings. To combine long audio and video sequences with short text sequences, an approach based on periodic averaging or sampling of the audio and video sequences has been proposed. This approach, however, tends to include less meaningful audio and video features within each sampling window. We introduce a method that resamples audio and video embeddings based on attention between the embeddings and a small number of latent features. In particular, these few fixed-length latent features can effectively capture information from variable-length audio and video sequences. Experiments on the multimodal meeting corpus AMI showed that our multimodal method was comparable to a text-only method in comprehending supportive relationships between utterances.
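
The contrast between periodic averaging and attention-based resampling can be illustrated with a minimal sketch. The abstract does not specify the architecture, so everything below is an assumption made for illustration: the function names `window_average` and `attention_resample` are hypothetical, the attention is plain single-head scaled dot-product attention without learned projections, and the latent features are random vectors standing in for parameters that would be learned in practice.

```python
import math
import random


def window_average(frames, num_windows):
    """Baseline: average frame embeddings inside equal-length windows."""
    n, dim = len(frames), len(frames[0])
    out = []
    for w in range(num_windows):
        start = w * n // num_windows
        end = max(start + 1, (w + 1) * n // num_windows)
        chunk = frames[start:end]
        out.append([sum(f[d] for f in chunk) / len(chunk) for d in range(dim)])
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_resample(frames, latents):
    """Each latent vector attends over all frames and returns an
    attention-weighted sum of the frame embeddings.

    Assumption: single-head scaled dot-product attention with no
    learned query/key/value projections.
    """
    dim = len(frames[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in latents:
        scores = [scale * sum(qi * fi for qi, fi in zip(q, f)) for f in frames]
        weights = softmax(scores)
        out.append(
            [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
        )
    return out


rng = random.Random(0)
dim, num_latents = 8, 4

# Hypothetical latent features; random here, learned in a real model.
latents = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_latents)]

# Two stand-in audio/video embedding sequences of different lengths.
short_clip = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(50)]
long_clip = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(300)]

for clip in (short_clip, long_clip):
    resampled = attention_resample(clip, latents)
    print(len(clip), "->", len(resampled), "x", len(resampled[0]))
# prints:
# 50 -> 4 x 8
# 300 -> 4 x 8
```

In this sketch the output length depends only on the number of latent vectors, so sequences of 50 and 300 frames both reduce to 4 vectors. Unlike window averaging, which weights every frame in a window equally, the attention weights let each latent emphasize the frames most similar to it.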
