2:30 PM - 2:50 PM
[3C1-OS-14a-03] Predicting Important Utterance based on Fusing Verbal and Nonverbal Information
Keywords: multimodal information, face-to-face conversation, important utterance
Automatic meeting summarization would reduce the cost of producing minutes during or after a meeting. With the goal of establishing a method for extractive meeting summarization, we propose a multimodal fusion model that identifies the important utterances that should be included in meeting extracts of group discussions. The proposed model fuses audio, visual, motion, and linguistic unimodal models, each trained using a convolutional neural network (CNN). The fusion model combining verbal and nonverbal information achieved an F-measure of 0.827. We also discuss the characteristics of the verbal and nonverbal models and demonstrate that they complement each other.
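As a rough illustration of the idea of fusing unimodal predictions, the sketch below combines per-utterance "important" probabilities from four unimodal classifiers and scores the result with an F-measure (the harmonic mean of precision and recall). The abstract does not say how the unimodal models are fused, so the weighted late fusion of output probabilities, the 0.5 decision threshold, and the helper names `late_fusion`, `predict`, and `f_measure` are hypothetical choices for illustration, not the authors' method. The scores are made-up toy values, not data from the paper.

```python
from typing import Dict, List, Optional


def late_fusion(probs: Dict[str, List[float]],
                weights: Optional[Dict[str, float]] = None) -> List[float]:
    """Weighted average of per-utterance 'important' probabilities.

    probs maps a modality name (e.g. "audio", "linguistic") to one
    probability per utterance from that modality's unimodal classifier.
    Assumption: simple late fusion; the paper may fuse differently.
    """
    if weights is None:
        weights = {m: 1.0 for m in probs}  # equal weight per modality
    total = sum(weights[m] for m in probs)
    n_utts = len(next(iter(probs.values())))
    return [sum(weights[m] * probs[m][i] for m in probs) / total
            for i in range(n_utts)]


def predict(fused: List[float], threshold: float = 0.5) -> List[int]:
    """Label an utterance important (1) if its fused score reaches the threshold."""
    return [1 if p >= threshold else 0 for p in fused]


def f_measure(y_true: List[int], y_pred: List[int]) -> float:
    """F1: harmonic mean of precision and recall for the 'important' class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no correct positives: precision and recall are 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up scores for four utterances (not data from the paper).
scores = {
    "audio":      [0.9, 0.2, 0.6, 0.4],
    "visual":     [0.7, 0.3, 0.4, 0.6],
    "motion":     [0.8, 0.1, 0.5, 0.3],
    "linguistic": [0.6, 0.4, 0.7, 0.2],
}
labels = predict(late_fusion(scores))
print(labels, f_measure([1, 0, 0, 1], labels))  # prints [1, 0, 1, 0] 0.5
```

Late fusion of this kind keeps each unimodal model independent, which makes it straightforward to compare what each modality contributes on its own, in the spirit of the abstract's comparison of verbal and nonverbal models.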