JSAI2020

[3Q5-GS-9] Natural language processing, information retrieval: Semantic similarity

Thu. Jun 11, 2020 3:40 PM - 5:00 PM Room Q (jsai2020online-17)

Chair: Kosuke Akimoto (秋元康佑, NEC)

3:40 PM - 4:00 PM

[3Q5-GS-9-01] Semantic Consistency Assessment of Visual and Text Content using Multimodal Deep Neural Networks

Riko Suzuki1, 〇Mikito Konishi2, Junya Ikeda3, Daichi Hayashi4, So Fukai5, Yu Sugawara6, Yusuke Machii7, Yusuke Yamaura7 (1. Ochanomizu University, 2. Osaka University, 3. University of Fukui, 4. Doshisha University, 5. Tokyo Institute of Technology, 6. Hokkaido University, 7. Fuji Xerox Co., Ltd.)

Keywords: Multimodal, Deep Learning, Natural Language Processing, Image Recognition, Cross Attention

Assessing the semantic consistency between an image and the text in a document is an important task, because readers refer to the image to deepen their understanding of the text. In this study, we develop a multimodal deep neural network for assessing the semantic consistency of an image and text. We propose a novel approach that combines binary classification with an angular margin loss to acquire discriminative features. We also make contradictions between the image and the text explicit by visualizing the cross-attention between objects in the image and words in the text. To show the effectiveness of our approach, we evaluate the accuracy of several models on the Flickr30k dataset, which contains images and their captions. The results show that our proposed model outperforms the existing joint embedding model, with an improvement of 0.9 in F-measure.
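The abstract names two mechanisms without giving their exact form: an angular margin loss added to a binary consistent/inconsistent classifier, and cross-attention between image objects and words. A minimal sketch of both follows, assuming an ArcFace-style additive angular margin and scaled dot-product cross-attention; the abstract does not say which variants the authors use. The function names (`cross_attention`, `angular_margin_loss`), the toy vectors, the class ordering, and the `scale`/`margin` values are hypothetical and purely illustrative, and the fused image-text feature is assumed to come from an upstream encoder that is not shown.

```python
import math


def normalize(v):
    """Scale a (nonzero) vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention(words, objects):
    """Scaled dot-product attention weights with words as queries and
    image objects as keys (assumed formulation). Row i is a softmax
    distribution over objects for word i: the kind of matrix one could
    visualize to inspect image-text alignment."""
    d = len(words[0])
    weights = []
    for w in words:
        scores = [dot(w, o) / math.sqrt(d) for o in objects]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def angular_margin_loss(feature, class_weights, label, scale=16.0, margin=0.3):
    """ArcFace-style additive angular margin loss (an assumed variant).

    feature: fused image-text feature vector (hypothetical input).
    class_weights: one weight vector per class, e.g. index 0 for
        "inconsistent" and index 1 for "consistent" (hypothetical order).
    label: index of the true class.
    """
    f = normalize(feature)
    logits = []
    for k, w in enumerate(class_weights):
        # Clamp to [-1, 1] so acos never sees rounding overshoot.
        cos_t = max(-1.0, min(1.0, dot(f, normalize(w))))
        theta = math.acos(cos_t)
        if k == label:
            # Add the margin to the true-class angle, capped at pi so the
            # cosine stays monotonically decreasing in the angle.
            theta = min(theta + margin, math.pi)
        logits.append(scale * math.cos(theta))
    # Softmax cross-entropy on the scaled cosine logits (log-sum-exp trick).
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


# Toy usage with made-up 2-D vectors.
words = [[1.0, 0.0], [0.0, 1.0]]
objects = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
attn = cross_attention(words, objects)

W = [[0.0, 1.0], [1.0, 0.0]]  # class 0 weight, class 1 weight
feature = [1.0, 0.0]          # points exactly at the class-1 weight vector
loss_right = angular_margin_loss(feature, W, label=1)
loss_wrong = angular_margin_loss(feature, W, label=0)
```

Because the margin is added to the true-class angle, a feature must sit closer in angle to its class weight than plain softmax would require before the loss becomes small; this is one common way to push the two classes apart and obtain more discriminative features, though not necessarily the exact formulation used in this work.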
