JSAI2021

Presentation information

General Session

[4I1-GS-7b] Vision, speech media processing: Multimodal processing

Fri. Jun 11, 2021 9:00 AM - 10:40 AM Room I (GS room 4)

Chair: 石原 賢太 (NEC)

10:20 AM - 10:40 AM

[4I1-GS-7b-05] Learning Corresponding Relationship of Synonymous Sentences in Latent Space through a Common Image

〇Yanjun Sun1, Ichiro Kobayashi1 (1. Ochanomizu University)

Keywords: multimodal processing

In this study, we investigate whether multimodal information can improve the understanding of unimodal information by clarifying the relationship between the variables of each modality in the latent space. We focus on two modalities, image and natural language, and examine whether an image common to two synonymous sentences is useful for converting between those sentences through the latent space. In a preliminary experiment, we confirmed that both the accuracy and the efficiency of reconstructing the input sentence are higher when using an image whose content reflects that of the sentence than when no such image is used.