In this study, we investigate whether multimodal information can improve the understanding of unimodal information by clarifying the relationships between the variables of each modality in a latent space. We focus on two modalities, images and natural language, and examine whether an image shared by two synonymous sentences is useful for converting between those sentences through the latent space. In a preliminary experiment, we confirmed that the input sentence is reconstructed more accurately and efficiently when an image whose content reflects that of the sentence is used than when no such image is used.
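The idea of using a shared image to link synonymous sentences in a latent space can be illustrated with a minimal toy sketch. The setup below is entirely hypothetical (fixed random vectors stand in for learned encoders, and "reconstruction" is reduced to nearest-neighbour retrieval); it is not the paper's actual model, only an intuition for why fusing a content-matched image latent can help recover the intended sentence.

```python
import numpy as np

rng = np.random.default_rng(0)


def embed(seed, dim=8):
    # Stand-in for a learned encoder: a fixed random unit vector per item.
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)


# Two synonymous sentences share content, so their latents lie close to a
# common "content" direction; a depicting image lies even closer to it.
content = embed(1)
sent_a = content + 0.10 * rng.normal(size=8)  # e.g. "a dog runs in the park"
sent_b = content + 0.10 * rng.normal(size=8)  # e.g. "a dog is running at the park"
image = content + 0.05 * rng.normal(size=8)   # photo of the same scene

candidates = {"sent_b": sent_b, "distractor": embed(99)}


def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def retrieve(query):
    # "Reconstruct" by returning the nearest candidate in the latent space.
    return max(candidates, key=lambda k: cosine(query, candidates[k]))


# Conversion sent_a -> sent_b through the latent space; fusing the paired
# image latent pulls the query toward the shared content direction.
fused = (sent_a + image) / 2
print(retrieve(sent_a), retrieve(fused))
```

Averaging the sentence latent with the image latent is only one simple fusion choice; the point of the sketch is that the image, reflecting the same content, moves the query closer to the synonymous target sentence.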