JSAI2018

Presentation information

Oral presentation

[3L2] [General Session] 2. Machine Learning

Thu. Jun 7, 2018 3:50 PM - 5:30 PM Room L (3F Sapphire Hall Asuka)

Chair: Akisato Kimura (NTT)

5:10 PM - 5:30 PM

[3L2-05] Embedding and retrieval of images and text data using probability distribution

〇Kenta Hama1, Takashi Matsubara1, Kuniaki Uehara1 (1. Graduate School of System Informatics, Kobe University)

Keywords: multi-modal, retrieval, representation learning

Multimodal data, including images, sounds, and text, is accumulating on the Internet.
A general-purpose data representation could support tasks such as discrimination, generation, and retrieval across datasets of various modalities.
The key idea for acquiring such a representation is to embed each data point from its modality-specific data space as a point in a common space.
However, when data is embedded as a single point, it becomes difficult to capture the ambiguity of its meaning and the inclusion relations among data.
Of course, the representation of a data point does not itself need to be a point.
In this study, we embed images and text as normal distributions in a common space.
This improves the performance of image retrieval.
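
As a rough illustration of retrieval over distribution embeddings, the sketch below represents each image and text as a diagonal Gaussian (a mean vector and a variance vector) and ranks images against a text query. The abstract does not state which similarity measure or covariance structure the authors use; the KL divergence, its direction, the diagonal covariance, and the function names `kl_diag_gaussians` and `rank_images` are all assumptions made for this example.

```python
import numpy as np


def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL(q || p) between diagonal Gaussians, summed over the last axis.

    Inputs broadcast against each other, so a batch of q's can be
    compared with a single p in one call.
    """
    mu_q, var_q, mu_p, var_p = map(np.asarray, (mu_q, var_q, mu_p, var_p))
    return 0.5 * np.sum(
        var_q / var_p + (mu_p - mu_q) ** 2 / var_p - 1.0 + np.log(var_p / var_q),
        axis=-1,
    )


def rank_images(text_mu, text_var, image_mus, image_vars):
    """Return image indices sorted from best to worst match.

    Assumption for this sketch: an image matches a text query well when
    KL(image || text) is small, i.e. the image distribution sits inside
    the text distribution.
    """
    scores = kl_diag_gaussians(image_mus, image_vars, text_mu, text_var)
    return np.argsort(scores)


# Toy data: a broad text embedding and three narrow image embeddings.
text_mu, text_var = np.zeros(2), np.ones(2)
image_mus = np.array([[0.1, -0.2], [3.0, 3.0], [0.5, 0.5]])
image_vars = np.full((3, 2), 0.1)

ranking = rank_images(text_mu, text_var, image_mus, image_vars)
```

Because the KL divergence is asymmetric, a narrow distribution measured against a broad one that contains it scores lower than the reverse comparison. This is one way a distribution embedding can express ambiguity (through variance) and inclusion (through asymmetry), which a single point cannot.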