Embedding and retrieval of images and text data using probability distribution

Kenta Hama

5:10 PM - 5:30 PM

[3L2-05] Embedding and retrieval of images and text data using probability distribution

〇Kenta Hama¹, Takashi Matsubara¹, Kuniaki Uehara¹ (1. Graduate School of System Informatics, Kobe University)

Keywords: multi-modal, retrieval, representation learning

Multimodal data, including images, sounds, and text, is accumulating on the Internet.
A general-purpose data representation can be expected to support tasks such as discrimination, generation, and retrieval across datasets of various modalities.
The key idea for acquiring such a representation is to embed each data point from the data space of each modality as a point in a common space.
However, when data is embedded as a single point, it becomes difficult to express the ambiguity of the data's meaning and the inclusion relations among data.
Of course, the representation of a data point need not itself be a point.
In this study, we embed images and text as normal distributions in a common space.
This improves the performance of image retrieval.
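
As an illustration only, the following minimal sketch shows how retrieval over distribution embeddings could work. The abstract does not state which similarity measure or covariance structure is used, so the diagonal covariance, the choice of KL divergence, and all names and numbers below (`kl_diag_gaussians`, `rank_images`, the toy embeddings) are assumptions rather than the authors' method. Each item is a normal distribution given by a mean and a per-dimension variance; a broad text concept gets a large variance, and a specific image gets a small one.

```python
import math

def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two normal distributions with diagonal covariance.

    Each argument is a list of per-dimension means or variances.
    """
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total

def rank_images(text_embedding, image_embeddings):
    """Return image ids sorted from best to worst match for a text query.

    Assumption: an image matches a text query well when the image's
    distribution lies inside the query's distribution, i.e. when
    KL(image || text) is small.
    """
    mu_t, var_t = text_embedding
    scores = {
        image_id: kl_diag_gaussians(mu_i, var_i, mu_t, var_t)
        for image_id, (mu_i, var_i) in image_embeddings.items()
    }
    return sorted(scores, key=scores.get)

# Hypothetical 2-D embeddings: (mean, variance) per item.
text_query = ([0.0, 0.0], [4.0, 4.0])            # broad concept, large variance
images = {
    "car_photo": ([5.0, 5.0], [0.25, 0.25]),     # far from the query
    "cat_photo": ([0.5, 0.0], [0.25, 0.25]),     # inside the query's spread
}

ranking = rank_images(text_query, images)
print(ranking)  # ['cat_photo', 'car_photo']
```

Because KL divergence is asymmetric, the direction of comparison can encode an inclusion relation: a narrow distribution contained in a broad one scores well, whereas the reverse does not. A point embedding has no variance and so cannot express this.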

Presentation information

[3L2] [General Session] 2. Machine Learning
