JSAI2024

Presentation information

General Session

General Session » GS-7 Vision, speech media processing

[1D3-GS-7] Language media processing:

Tue. May 28, 2024 1:00 PM - 2:40 PM Room D (Temporary room 2)

Chair: 田崎豪 (Meijo University)

1:20 PM - 1:40 PM

[1D3-GS-7-02] Style Analysis of E-Commerce Site Images Using Multimodal Embeddings

〇Miki Katsuragi1, Kenji Tanaka1 (1. The University of Tokyo)

Keywords: Large Language Model, Embeddings, Clustering

With the expansion of the e-commerce market and advancements in technology, a detailed analysis of consumer purchasing behavior and understanding of preferences have become crucial. This is particularly true where the visual appeal of product images plays a significant role in consumer engagement. In our study, we utilized multimodal embeddings to analyze the style and nuances of art images on e-commerce sites. Specifically, we employed CoCa (Contrastive Captioners are Image-Text Foundation Models) to extract multimodal embeddings that capture the complex patterns and stylistic elements of product images. We then clustered these images into distinct style groups. Our analysis revealed that multimodal embeddings are effective in detecting subtle stylistic changes in images. Furthermore, it suggested that the application of such generative AI could greatly enhance the understanding of image characteristics preferred by consumers.
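
The clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm or similarity measure was used, so the cosine-similarity k-means below, the function names, and the toy three-dimensional vectors are all assumptions made for illustration. In practice each vector would be an image embedding extracted by the multimodal model, a step that is not shown here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_kmeans(embeddings, k, iterations=20):
    """Group embedding vectors by cosine similarity (an assumed choice,
    not necessarily the method used in the presented study)."""
    points = [l2_normalize(v) for v in embeddings]

    # Deterministic farthest-point initialisation: start from the first
    # point, then repeatedly add the point least similar to the centroids
    # chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = min(points, key=lambda p: max(dot(p, c) for c in centroids))
        centroids.append(list(nxt))

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its most similar centroid.
        labels = [
            max(range(k), key=lambda c: dot(p, centroids[c])) for p in points
        ]
        # Update step: each centroid becomes the normalised mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = l2_normalize(mean)
    return labels


# Hypothetical toy "embeddings": two images of one style, two of another.
toy_embeddings = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 0.9],
]
style_labels = cosine_kmeans(toy_embeddings, k=2)
print(style_labels)
```

Vectors that point in similar directions receive the same label, which is the sense in which images would be grouped into style clusters. The number of clusters `k` is a free parameter here and would need to be chosen for real data.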
