Style Analysis of E-Commerce Site Images Using Multimodal Embeddings

Miki Katsuragi

1:20 PM - 1:40 PM

[1D3-GS-7-02] Style Analysis of E-Commerce Site Images Using Multimodal Embeddings

〇Miki Katsuragi¹, Kenji Tanaka¹ (1. The University of Tokyo)

Keywords: Large Language Model, Embeddings, Clustering

With the expansion of the e-commerce market and advances in technology, detailed analysis of consumer purchasing behavior and an understanding of consumer preferences have become crucial. This is particularly true where the visual appeal of product images plays a significant role in consumer engagement. In this study, we used multimodal embeddings to analyze the style and nuances of art images on e-commerce sites. Specifically, we employed CoCa (Contrastive Captioners are Image-Text Foundation Models) to extract multimodal embeddings that capture the complex patterns and stylistic elements of product images, and then clustered the images into distinct style groups. Our analysis showed that multimodal embeddings are effective at detecting subtle stylistic changes in images. It also suggested that applying such generative AI could greatly improve the understanding of the image characteristics that consumers prefer.
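The clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which clustering algorithm or settings were used, so the cosine-similarity k-means below is an assumption, not the authors' method. The function names (`cluster_embeddings`, `make_synthetic_embeddings`) are hypothetical, and the input vectors are synthetic stand-ins for CoCa image embeddings so that the example runs without a model download.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cluster_embeddings(embeddings, k, n_iter=20):
    """Cosine-similarity k-means (an assumed choice, not taken from the paper)."""
    points = [l2_normalize(v) for v in embeddings]
    # Farthest-first initialisation: start from the first point, then repeatedly
    # add the point that is least similar to the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            min(points, key=lambda p: max(dot(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its most similar centroid.
        labels = [
            max(range(k), key=lambda j: dot(p, centroids[j])) for p in points
        ]
        # Update step: each centroid becomes the normalised mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[j] = l2_normalize(mean)
    return labels


def make_synthetic_embeddings(n_styles=3, per_style=20, dim=8, noise=0.05, seed=0):
    """Hypothetical stand-in for CoCa image embeddings: noisy copies of one
    prototype vector per style, generated in style order."""
    rng = random.Random(seed)
    data = []
    for style in range(n_styles):
        for _ in range(per_style):
            data.append(
                [(1.0 if d == style else 0.0) + rng.gauss(0.0, noise)
                 for d in range(dim)]
            )
    return data


embeddings = make_synthetic_embeddings()
labels = cluster_embeddings(embeddings, k=3)
print(labels)  # one style-group index per image
```

In a real pipeline the synthetic vectors would be replaced by embeddings extracted from product images with a CoCa model, and the number of style groups `k` would need to be chosen for the data at hand.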


Presentation information

[1D3-GS-7] Language media processing:
