JSAI2025

Presentation information

Poster Session


[2Win5] Poster session 2

Wed. May 28, 2025 3:30 PM - 5:30 PM Room W (Event hall D-E)

[2Win5-55] NaiLIA: Multimodal Retrieval of Nail Designs Based on Relaxed Contrastive Loss

〇Kanon Amemiya¹, Takumi Komatsu¹, Daichi Yashima¹, Ryosuke Korekata¹, Kei Katsumata¹, Komei Sugiura¹ (1. Keio University)

Keywords: Multimodal Retrieval, Multimodal Foundation Model, Nail Design

We address the task of retrieving nail design images from dense intent descriptions: long descriptions that express multi-layered user intent for a nail design.
This is challenging because such descriptions specify flexibly created paintings and pre-manufactured embellishments, as well as visual characteristics, spatial relationships, higher-order themes, and overall impressions.
Existing vision-and-language foundation models often struggle to capture the interplay between paintings and embellishments, failing to incorporate multi-layered intent descriptions.
To address this, we propose NaiLIA, a method for retrieving nail design images that comprehensively align with dense intent descriptions.
Our approach estimates confidence scores for unlabeled positives, that is, images that align with a given description and could serve as positive examples but are not explicitly labeled as such, and incorporates these scores into the loss function.
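The abstract does not give the exact form of the loss, so the following is only a minimal sketch of one way confidence scores for unlabeled positives could relax a contrastive objective. It assumes an InfoNCE-style loss with soft targets; the function name, the `confidences` argument, and the temperature value are hypothetical and not taken from the paper.

```python
import math


def relaxed_contrastive_loss(similarities, labeled_positive, confidences,
                             temperature=0.07):
    """Soft-target InfoNCE for one description against candidate images.

    similarities: similarity between the description and each image.
    labeled_positive: index of the image explicitly paired with the description.
    confidences: hypothetical scores in [0, 1] estimating how likely each
        unlabeled image is to also match the description (an assumption).
    """
    # Target weights: 1 for the labeled positive, the confidence otherwise.
    weights = list(confidences)
    weights[labeled_positive] = 1.0
    total = sum(weights)
    targets = [w / total for w in weights]

    # Numerically stable log-softmax over temperature-scaled similarities.
    logits = [s / temperature for s in similarities]
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    log_probs = [z - log_norm for z in logits]

    # Cross-entropy between the soft targets and the predicted distribution.
    return -sum(t * lp for t, lp in zip(targets, log_probs))
```

In this sketch, setting every confidence to zero recovers the usual single-positive contrastive loss, while a nonzero confidence stops the corresponding image from being treated purely as a negative.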
To evaluate NaiLIA, we constructed a benchmark consisting of 10,625 images collected from people with diverse cultural backgrounds.
The images were annotated with long, dense intent descriptions written by more than 200 annotators.
Experimental results demonstrate that the proposed method outperforms standard methods by 20.9 points in terms of recall@10.
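For readers unfamiliar with the metric, recall@10 is commonly computed as sketched below. The abstract does not state the exact evaluation protocol, so this assumes the usual definition (the fraction of relevant images that appear in the top K results, averaged over queries and reported as a percentage); the function names are illustrative.

```python
def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of relevant images that appear in the top-k of one ranking."""
    if not relevant_ids:
        raise ValueError("each query needs at least one relevant image")
    top_k = set(ranked_ids[:k])
    hits = sum(1 for image_id in relevant_ids if image_id in top_k)
    return hits / len(relevant_ids)


def mean_recall_at_k(rankings, ground_truth, k=10):
    """Average recall@k over all queries, expressed as a percentage."""
    scores = [recall_at_k(ranked, relevant, k)
              for ranked, relevant in zip(rankings, ground_truth)]
    return 100.0 * sum(scores) / len(scores)
```

Under this definition, a gain reported in points is the difference between two such percentages.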
