A Note on Improving Accuracy in Composed Image Retrieval through Training Data Generation

Kenta Uesugi; Naoki Saito; Keisuke Maeda; Takahiro Ogawa; Miki Haseyama

[2Win5-30] A Note on Improving Accuracy in Composed Image Retrieval through Training Data Generation

Introduction of a Counterfactual Image Generation Model with Text Refinement

〇Kenta Uesugi¹, Naoki Saito¹, Keisuke Maeda¹, Takahiro Ogawa¹, Miki Haseyama¹ (1. Hokkaido University)

Keywords: Composed Image Retrieval, Counterfactual Image Generation, Data Augmentation

In this paper, we propose a method for generating training data for Composed Image Retrieval (CIR) using a counterfactual image generation model. CIR uses both an image and text as the query, which allows it to handle nuanced information that is difficult to express with a single modality, making it an essential technique for efficient image retrieval. However, training CIR models requires a large number of triplets, each consisting of a reference image, a modification text, and a target image, and constructing such datasets takes significant time and effort. To address this issue, we introduce text refinement into a counterfactual image generation model to efficiently augment diverse triplet data. We conduct experiments on two types of datasets: real-world scene images and fashion item images. The results show that the augmented dataset generated by the proposed method is of sufficient quality to enhance the performance of CIR models.
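
To make the triplet structure concrete, the following is a minimal Python sketch of how augmented triplets might be assembled. It is not the authors' implementation: the `Triplet` class, the `augment_triplets` function, and the `refine_text` and `generate_image` callables are hypothetical names introduced only for illustration, and the abstract does not specify how text refinement or counterfactual generation is actually performed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Triplet:
    """One CIR training example, as described in the abstract."""
    reference_image: str    # path or ID of the reference image
    modification_text: str  # text describing the desired change
    target_image: str       # path or ID of the target image


def augment_triplets(
    seeds: Iterable[Tuple[str, str]],
    refine_text: Callable[[str], str],
    generate_image: Callable[[str, str], str],
) -> List[Triplet]:
    """Build triplets from (reference image, raw text) pairs.

    Both callables are assumptions standing in for components the
    abstract only names: `refine_text` for the text refinement step and
    `generate_image` for the counterfactual image generation model.
    """
    triplets = []
    for reference_image, raw_text in seeds:
        text = refine_text(raw_text)                    # refine the modification text
        target = generate_image(reference_image, text)  # synthesize the target image
        triplets.append(Triplet(reference_image, text, target))
    return triplets
```

Passing the two steps in as callables keeps the sketch independent of any particular generation model; in practice each would wrap whatever refinement and generation components are used.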

Presentation information

[2Win5] Poster session 2
