JSAI2024

Presentation information

General Session, GS-7: Vision, speech media processing

[2C1-GS-7] Language media processing

Wed. May 29, 2024 9:00 AM - 10:40 AM Room C (Temporary room 1)

Chair: Naoki Nishizawa (西澤直樹, Toshiba Corporation)

9:20 AM - 9:40 AM

[2C1-GS-7-02] Editable Virtual Try-On Using Text Prompts

〇Kosuke Takemoto¹, Takafumi Koshinaka¹ (1. Yokohama City University)

Keywords: Diffusion Model, Virtual Try-On, Fashion Design, Generative Model, Stable Diffusion

As an increasing share of clothing is sold on e-commerce (EC) sites rather than in physical stores, high return rates have become a problem in the apparel industry. Research on virtual try-on, which helps consumers see how a garment fits them without physically trying it on, has attracted public attention. In recent years, a new research trend has also emerged in generating images from simple text or graphic instructions to assist clothing designers. Our virtual try-on model builds on the parallel U-Net architecture introduced in TryOnDiffusion and on Stable Diffusion, an open-source text-to-image model, and can explore high-quality designs derived from existing clothes through text-based instructions. A series of experiments demonstrates that the proposed model is competitive in the quality of generated images and can assist in clothing design through natural language instructions.
