JSAI2024

Presentation information

General Session » GS-7 Vision, speech media processing

[4I1-GS-7] Language media processing:

Fri. May 31, 2024 9:00 AM - 10:40 AM Room I (Room 41)

Chair: 石川 開 (NEC Corporation) [Online]

10:00 AM - 10:20 AM

[4I1-GS-7-04] Image Editing via Initial Value Optimization with Diffusion Models

〇Shin Hashino, Takashi Matsubara (Osaka University)

Keywords: Diffusion Models, Image Editing

Deep generative models such as diffusion models can generate high-fidelity images from text, and they can also be used for image editing. To edit an image, a diffusion model first performs inversion, which maps the input image to an initial noise, and then generates the edited image conditioned on the target text. However, this approach leads to unintended changes in parts of the image. Recent methods restrict edits to the desired content by controlling the generative process, but this control can itself produce unnatural edited images. To address this challenge, we propose optimizing the initial noise of the generative process so that it becomes responsive to the target prompt. After the initial noise is optimized, a pre-trained text-to-image diffusion model generates the edited image. The proposed approach generates more natural edited images than existing methods.
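
The pipeline described in the abstract (inversion, optimization of the initial noise, then generation) can be illustrated with a small toy sketch. This is not the authors' implementation: a linear per-pixel update stands in for a pre-trained diffusion model, prompts are plain vectors, and the loss is an assumption, since the abstract does not state the objective. The function names, the constants `STEPS` and `DECAY`, and the `keep_mask` that marks pixels to preserve are all hypothetical.

```python
# Toy sketch only: a linear per-pixel "sampler" stands in for a pre-trained
# text-to-image diffusion model. Every name and constant here is hypothetical.

STEPS = 4    # number of toy denoising steps
DECAY = 0.7  # fraction of the current state kept at each step


def generate(noise, prompt):
    """Deterministic toy sampler: each step pulls pixels toward the prompt."""
    x = list(noise)
    for _ in range(STEPS):
        x = [DECAY * xi + (1.0 - DECAY) * pi for xi, pi in zip(x, prompt)]
    return x


def invert(image, prompt):
    """Inversion: undo generate() step by step to recover the initial noise."""
    x = list(image)
    for _ in range(STEPS):
        x = [(xi - (1.0 - DECAY) * pi) / DECAY for xi, pi in zip(x, prompt)]
    return x


def optimize_initial_noise(noise, target_prompt, source_image, keep_mask,
                           lr=5.0, iters=60):
    """Gradient descent on the initial noise.

    Assumed loss (not from the abstract): squared error between the generated
    image and the source image, counted only where keep_mask is 1.
    """
    k = DECAY ** STEPS  # d(output pixel) / d(noise pixel) for this linear toy
    z = list(noise)
    for _ in range(iters):
        x = generate(z, target_prompt)
        grad = [2.0 * m * k * (xi - si)
                for m, xi, si in zip(keep_mask, x, source_image)]
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return z


source_image = [1.0, 2.0, 3.0, 4.0]
source_prompt = [0.0, 0.0, 0.0, 0.0]
target_prompt = [5.0, 5.0, 5.0, 5.0]
keep_mask = [1, 1, 0, 0]  # preserve the first two pixels, edit the last two

z0 = invert(source_image, source_prompt)
naive = generate(z0, target_prompt)  # inversion followed by plain regeneration
z_opt = optimize_initial_noise(z0, target_prompt, source_image, keep_mask)
edited = generate(z_opt, target_prompt)

print("naive :", naive)
print("edited:", edited)
```

In this toy, regenerating from the inverted noise with the target prompt shifts every pixel, which mirrors the unintended changes mentioned in the abstract. After the noise is optimized, the masked pixels return to their source values while the remaining pixels still follow the target prompt. A real system would backpropagate through an actual diffusion sampler rather than use a hand-derived gradient.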
