10:00 AM - 10:20 AM
[2C1-GS-7-04] Diffusion Model Based on Text and Predicate Logic
Keywords: Diffusion Model, image generation
Diffusion models have achieved remarkable results in generating high-quality, diverse, and creative images. However, when it comes to text-based image generation, they often fail to capture the intended meaning of the text: a desired object may not be generated, an extraneous object may appear, or an adjective may alter objects it was not meant to modify. Moreover, we observed that these models frequently miss possessive relationships between objects. In this paper, we introduce Predicated Diffusion, a unified framework for capturing users' intentions more accurately. Rather than relying solely on the text encoder, the proposed method represents the intended meaning of the text as propositions in predicate logic and treats the pixels of the attention maps as fuzzy predicates. This yields a differentiable loss function whose minimization guides image generation to align with the textual propositions. Predicated Diffusion has demonstrated the ability to generate images that are more faithful to various text prompts, as verified by human evaluators and pretrained image-text models.
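The attention-map-as-fuzzy-predicate idea lends itself to a short illustration. Below is a minimal PyTorch sketch, not the authors' implementation: it assumes product fuzzy logic with the Goguen implication (one of several possible choices), and the names exists_loss, implies_loss, and attn_* are hypothetical.

```python
import torch

def exists_loss(attn: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Loss for the proposition "exists x. P(x)" (the object appears).

    attn holds values in [0, 1], one per pixel, read as the degree to
    which the fuzzy predicate P holds there. Under product fuzzy logic,
    the truth of the existential is approximated by
    1 - prod_x (1 - P(x)); the loss is its negative log.
    """
    truth = 1.0 - torch.prod(1.0 - attn.clamp(0.0, 1.0 - eps))
    return -torch.log(truth + eps)

def implies_loss(attn_a: torch.Tensor, attn_b: torch.Tensor,
                 eps: float = 1e-6) -> torch.Tensor:
    """Loss for "forall x. A(x) -> B(x)" (e.g. possession: every collar
    pixel is also a dog pixel), using the Goguen implication
    a -> b = min(1, b / a) and a product t-norm over pixels."""
    a = attn_a.clamp(eps, 1.0)
    b = attn_b.clamp(0.0, 1.0)
    per_pixel = torch.clamp(b / a, max=1.0)
    # Conjunction over pixels is a product, i.e. a sum of negative logs.
    return -torch.log(per_pixel + eps).sum()

# For a prompt like "a dog wearing a collar", the total guidance loss
# might combine the existence of both objects with the possessive
# relation; gradient steps on the diffusion latent at each denoising
# step would then push the attention maps toward satisfying the
# propositions:
#   loss = (exists_loss(attn_dog) + exists_loss(attn_collar)
#           + implies_loss(attn_collar, attn_dog))
```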