Presentation information

Organized Session » OS-10

[1N4-OS-10a] Prospects for AI Integrating System 1 and System 2 (1/2)

Tue. Jun 14, 2022 2:20 PM - 4:00 PM Room N (Room 501)

Organizers: Satoshi Kurihara (Keio University) [on-site], Hiroshi Yamakawa (The Whole Brain Architecture Initiative), Youichiro Miyake (Square Enix)

3:00 PM - 3:20 PM

[1N4-OS-10a-03] Multi-layer Attentive Text-to-image Generation with CLIP-GAN

〇Yuya Kobayashi1, Masahiro Suzuki1, Yutaka Matsuo1 (1. The University of Tokyo Graduate School of Engineering)

Keywords: Deep Generative Models, Text-to-Image, Symbol Grounding, Artificial Creativity

In this research, we are interested in text-to-image generation from the perspective of pondering. Pondering is a mental activity that updates inference results using observations and past experiences. Recently, text-to-image models using CLIP have gained much attention for their generation quality. These models update their latent variables to adapt the output to the input prompt, and this process can be regarded as a kind of pondering. However, because these models update their entire output (latent variable) at once, they cannot attend to specific areas; although this is a kind of pondering, the process is not structured enough and remains closer to intuition. In this research, we propose structured models that attend to specific areas and update them independently. We hope this improves generation quality and offers a clue for thinking about pondering.
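The CLIP-guided update loop the abstract describes can be sketched in toy form. Everything below is an illustrative stand-in, not the authors' model: a linear "generator" maps a latent vector to an embedding, a dot-product score with a fixed "prompt" embedding (plus an L2 penalty, so the ascent converges) stands in for CLIP similarity, and a `mask` argument restricts the update to chosen latent dimensions, mimicking the structured, area-specific update the abstract proposes versus the baseline whole-latent update.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def generate(G, z):
    # Toy linear "generator": embedding = G @ z (stand-in for a GAN decoder).
    return [dot(row, z) for row in G]

def similarity(t, G, z, lam=0.1):
    # Score to maximize: prompt-image agreement minus an L2 penalty
    # (a stand-in for the CLIP similarity these models optimize).
    return dot(t, generate(G, z)) - lam * dot(z, z)

def clip_guided_update(z, G, t, lr=0.1, steps=200, lam=0.1, mask=None):
    # Gradient ascent on the score above; grad_z = G^T t - 2*lam*z.
    # `mask` restricts updates to chosen latent dimensions -- the
    # structured, area-specific update the abstract proposes.
    z = list(z)
    for _ in range(steps):
        for j in range(len(z)):
            if mask is not None and not mask[j]:
                continue
            g = sum(t[i] * G[i][j] for i in range(len(G))) - 2 * lam * z[j]
            z[j] += lr * g
    return z

# Example: 2-D latent, 2-D "embedding" space.
G = [[1.0, 0.0], [0.0, 1.0]]   # identity generator, for readability
t = [1.0, -1.0]                # fixed "prompt" embedding
z0 = [0.0, 0.0]

z_full = clip_guided_update(z0, G, t)                      # update all dims
z_part = clip_guided_update(z0, G, t, mask=[True, False])  # attend to dim 0 only
```

In this toy setup the unmasked run drives the whole latent toward the prompt, while the masked run changes only the attended dimension and leaves the rest untouched, which is the distinction between unstructured and structured pondering drawn above.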
