JSAI2022

Presentation information

General Session

[2O1-GS-7] Vision, speech media processing: generation

Wed. Jun 15, 2022 9:00 AM - 10:40 AM Room O (Room 510)

Chair: Shuhei Kurita (RIKEN) [on-site]

10:20 AM - 10:40 AM

[2O1-GS-7-05] Improving Object Coverage of Text-to-Image Generation by Object Matching

〇Shogo Ishii, Tomoaki Yamazaki, Seiya Ito, Kouzou Ohara (Aoyama Gakuin University)

Keywords: Text-to-Image, GANs, Object Detection

Text-to-image generation aims to generate an image from a given text that describes scene information such as objects and scenery. Existing methods use an attention mechanism to implicitly learn the correspondence between words in the text and regions in the image from text-image pairs. However, objects specified in the text often do not appear in the generated image. In this paper, we propose a text-to-image generation model that explicitly learns the correspondence between objects in the text and objects in the generated image in order to improve object coverage. The proposed method applies object detection to the generated image and encourages missing objects to appear by introducing a loss function that accounts for how completely the objects in the text are matched by objects in the image. We demonstrate that our model outperforms existing methods in object coverage.
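
As a rough illustration of the idea in the abstract, and not the authors' actual loss function, the sketch below shows one way object coverage and a missing-object penalty could be computed from detector output. The function names, the label-matching rule (exact string match), and the form of the penalty are all assumptions made for this example; the abstract does not specify them.

```python
from typing import Dict, Iterable


def object_coverage(text_objects: Iterable[str],
                    detected_objects: Iterable[str]) -> float:
    """Fraction of objects named in the text that were detected in the image.

    Assumption: objects are matched by exact label equality.
    """
    wanted = set(text_objects)
    if not wanted:
        return 1.0  # nothing was requested, so nothing is missing
    found = wanted & set(detected_objects)
    return len(found) / len(wanted)


def missing_object_penalty(text_objects: Iterable[str],
                           detection_scores: Dict[str, float]) -> float:
    """Hypothetical penalty: mean of (1 - best detector confidence) per text object.

    An object the detector did not report at all counts as confidence 0.0,
    so it contributes the maximum penalty of 1.0.
    """
    wanted = set(text_objects)
    if not wanted:
        return 0.0
    total = sum(1.0 - detection_scores.get(name, 0.0) for name in wanted)
    return total / len(wanted)


if __name__ == "__main__":
    caption_objects = ["dog", "frisbee"]
    # Illustrative detector output: label -> highest confidence for that label
    scores = {"dog": 0.9, "person": 0.8}
    print(object_coverage(caption_objects, scores.keys()))
    print(missing_object_penalty(caption_objects, scores))
```

In this toy setting the penalty decreases as the detector becomes more confident that each object mentioned in the text is present. In a trainable model a term like this would need to be differentiable with respect to the generator, which this sketch does not address.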
