JSAI2024

Presentation information

Cancelled

General Session » GS-7 Vision, speech media processing

[2C1-GS-7] Language media processing

Wed. May 29, 2024 9:00 AM - 10:40 AM Room C (Temporary room 1)

Chair: 西澤直樹 (Toshiba Corporation)

9:40 AM - 10:00 AM

[2C1-GS-7-03] Additional Learning of Diffusion Models by Conditioning on Speech Bubbles in Manga Data

〇Koshiro Terasawa1, Masahiro Suzuki1, Yutaka Matsuo1 (1. The University of Tokyo)

Keywords: Image Generation AI, Manga, Generative AI

Manga continues to grow in popularity, but drawing it is time-consuming. Earlier studies aimed at reducing this burden have mainly focused on image transformation, which limits the problems they can address.
Diffusion models have made it possible to generate desired images with high quality. In addition, additional training of pre-trained models enables low-cost specialisation to the manga domain, which is expected to support more advanced stages of the drawing process.
This study addresses additional training of diffusion models on manga data. Unlike ordinary image generation, manga image generation must not produce speech bubbles or text, which are treated here as noise because they can be inserted later. Because noise-free manga data is difficult to collect, the training procedure requires some ingenuity.
We propose a method that conditions on the presence of noise during additional training and generates images under the noise-free condition at inference. Experimental results show that the proposed method significantly reduces noise in the generated images and improves image quality compared with training without conditioning.
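
The conditioning idea can be illustrated with a minimal sketch. The abstract does not say how the condition is encoded, so this example assumes it is appended to a text prompt; the tag strings, the `MangaSample` type, and both helper functions are hypothetical names introduced only for illustration, not the authors' implementation.

```python
from dataclasses import dataclass

# Hypothetical condition tags (assumption: the condition is expressed in the prompt).
NOISY_TAG = "with speech bubbles and text"
CLEAN_TAG = "no speech bubbles, no text"


@dataclass
class MangaSample:
    caption: str     # description of the panel content
    has_noise: bool  # True if the panel contains speech bubbles or text


def training_prompt(sample: MangaSample) -> str:
    """Attach the condition that matches the training image."""
    tag = NOISY_TAG if sample.has_noise else CLEAN_TAG
    return f"{sample.caption}, {tag}"


def inference_prompt(caption: str) -> str:
    """Always request the noise-free condition at generation time."""
    return f"{caption}, {CLEAN_TAG}"
```

Under this assumption, noisy panels remain usable as training data because the model is told which images contain noise, while generation always asks for the clean condition.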
