Additional Learning of Diffusion Models by Conditioning on Speech Bubbles in Manga Data

Koshiro Terasawa

9:40 AM - 10:00 AM

[2C1-GS-7-03] Additional Learning of Diffusion Models by Conditioning on Speech Bubbles in Manga Data

〇Koshiro Terasawa¹, Masahiro Suzuki¹, Yutaka Matsuo¹ (1. The University of Tokyo)

Keywords: Image Generation AI, Manga, Generative AI

While manga is growing in popularity, drawing it is time-consuming. Prior studies have aimed to reduce this burden, but they have mainly focused on image transformation and address only a limited range of problems.
The advent of diffusion models has made it possible to generate desired images with high quality. In addition, additional training of pre-trained models enables low-cost specialisation to the manga domain, which is expected to support more advanced stages of the drawing process.
This study addresses the additional training of diffusion models on manga data. Unlike ordinary image generation, manga image generation should not produce speech bubbles or text characters (hereafter, noise), because these can be inserted later. Since noise-free manga data is inherently difficult to collect, the training process requires some ingenuity.
We propose a method that conditions on the presence of noise during additional training and generates images under a noise-free condition at inference. Experimental results show that, compared with training without conditioning, the proposed method significantly reduces noise in the generated images and improves image quality.
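The conditioning idea can be illustrated with a minimal sketch. The abstract does not state how the condition is encoded, so the text tags, function names, and the choice to append the condition to a caption are all assumptions made for illustration, not the authors' implementation.

```python
# Hypothetical condition tags; the abstract does not specify the encoding.
NOISY_TAG = "with speech bubbles"
CLEAN_TAG = "without speech bubbles"


def training_caption(caption: str, has_bubbles: bool) -> str:
    """Label a training image with whether it contains noise.

    During additional training, noisy images are kept in the dataset
    but are explicitly marked, so the model can associate the tag
    with the presence of speech bubbles and text.
    """
    tag = NOISY_TAG if has_bubbles else CLEAN_TAG
    return f"{caption}, {tag}"


def inference_prompt(prompt: str) -> str:
    """Always request the noise-free condition at generation time."""
    return f"{prompt}, {CLEAN_TAG}"


if __name__ == "__main__":
    print(training_caption("a girl running through rain", has_bubbles=True))
    print(inference_prompt("a girl running through rain"))
```

In this sketch, the fine-tuning data may consist mostly of noisy pages, while generation only ever uses the clean condition.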

Presentation information

[2C1-GS-7] Language media processing: