Improving Caption Generation Performance using Diffusion Process

Satoko Hirano

3:30 PM - 3:50 PM

[2E5-GS-6-01] Improving Caption Generation Performance using Diffusion Process

〇Satoko Hirano¹, Ichiro Kobayashi¹ (1. Ochanomizu University)

Keywords: Diffusion Process, Caption Generation

In recent years, generative models based on diffusion processes have achieved state-of-the-art performance in continuous domains and have been actively studied for discrete data generation. In this study, we propose caption generation using a language model and a classifier based on a diffusion process.
To improve caption generation performance, we compare accuracy with and without a pre-trained language model in the classifier, and investigate the conditions under which appropriate captions can be generated for each image. Although the accuracy of our diffusion-based method was unsatisfactory, we confirmed that natural language generation can be controlled through the classifier's performance during the sampling process.
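
The idea of steering generation with a classifier during sampling can be illustrated with a toy sketch. The code below is not the authors' method: it assumes a one-dimensional latent, a hand-written stand-in for the learned denoiser, and a logistic classifier with a fixed weight, and every name in it (`guided_sample`, `classifier_log_prob_grad`, `W`, the noise schedule) is hypothetical. It only shows the general classifier-guidance pattern, in which each reverse step shifts the denoised mean along the gradient of the classifier's log-probability.

```python
import math
import random

W = 2.0  # hypothetical classifier weight: p(c | x) = sigmoid(W * x)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def classifier_log_prob_grad(x):
    # d/dx log sigmoid(W * x) = W * (1 - sigmoid(W * x))
    return W * (1.0 - sigmoid(W * x))


def guided_sample(guidance_scale, steps=50, seed=0):
    """Run a toy reverse process with classifier guidance on a 1-D latent."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)  # start from Gaussian noise
    for t in range(steps, 0, -1):
        sigma = 0.5 * t / steps  # toy noise schedule (assumption)
        mean = 0.9 * x  # stand-in for a learned denoiser (assumption)
        # Classifier guidance: shift the mean by scale * variance * grad log p(c | x).
        mean += guidance_scale * sigma ** 2 * classifier_log_prob_grad(mean)
        x = mean + sigma * rng.gauss(0.0, 1.0)
    return x


def mean_classifier_prob(guidance_scale, n=200):
    """Average classifier probability of the target class over n seeded samples."""
    samples = [guided_sample(guidance_scale, seed=s) for s in range(n)]
    return sum(sigmoid(W * x) for x in samples) / n


if __name__ == "__main__":
    print("unguided:", mean_classifier_prob(0.0))
    print("guided:  ", mean_classifier_prob(5.0))
```

In this toy setting, a positive guidance scale pushes samples toward regions the classifier scores highly, so how well generation is controlled depends directly on what the classifier has learned.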

Presentation information

[2E5-GS-6] Language media processing

