JSAI2023

Presentation information

General Session » GS-5 Language media processing

[2E5-GS-6] Language media processing

Wed. Jun 7, 2023 3:30 PM - 5:10 PM Room E (A2)

Chair: 本浦 庄太 (NEC) [On-site]

3:30 PM - 3:50 PM

[2E5-GS-6-01] Improving Caption Generation Performance using Diffusion Process

〇Satoko Hirano, Ichiro Kobayashi (Ochanomizu University)

Keywords: Diffusion Process, Caption Generation

In recent years, generative models based on diffusion processes have achieved state-of-the-art performance on continuous-domain data and are being actively studied for discrete data generation. In this study, we propose a diffusion-process-based caption generation method that uses a language model and a classifier.
To improve caption generation performance, we compare accuracy with and without a pre-trained language model in the classifier, and investigate the conditions under which appropriate captions are generated for each image. Although the accuracy of our diffusion-based method was low, we confirmed that natural language generation can be controlled through the classifier during the sampling process, depending on the classifier's performance.
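The abstract does not specify the model or sampler, so the following is only a rough illustration of one common way a classifier can steer diffusion sampling: DDPM ancestral sampling in which each reverse-step mean is shifted along the gradient of a classifier's log-probability (classifier guidance). Everything here is an assumption rather than the authors' method: `toy_denoiser` is a hypothetical stand-in for a trained noise-prediction network over text embeddings, the logistic classifier weights `w` and `b` are hypothetical, and the linear beta schedule and `guidance_scale` values are illustrative. Mapping the final continuous vectors back to words (for example, nearest-embedding rounding) is omitted.

```python
import math
import random
from typing import List

Vector = List[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classifier_prob(x: Vector, w: Vector, b: float) -> float:
    """p(y=1 | x) for a hypothetical logistic classifier on noisy latents."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def classifier_grad(x: Vector, w: Vector, b: float) -> Vector:
    """Gradient of log p(y=1 | x) with respect to x, which is (1 - p) * w."""
    p = classifier_prob(x, w, b)
    return [(1.0 - p) * wi for wi in w]


def toy_denoiser(x_t: Vector, alpha_bar: float) -> Vector:
    """Hypothetical stand-in for a trained noise-prediction network.

    If clean data were standard normal, E[eps | x_t] = sqrt(1 - alpha_bar) * x_t
    exactly, so this toy model is optimal for that assumed data distribution.
    """
    return [math.sqrt(1.0 - alpha_bar) * xi for xi in x_t]


def sample(dim: int, num_steps: int, w: Vector, b: float,
           guidance_scale: float, seed: int) -> Vector:
    """DDPM ancestral sampling with classifier guidance on the reverse-step mean."""
    rng = random.Random(seed)
    # Linear beta schedule from 1e-4 to 0.02 (a common DDPM choice, assumed here).
    denom = max(num_steps - 1, 1)
    betas = [1e-4 + (0.02 - 1e-4) * i / denom for i in range(num_steps)]
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)

    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # x_T ~ N(0, I)
    for t in reversed(range(num_steps)):
        beta, alpha_bar = betas[t], alpha_bars[t]
        eps_hat = toy_denoiser(x, alpha_bar)
        # Unguided mean: (x_t - beta / sqrt(1 - alpha_bar) * eps_hat) / sqrt(alpha_t)
        mean = [(xi - beta / math.sqrt(1.0 - alpha_bar) * ei) / math.sqrt(1.0 - beta)
                for xi, ei in zip(x, eps_hat)]
        # Guidance: shift the mean by scale * sigma_t^2 * grad log p(y | x_t),
        # taking sigma_t^2 = beta_t.
        grad = classifier_grad(x, w, b)
        mean = [m + guidance_scale * beta * g for m, g in zip(mean, grad)]
        # Add fresh noise on every step except the final one.
        if t > 0:
            x = [m + math.sqrt(beta) * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x


if __name__ == "__main__":
    w, b = [1.0, -0.5], 0.0  # hypothetical classifier weights for a target attribute
    for scale in (0.0, 5.0):
        x0 = sample(dim=2, num_steps=100, w=w, b=b, guidance_scale=scale, seed=0)
        print(f"scale={scale}: p(y=1 | x_0) = {classifier_prob(x0, w, b):.3f}")
```

Running both scales with the same seed keeps the noise sequence identical, so any difference in the final classifier probability comes from the guidance term alone. This mirrors, in miniature, the idea that generation can be steered through a classifier during sampling; how well that works in practice depends on how accurate the classifier is on noisy inputs.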
