JSAI2023

Presentation information

General Session

[4A3-GS-6] Language media processing

Fri. Jun 9, 2023 2:00 PM - 3:40 PM Room A (Main hall)

Chair: 庵 愛 (NTT) [On-site]

3:00 PM - 3:20 PM

[4A3-GS-6-04] Image Captioners Tell More Than Images Given to Them

Honori Udo¹, 〇Takafumi Koshinaka¹ (¹Yokohama City University)

[Online]

Keywords: CNN, Transformer, BERT

Image captioning (also known as image-to-text), which generates descriptive text from a given image, has advanced rapidly in the deep learning era. To what extent is the information in the original image preserved in the descriptive text produced by an image captioner? To answer this question, we classify images using only the generated descriptive text, without looking at the images at all, and compare the results with those of a standard CNN-based image classifier. We evaluate several image captioning models on CrisisNLP, a disaster image classification task, and show that classifiers operating on the descriptive text can sometimes achieve higher accuracy than the CNN-based classifier. Furthermore, we show that fusing the CNN-based classifier with the descriptive text classifier yields a further improvement in accuracy.
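The abstract does not say how the CNN-based classifier and the descriptive text classifier are fused. As a minimal sketch, assuming score-level (late) fusion in which each classifier outputs a class-posterior vector and the two vectors are combined by a weighted average, the idea could look like the following. The function names, the `weight` parameter, and the toy scores are hypothetical illustrations, not the authors' implementation.

```python
from __future__ import annotations

import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Convert raw classifier scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(p_image: Sequence[float], p_text: Sequence[float],
                weight: float = 0.5) -> list[float]:
    """Hypothetical fusion rule: weighted average of two class posteriors.

    weight=1.0 uses only the image (CNN) classifier,
    weight=0.0 uses only the descriptive text classifier.
    """
    if len(p_image) != len(p_text):
        raise ValueError("posterior vectors must have the same number of classes")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * a + (1.0 - weight) * b for a, b in zip(p_image, p_text)]


def predict(probs: Sequence[float]) -> int:
    """Return the index of the most probable class."""
    return max(range(len(probs)), key=lambda i: probs[i])


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the reference labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


if __name__ == "__main__":
    # Toy scores for one image over three hypothetical classes.
    p_cnn = softmax([2.0, 1.0, 0.1])   # image classifier favors class 0
    p_txt = softmax([0.5, 2.5, 0.2])   # caption-based classifier favors class 1
    fused = late_fusion(p_cnn, p_txt, weight=0.5)
    print(predict(p_cnn), predict(p_txt), predict(fused))
```

In practice the fusion weight would presumably be tuned on held-out data; a weighted average is only one of several possible fusion schemes (feature concatenation or a learned fusion layer are alternatives).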