JSAI2023

Presentation information

General Session

[3E1-GS-2] Machine learning

Thu. Jun 8, 2023 9:00 AM - 10:20 AM Room E (A2)

Chair: 森田 尭 (Osaka University / Institute of Scientific and Industrial Research) [Online]

9:20 AM - 9:40 AM

[3E1-GS-2-02] A study of Visual Abductive Reasoning

〇Haruki Nagasawa¹, Yuta Matsumoto¹, Taku Hasegawa², Kyosuke Nishida², Jun Suzuki¹ (1. Tohoku University, 2. NTT Human Informatics Laboratories)

Keywords: Vision & Language, Visual abductive reasoning, Image captioning

Humans can reason abductively and hypothetically: starting from a specific part of an image, they draw on experience and knowledge to infer nontrivial situations in the image. For example, when we see a person with a plate full of food, we can assume that the person is likely hungry, even though we do not know the person well. Can computers perform this kind of visual reasoning?
This study uses the Sherlock dataset, which provides two kinds of captions: (i) specific cues tied to regions of interest, such as objects and actions in images, and (ii) information that can be inferred from these cues. We analyze whether nontrivial hypothetical inferences can be generated end-to-end from images using a state-of-the-art image encoder and language model.
We report that a pre-trained vision-language model can generate some hypothetical visual inferences when it is fine-tuned to reason abductively.
