JSAI2024


General Session » GS-7 Vision, speech media processing

[4I3-GS-7] Language media processing:

Fri. May 31, 2024 2:00 PM - 3:20 PM Room I (Room 41)

Chair: 宇野 裕 (NEC Corporation)

2:40 PM - 3:00 PM

[4I3-GS-7-03] Detection and Correction of Object Hallucination using Attention Map and Gradient Information in LVLMs

〇Kazuki Yamaji¹, Tomohiro Takagi¹ (¹Meiji University)

Keywords: Object Hallucination, Multimodal, Large Vision-Language Models

Motivated by the strong language-processing capabilities of Large Language Models (LLMs), recent work has developed Large Vision-Language Models (LVLMs) that build on powerful LLMs to improve performance on complex multimodal tasks. However, LVLMs suffer from Object Hallucination: they recognize and describe objects that are not present in the image, or misrepresent the relationships between objects.
To address this problem, we propose a framework that detects and corrects Object Hallucination. Using Attention Maps and gradient information from within the LVLM, the framework identifies the specific image regions that cause Object Hallucination and then corrects the hallucinated output. Experiments show that the proposed method reduces the occurrence of Object Hallucination across multiple quantitative metrics.
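The abstract does not say how Attention Maps and gradients are combined. As a hedged illustration only, the sketch below uses a Grad-CAM-style weighting (elementwise ReLU of gradient × attention, averaged over heads). This is a common explainability technique and not necessarily the authors' method. It estimates how strongly one generated object token is grounded in the image tokens and which image patches contribute most. The function names, the input layout (one decoder layer's attention weights and their gradient, shape `(heads, seq_len, seq_len)`), and the 0.3 threshold are all hypothetical. Extracting these tensors from a real LVLM and the correction step are not shown.

```python
import numpy as np


def grad_weighted_attention(attn: np.ndarray, grad: np.ndarray) -> np.ndarray:
    """Grad-CAM-style relevance map: mean over heads of ReLU(grad * attn).

    attn, grad: (heads, seq_len, seq_len) attention weights of one layer and
    the gradient of the target token's score w.r.t. them (hypothetical inputs).
    """
    if attn.shape != grad.shape:
        raise ValueError("attn and grad must have the same shape")
    return np.maximum(attn * grad, 0.0).mean(axis=0)


def image_grounding(relevance: np.ndarray, token_idx: int,
                    image_slice: slice, top_k: int = 3):
    """Return (share of the token's relevance on image tokens, top-k patch indices)."""
    row = relevance[token_idx]
    total = row.sum()
    image_row = row[image_slice]
    share = float(image_row.sum() / total) if total > 0 else 0.0
    # Patch indices relative to the start of the image-token span, most relevant first.
    top = np.argsort(image_row)[::-1][:top_k]
    return share, top


def is_suspect(share: float, threshold: float = 0.3) -> bool:
    """Hypothetical rule: flag an object token that is weakly grounded in the image."""
    return share < threshold


# Toy example: 2 heads, 4 tokens; tokens 0-1 are image patches, 2-3 are text.
attn = np.zeros((2, 4, 4))
attn[0, 3] = [0.4, 0.4, 0.1, 0.1]
attn[1, 3] = [0.1, 0.1, 0.4, 0.4]
grad = np.ones_like(attn)

share, top = image_grounding(grad_weighted_attention(attn, grad), 3, slice(0, 2))
print(round(share, 2), is_suspect(share))  # 0.5 False

# Negative gradients on the image patches remove their contribution entirely.
grad_neg = grad.copy()
grad_neg[:, :, :2] = -1.0
share2, _ = image_grounding(grad_weighted_attention(attn, grad_neg), 3, slice(0, 2))
print(share2, is_suspect(share2))  # 0.0 True
```

The ReLU keeps only positions whose attention pushes the token's score up, so a token whose supporting evidence comes almost entirely from text tokens gets a low image share. Under this assumed rule, such a token is a candidate hallucination, and the returned patch indices point to the image regions to inspect.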
