2:40 PM - 3:00 PM
[4I3-GS-7-03] Detection and Correction of Object Hallucination using Attention Map and Gradient Information in LVLMs
Keywords: Object Hallucination, Multimodal, Large Vision-Language Models
Motivated by the strong language processing capabilities of Large Language Models (LLMs), researchers have recently pushed to develop Large Vision-Language Models (LVLMs) that incorporate powerful LLMs to improve performance on complex multimodal tasks. However, these LVLMs suffer from Object Hallucination: they describe objects that do not exist in the image or misrepresent the relationships between objects.
To address this problem, we propose a framework that detects and corrects Object Hallucination. The framework uses Attention Maps and gradient information within the LVLM to identify the specific parts of an image that cause Object Hallucination, and then corrects the hallucination. Experiments with multiple quantitative metrics verify that the proposed method reduces the occurrence of Object Hallucination.