[1Win4-38] Generating Captions to Explain Logical Anomaly Detection in Industrial Images
Keywords:Industrial Images, Multimodal Model, Anomaly Detection, Image Captioning
This study aims to utilize large multimodal models to detect logical anomalies in industrial images and generate accurate descriptive outputs. Traditional anomaly detection methods mainly rely on image feature extraction and pattern recognition, making it difficult to effectively capture complex and subtle logical inconsistencies in industrial scenarios. Moreover, after classifying an image as anomalous, these methods cannot accurately describe the anomaly. To address this issue, this study integrates visual and textual data through multimodal learning techniques to enhance detection performance. Based on the existing dataset, I added corresponding descriptions to each image and adjusted the ratio of anomalous to normal images during model training to improve the model's anomaly detection performance. The results show that the trained model can not only effectively identify anomalous images but also generate descriptive outputs of the anomalies.
Authentication for paper PDF access
A password is required to view paper PDFs. If you are a registered participant, please log on the site from Participant Log In.
You could view the PDF with entering the PDF viewing password bellow.