12:00 PM - 12:20 PM
[4N2-GS-7-01] Enhancing Insights into Traffic Accident Risk with Multimodal Retrieval-Augmented Generation
Keywords: Multimodal, RAG, VLM
With the advancement of autonomous driving technologies, accurately estimating traffic accident risks has become increasingly important. Advanced Driver-Assistance Systems (ADAS), which use sensors for obstacle detection and provide collision mitigation and evasive steering, have reduced traffic accidents. However, the diversity and complexity of accident scenarios limit the potential of traditional ADAS technologies to achieve further reductions in accidents. Recently, Vision-Language Models (VLMs) have been applied to the autonomous driving field. While VLMs possess broad knowledge and achieve reasonable accuracy in traffic scene understanding, they struggle to evaluate accident risks involving detailed and complex factors. Fine-tuning a VLM specialized for traffic accident risk estimation is necessary, but the significant cost of data collection and annotation poses practical challenges. This study proposes a traffic accident risk explanation method using multimodal Retrieval-Augmented Generation (RAG) to improve explanation performance efficiently with minimal data. By leveraging a small amount of manually annotated data for retrieval and reference, the proposed method enhances explanatory capabilities for previously unseen images. Experimental results show that the proposed method improves traffic accident scene understanding compared to the baseline model.
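The retrieve-and-reference idea described in the abstract can be sketched in outline: embed a small annotated reference set, retrieve the entries most similar to a new scene, and prepend them as context for the VLM. The sketch below is a minimal illustration, not the authors' implementation; the `embed` function, the example annotations, and the prompt format are all hypothetical stand-ins (a real system would encode the dashcam image with a multimodal encoder such as CLIP).

```python
import hashlib
import numpy as np

def embed(desc: str, dim: int = 64) -> np.ndarray:
    """Hypothetical stand-in embedding: deterministic pseudo-random unit
    vector seeded by the scene description. A real multimodal RAG system
    would embed the image itself with a vision-language encoder."""
    seed = int.from_bytes(hashlib.sha256(desc.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)

# Small manually annotated reference set (illustrative examples only).
annotated = [
    {"scene": "pedestrian stepping off curb at night",
     "risk": "high: low visibility, short reaction window"},
    {"scene": "cyclist in adjacent lane at intersection",
     "risk": "medium: possible merge conflict"},
    {"scene": "clear highway, large following distance",
     "risk": "low: no nearby obstacles"},
]
index = np.stack([embed(a["scene"]) for a in annotated])

def retrieve(query_scene: str, k: int = 2) -> list[dict]:
    """Return the k annotated examples most similar to the query scene."""
    q = embed(query_scene)
    sims = index @ q  # cosine similarity: all vectors are unit-normalized
    return [annotated[i] for i in np.argsort(-sims)[:k]]

def build_prompt(query_scene: str) -> str:
    """Assemble a RAG prompt: retrieved annotations as context for the VLM."""
    refs = retrieve(query_scene)
    context = "\n".join(f"- {r['scene']}: {r['risk']}" for r in refs)
    return (f"Reference accident-risk annotations:\n{context}\n\n"
            f"Explain the accident risk in this scene: {query_scene}")
```

The point of the retrieval step is that the VLM never needs fine-tuning: a handful of annotated examples, surfaced at inference time, supply the domain-specific risk vocabulary the base model lacks.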