Performance Evaluation of Multimodal LLM with Life Insurance Business Data

Tamao Shimizu

6:00 PM - 6:20 PM

[3A6-GS-10-02] Performance Evaluation of Multimodal LLM with Life Insurance Business Data

〇Tamao Shimizu¹, Hibiki Bannai¹, Yoshiaki Onishi¹ (1. The Dai-ichi Life Techno Cross Co., Ltd.)

Keywords: Industrial Application

To apply multimodal LLMs to a life insurance company's inquiry-response work, we built a benchmark from business data and used it to compare the practical performance of several models. We evaluated three models (Claude 3.5 Sonnet, Gemini 1.5 Pro, and GPT-4o) on two tasks: document QA and converting image content into text. Claude 3.5 Sonnet achieved the highest accuracy on document QA, and Gemini 1.5 Pro achieved the highest accuracy on image-to-text conversion. We also identified characteristics of the charts and tables in in-house documents that LLMs found difficult to recognize. These evaluations confirmed that a benchmark built from business data produces results that differ from those of publicly available general-purpose benchmarks.


Presentation information

[3A6-GS-10] AI application:

