JSAI2024

Presentation information

General Session

[3F1-GS-10] AI application: Language model

Thu. May 30, 2024 9:00 AM - 10:40 AM Room F (Temporary room 4)

Chair: Tomoya Mizumoto (LY Corporation / SB Intuitions)

9:00 AM - 9:20 AM

[3F1-GS-10-01] Exploration for the adaptation of multimodal models to civil engineering documents.

〇Riku Ogata1, Junichi Okubo1, Junichiro Fujii1 (1. Yachiyo Engineering Co., Ltd.)

Keywords: document understanding, multimodal models, civil engineering

Information such as specifications and inspection information, previously managed on paper, is now being digitized to improve the work efficiency of civil engineers and save labor. However, many documents in the civil engineering field are PDFs that vary widely in format. In some cases, scanned copies of old documents are used as references, and these cannot be handled by text extraction tools or optical character recognition (OCR) technology. In recent years, multimodal models have been applied to OCR and document understanding, and they are expected to be used in the civil engineering field as well. In this study, we measure how well multimodal models can recognize and understand civil engineering documents that are written in Japanese and contain many technical terms. We also conduct a qualitative analysis and discuss the feasibility of using multimodal models in the civil engineering field.
