JSAI2025

Presentation information

[4A3-GS-10] AI application

Fri. May 30, 2025 2:00 PM - 3:40 PM Room A (Large hall)

Chair: Shogo Ishikawa (Shizuoka University)

3:00 PM - 3:20 PM

[4A3-GS-10-04] Development of a Structural Analysis Model for Laboratory Equipment Manuals and a PDF-to-Markdown Conversion Pipeline Integrated with OCR and VLM for Laboratory RAG

〇Yukito Nonaka1, Naoto Yokoi2,3, Junji Haruyama1,4, Eiji Saitoh2,3,5,6 (1. College of Science and Engineering, Aoyama Gakuin University, 2. School of Engineering, Univ. of Tokyo, 3. Institute for AI and Beyond, Univ. of Tokyo, 4. Institute of Industrial Science, Univ. of Tokyo, 5. Advanced Institute for Materials Research, Tohoku Univ., 6. Center for Emergent Matter Science, RIKEN)

Keywords: Document Layout Analysis, PDF Conversion, Laboratory Automation

In the development of laboratory automation and laboratory RAG (Retrieval-Augmented Generation) systems, it is crucial to convert PDFs, such as laboratory equipment manuals, into formats that are easier for large language models (LLMs) to process. However, document layout analysis services based on deep learning often misinterpret human-optimized elements, such as UI screens and button images, treating them as tables or text. Additionally, binary analysis of PDFs is frequently inadequate due to the diverse ways PDFs are generated.

To address these challenges, this study fine-tuned DocLayout-YOLO, pre-trained on DocSynth300K, using a dataset of 35,535 PDFs and annotations generated from HTML manuals for home appliances and electronic devices. The developed model achieves high accuracy in detecting UI and button images, even in complex cases. Furthermore, we propose a PDF analysis pipeline that integrates OCR and VLM to extract text, images, and structural information, converting the data into Markdown format.
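The final stage of such a pipeline, turning detected layout regions into Markdown, might look roughly like the sketch below. It is not the authors' implementation: the `Region` type, the label names (`title`, `ui_screen`, `button`, `figure`), the crop file naming, and the top-to-bottom reading order are all assumptions made for illustration. The OCR engine and the VLM are passed in as plain callables, so no particular library API is implied.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (x0, y0, x1, y1) in page coordinates with the origin at the top-left.
BBox = Tuple[float, float, float, float]

# Hypothetical labels for regions that should be kept as images
# rather than transcribed as text or tables.
IMAGE_LABELS = {"ui_screen", "button", "figure"}


@dataclass
class Region:
    """One detected layout region (hypothetical structure)."""
    label: str
    bbox: BBox
    page: int = 0


def reading_order(regions: List[Region]) -> List[Region]:
    """Order regions by page, then top-to-bottom, then left-to-right.

    This is a simplification that assumes a single-column layout.
    """
    return sorted(regions, key=lambda r: (r.page, r.bbox[1], r.bbox[0]))


def regions_to_markdown(
    regions: List[Region],
    ocr: Callable[[Region], str],
    describe: Callable[[Region], str],
) -> str:
    """Render regions as Markdown.

    `ocr` returns the text of a region; `describe` returns a short
    caption for an image-like region (for example, from a VLM).
    Both are supplied by the caller.
    """
    parts = []
    for index, region in enumerate(reading_order(regions)):
        if region.label == "title":
            parts.append("## " + ocr(region).strip())
        elif region.label in IMAGE_LABELS:
            # Assumed naming scheme for the cropped image file.
            path = f"page{region.page}_region{index}.png"
            parts.append(f"![{describe(region).strip()}]({path})")
        else:
            parts.append(ocr(region).strip())
    return "\n\n".join(parts)
```

Routing UI screens and button images to the captioning callable instead of the OCR callable is what keeps them from being flattened into text or tables, which is the failure mode the abstract describes.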

This work not only improves the efficiency of manual organization and reference but also provides a robust technological foundation for document processing in various fields.
