JSAI2025

Presentation information

General Session


[4O1-GS-10] AI application

Fri. May 30, 2025 9:00 AM - 10:40 AM Room O (Room 1010)

Chair: Shinobu Hasegawa (Japan Advanced Institute of Science and Technology)

9:20 AM - 9:40 AM

[4O1-GS-10-02] Investigation of the Impact of Source Document Types in RAG Systems Incorporating Documents with Mathematical Formulas

〇Hayate Funakura1,2,3, Kaede Mori3 (1. Kyoto University, 2. Keio University, 3. Kikagaku, Inc.)

Keywords: RAG, Large language model, AI for education

Many studies have investigated educational support methods based on RAG systems that answer learners’ questions by referring to instructional texts. In mathematics and related fields, such RAG systems are also expected to facilitate learning. However, it remains unclear which document type (PDF, Markdown, etc.) is optimal for building RAG systems when dealing with texts containing mathematical expressions. In this paper, as a step toward identifying the best document format for RAG in math-heavy contexts, we compare the performance of RAG systems that use source texts in PDF format with those that use source texts in Markdown format. To evaluate performance, we prepared questions that require understanding of both the mathematical expressions and their surrounding context. We then built and evaluated a RAG system that retrieves relevant text from the source document to answer these questions. Our results suggest that the PDF format offers advantages in terms of robustness to the choice of text embedding model.
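
The retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the bag-of-words `embed` function is a toy stand-in for a real text embedding model, and the function names and the two sample chunk lists (one resembling Markdown with LaTeX, one resembling text extracted from a PDF) are hypothetical, chosen only to show how the same question can be scored against differently formatted source text.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=1):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical chunks: the same content as Markdown (LaTeX math) and as
# plain text of the kind a PDF extractor might produce.
markdown_chunks = [
    "The variance is defined as $\\sigma^2 = E[(X - \\mu)^2]$.",
    "Gradient descent updates the weights with $w \\leftarrow w - \\eta \\nabla L$.",
]
pdf_chunks = [
    "The variance is defined as σ2 = E[(X − μ)2].",
    "Gradient descent updates the weights with w ← w − η∇L.",
]

question = "How is the variance defined?"
top_markdown = retrieve(question, markdown_chunks)
top_pdf = retrieve(question, pdf_chunks)
```

In a comparison along the lines of the abstract, `embed` would be swapped for each embedding model under test, and the retrieved chunk would be passed to a language model together with the question.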
