JSAI2024

Presentation information

General Session


[4A1-GS-6] Language media processing:

Fri. May 31, 2024 9:00 AM - 10:40 AM Room A (Main hall)

Chair: 田中 駿 (JX通信社)

9:00 AM - 9:20 AM

[4A1-GS-6-01] OCR Text Analysis for Unstructured Document Images with Tuned Large Language Model

〇Hyakka Nakada1, Takashi Egami1, Rinka Fukuji2, Marika Kubota2, Masakazu Yakushiji1 (1. Recruit Co., Ltd., 2. Beans Labo Co., Ltd.)

[Online]

Keywords: OCR, LLM, Relation Extraction

Optical Character Recognition (OCR) is a technology for recognizing characters in given images. As its accuracy has continuously improved with the advent of deep learning, OCR is expected to enable a wide variety of document digitization tasks. For example, OCR applied to stored document images may help clients prepare submission data for an information site. For user convenience, this task requires not just enumerating OCR outputs but organizing them into structured data according to the relations extracted between keys and values. When a table structure is present, the relation can be extracted easily from text-positional information. In many real-world cases, however, document images consist of unstructured text, and the relation must be extracted from context, which has prevented OCR from being put into practical use. In recent years, large language models (LLMs) have attracted much attention and are expected to dramatically improve the accuracy of context understanding. In this study, we propose a relation-extraction method that applies an LLM to OCR outputs. Through OCR post-processing and fine-tuning of the LLM, we achieved high relation-extraction accuracy on document images with few table structures.
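The abstract describes a pipeline of OCR, post-processing, and key-value relation extraction with a fine-tuned LLM. Below is a minimal sketch of such a pipeline, not the authors' implementation: the OCR engine (pytesseract), the Hugging Face Transformers inference API, the model path, the prompt, and the post-processing heuristic are all illustrative assumptions, since the abstract does not specify them.

```python
# Sketch of an OCR -> post-processing -> LLM relation-extraction pipeline.
# All component choices below are assumptions for illustration only.
import json

import pytesseract
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "path/to/finetuned-llm"  # placeholder for a fine-tuned checkpoint


def ocr_postprocess(text: str) -> str:
    """Toy OCR post-processing: collapse whitespace and drop empty lines."""
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


def extract_relations(image_path: str, keys: list[str]) -> dict:
    """OCR an unstructured document image, then ask the LLM to pair each
    requested key with its value and return the result as a JSON object."""
    raw_text = pytesseract.image_to_string(Image.open(image_path), lang="jpn+eng")
    text = ocr_postprocess(raw_text)

    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
    model = AutoModelForCausalLM.from_pretrained(MODEL_PATH)

    prompt = (
        "Extract the value for each key from the OCR text below and answer "
        f"as a JSON object with keys {keys}.\n\nOCR text:\n{text}\n\nJSON:"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=256)
    answer = tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    return json.loads(answer)  # assumes the fine-tuned model emits valid JSON


if __name__ == "__main__":
    print(extract_relations("document.png", ["company name", "address", "phone"]))
```

In this sketch the fine-tuned model is expected to output the key-value pairs directly as JSON; the actual paper's prompt design, post-processing, and output format may differ.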
