9:00 AM - 9:20 AM
[4A1-GS-6-01] OCR Text Analysis for Unstructured Document Images with Tuned Large Language Model
[Online]
Keywords: OCR, LLM, Relation Extraction
Optical character recognition (OCR) is a technology for recognizing characters in images. Because its accuracy has improved steadily with the advent of deep learning, OCR is expected to enable the digitization of a wide variety of documents. For example, applying OCR to stored document images may help clients prepare submission data for an information site. For user convenience, this task requires not merely enumerating OCR outputs but organizing them into structured data according to the relations extracted between keys and values. When a document has a table structure, these relations can be extracted easily from the positional information of the text. In many real-world cases, however, document images consist of unstructured text, and the relations must be extracted from context, which has prevented OCR from being put to practical use. In recent years, large language models (LLMs) have attracted much attention and are expected to dramatically improve the accuracy of context understanding. In this study, we propose a relation-extraction method that applies an LLM to OCR outputs. Through OCR post-processing and fine-tuning of the LLM, we achieved high relation-extraction accuracy on document images with little table structure.