JSAI2024

Presentation information

General Session

[4N3-GS-6] Language media processing

Fri. May 31, 2024 2:00 PM - 3:40 PM Room N (Room 54)

Chair: Ryota Tanaka (NTT Human Informatics Laboratories)

2:00 PM - 2:20 PM

[4N3-GS-6-01] Evolutionary Computation-based Automatic Prompt Engineering for OCR Text Analysis of Unstructured Document Images

〇Takashi Egami1, Hyakka Nakada1, Rinka Fukuji2, Marika Kubota2, Masakazu Yakushiji1 (1. Recruit Co., Ltd., 2. Beans Labo Co., Ltd.)

Keywords: OCR, LLM, Relation Extraction, Prompt Engineering, Evolutionary Computation

Optical Character Recognition (OCR) recognizes characters in images and could reduce the man-hours required to publish store information from document images on websites. However, this task requires not only extracting characters but also extracting key-value relations. Such extraction is straightforward for tabular documents but difficult for unstructured ones because their formats vary widely. Advances in Large Language Models (LLMs) have improved text comprehension, and automated prompt engineering, which generates task-specific prompts, is reported to improve accuracy further. Applying this approach to OCR text is expected to improve relation extraction accuracy. However, particularly for unstructured documents, it requires a large number of inference runs to cover the many formats, which makes it computationally expensive. To optimize LLM prompts with fewer inferences, this work proposes applying minibatch learning to evolutionary computation-based automated prompt engineering. The optimized prompts were found to extract key-value relations from OCR data with high precision.
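
As a rough illustration of the idea in the abstract, the sketch below runs a toy evolutionary search over prompts in which each generation is scored on a random minibatch of examples rather than on the full dataset. It is not the authors' implementation: `evolve_prompt`, `score`, `mutate`, `FRAGMENTS`, and `EXAMPLES` are hypothetical names, and `score` is a stand-in for an LLM call that would check extracted key-value pairs against labels.

```python
import random

# Hypothetical instruction fragments, one per document format (assumption).
FRAGMENTS = {
    "table": "Read tables row by row.",
    "colon": "Treat 'label: value' lines as pairs.",
    "heading": "Use nearby headings as keys.",
}
# Toy dataset: four examples of each format (assumption, not real OCR data).
EXAMPLES = [{"format": f} for f in FRAGMENTS for _ in range(4)]


def score(prompt, example):
    """Stand-in for one LLM inference: 1.0 if the prompt covers the format."""
    return 1.0 if FRAGMENTS[example["format"]] in prompt else 0.0


def mutate(prompt, rng):
    """Append one instruction fragment that the prompt does not yet contain."""
    missing = [f for f in FRAGMENTS.values() if f not in prompt]
    return (prompt + " " + rng.choice(missing)) if missing else prompt


def evolve_prompt(seeds, examples, generations=5, batch_size=3, survivors=2, seed=0):
    """Evolve prompts, scoring each generation on a random minibatch only."""
    rng = random.Random(seed)
    population = list(seeds)
    calls = 0  # number of simulated LLM inferences
    for _ in range(generations):
        batch = rng.sample(examples, min(batch_size, len(examples)))
        scored = []
        for prompt in population:
            fitness = sum(score(prompt, ex) for ex in batch) / len(batch)
            calls += len(batch)
            scored.append((fitness, prompt))
        # Keep the fittest prompts and refill the population with mutants.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        parents = [p for _, p in scored[:survivors]]
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(len(population) - survivors)]
        population = parents + children
    return population[0], calls


best, calls = evolve_prompt(["Extract key-value pairs."] * 4, EXAMPLES)
full_calls = 5 * 4 * len(EXAMPLES)  # cost if every generation used all examples
print(best)
print(calls, "minibatch inferences vs", full_calls, "full-dataset inferences")
```

With 5 generations, 4 prompts, and a minibatch of 3, the loop spends 60 simulated inferences instead of the 240 needed to score every prompt on all 12 examples each generation. The trade-off is noisier fitness estimates, since the returned prompt is only the best on the last minibatch.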
