JSAI2020

Presentation information

Interactive Session

[3Rin4] Interactive 1

Thu. Jun 11, 2020 1:40 PM - 3:20 PM Room R01 (jsai2020online-2-33)

[3Rin4-72] Initiatives for the Development of Technology and Construction of Datasets to Enhance Searchability and Retrieval of Digitized Materials at the National Diet Library

〇Toru Aoike1, Takahumi Kinoshita1, Wataru Satomi1, Takanori Kawashima1 (1.National Diet Library)

National Diet Library holds the copyright to this paper.

Keywords: Document Layout Analysis, Dataset, OCR, Library Materials

The National Diet Library is conducting research on layout analysis and character recognition of digitized materials for the purpose of producing high-quality text from materials that are difficult to read with existing OCR software, such as printed materials that have aged. The layout dataset constructed during our study has been made available to the public under a free license (https://github.com/ndl-lab/layout-dataset). In this paper, we introduce the published datasets and annotation tools and quantitatively evaluate the machine learning method used to semi-automate the creation of datasets. Finally, we discuss potential topics for future study using this dataset.
