The 70th JSAP Spring Meeting 2023

Presentation information

Poster presentation

[17a-PB02-1~9] 23.1 Joint Session N "Informatics"

Fri. Mar 17, 2023 9:30 AM - 11:30 AM PB02 (Poster)

[17a-PB02-8] Machine extraction of material data from tables in PDF articles

Hiroyuki Oka¹, Masashi Ishii¹ (1. NIMS)

Keywords: material data, table, PDF

A system for machine extraction of material data from tables in PDF articles was created. The tables in a PDF article were first extracted using existing tools, and the material and property names in them were then recognized so that triples of material name, property name, and value could be extracted as material data points. The recognizer for material names was created using machine learning for sentence classification. It recognized not only general material names but also sample labels and IDs defined by authors, with an F1 score of 0.89. The tools and methods used in creating the system and the results will be reported in detail.
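
As a rough illustration of the triple-extraction step only, the following Python sketch assumes a table has already been pulled out of the PDF as a list of rows, that the first column holds material names or sample labels, and that the header row holds property names. The function names `extract_triples` and `toy_is_material` are hypothetical, the rule-based check merely stands in for the trained recognizer, and the table values are made-up placeholders; none of this is taken from the authors' system.

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]  # (material name, property name, value)


def extract_triples(
    table: List[List[str]],
    is_material: Callable[[str], bool],
) -> List[Triple]:
    """Turn one already-extracted table into (material, property, value) triples.

    Assumes row 0 is the header and column 0 holds material names or labels.
    """
    if not table:
        return []
    header, *rows = table
    triples: List[Triple] = []
    for row in rows:
        # Skip empty rows and rows whose first cell is not judged a material.
        if not row or not is_material(row[0]):
            continue
        material = row[0].strip()
        # Pair each remaining cell with the property name in its column.
        for prop, value in zip(header[1:], row[1:]):
            if value.strip():  # ignore empty cells
                triples.append((material, prop.strip(), value.strip()))
    return triples


def toy_is_material(cell: str) -> bool:
    """Placeholder for a trained recognizer: accept non-empty, non-numeric cells."""
    text = cell.strip()
    return bool(text) and not text.replace(".", "", 1).isdigit()


# Made-up table: one general material name and one author-defined sample label.
table = [
    ["Sample", "Property X (unit)", "Property Y (unit)"],
    ["TiO2", "1.0", "2.0"],
    ["S1", "3.0", ""],
    ["12", "4.0", "5.0"],  # numeric first cell, rejected by the toy check
]

triples = extract_triples(table, toy_is_material)
for triple in triples:
    print(triple)
```

In a real pipeline the `is_material` callable would be replaced by a learned classifier, and tables with transposed layouts or multi-row headers would need extra handling that this sketch does not attempt.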