The 82nd JSAP Autumn Meeting 2021

Presentation information

Oral presentation


[11a-N107-1~11] 23.1 Joint Session N "Informatics"

Sat. Sep 11, 2021 9:00 AM - 12:00 PM N107 (Oral)

Kentaro Kutsukake (RIKEN), Yukari Katsura (NIMS)

11:00 AM - 11:15 AM

[11a-N107-8] Efficient workflow for automatic database creation from large scale scientific articles

Luca Foppiano (1), Pedro Baptista De Castro (2), Kensei Terashima (2), Yoshihiko Takano (2), Masashi Ishii (1) (1. MaDIS, NIMS; 2. MANA, NIMS)

Keywords: material informatics, superconductors, machine learning

Automatically building databases of materials and their properties from the scientific literature is a key building block for data-driven materials science (Materials Informatics).
We have developed a service that combines Apache Airflow (a workflow engine) with custom-made tasks that call our superconductor extractor service, in order to build a database of superconducting materials and their properties from the PDF documents of scientific articles.
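The abstract does not describe the implementation, so the following is only a rough sketch of the general idea: a workflow engine runs a small directed acyclic graph (DAG) of tasks, here "list documents → extract → store", each starting only after its upstream tasks finish. To keep it self-contained, Airflow's scheduler is replaced by Python's `graphlib.TopologicalSorter` (in Airflow itself, tasks would be operators chained with `>>`). The extractor is a hypothetical stub that reads plain text in an assumed toy format (`MATERIAL Tc=VALUE K`) instead of calling the real PDF extraction service. The names `extract_superconductors`, `run_pipeline`, the task names and the `materials` table are all illustrative assumptions, not the authors' code.

```python
import re
import sqlite3
from graphlib import TopologicalSorter

# Hypothetical stand-in for the superconductor extractor service.
# Assumed toy input format: "MATERIAL Tc=VALUE K" somewhere in the text.
TC_PATTERN = re.compile(r"(\S+)\s+Tc\s*=\s*(\d+(?:\.\d+)?)\s*K")


def extract_superconductors(doc_text):
    """Return (material, Tc in kelvin) pairs found in a document's text."""
    return [(m.group(1), float(m.group(2))) for m in TC_PATTERN.finditer(doc_text)]


def run_pipeline(documents, db):
    """Run list -> extract -> store as a tiny DAG; return the execution order."""
    state = {}  # data passed between tasks (Airflow would use XComs or storage)

    def list_documents():
        state["docs"] = list(documents.items())

    def extract():
        state["records"] = [
            (doc_id, material, tc)
            for doc_id, text in state["docs"]
            for material, tc in extract_superconductors(text)
        ]

    def store():
        db.execute(
            "CREATE TABLE IF NOT EXISTS materials "
            "(doc_id TEXT, material TEXT, tc_kelvin REAL)"
        )
        db.executemany("INSERT INTO materials VALUES (?, ?, ?)", state["records"])
        db.commit()

    tasks = {"list_documents": list_documents, "extract": extract, "store": store}
    # Each task maps to the set of upstream tasks it depends on.
    dag = {
        "list_documents": set(),
        "extract": {"list_documents"},
        "store": {"extract"},
    }
    order = list(TopologicalSorter(dag).static_order())
    for name in order:
        tasks[name]()  # run each task only after its dependencies have run
    return order


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    docs = {"paper-1": "MgB2 Tc=39 K\nother text", "paper-2": "no data here"}
    print(run_pipeline(docs, conn))
    print(conn.execute("SELECT material, tc_kelvin FROM materials").fetchall())
```

Splitting the work into separate tasks is what lets a workflow engine retry, schedule or parallelise each stage independently, which matters when the input is hundreds of thousands of PDFs rather than two toy strings.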
We have processed about 250,000 PDF documents from materials science journals of various publishers, including APS, IOP and Elsevier, and obtained a database of nearly 12,000 entries of superconducting materials and their linked properties.