The 82nd JSAP Autumn Meeting 2021

Presentation information

General session (oral presentation)

[11a-N107-1~11] 23.1 Joint Session N "Informatics Applications"

Saturday, September 11, 2021, 09:00 - 12:00, N107 (Oral)

Kentaro Kutsukake (RIKEN), Yukari Katsura (NIMS)

11:00 - 11:15

[11a-N107-8] Efficient workflow for automatic database creation from large scale scientific articles

Luca Foppiano (1), Pedro Baptista De Castro (2), Kensei Terashima (2), Yoshihiko Takano (2), Masashi Ishii (1) (1. MaDIS, NIMS; 2. MANA, NIMS)

Keywords: materials informatics, superconductors, machine learning

Automatically extracted databases of materials and properties from the scientific literature are a building block of data-driven materials science (Materials Informatics).
We have developed a service that combines Apache Airflow (a workflow engine) with custom-made tasks; it runs our superconductor extraction service to build a database of superconducting materials and their properties from PDF documents of scientific articles.
We have processed about 250,000 PDF documents from materials science journals by various publishers, including APS, IOP, and Elsevier, and obtained a database of nearly 12,000 entries of superconducting materials and linked properties.
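
The abstract does not describe the tasks in detail, so the following is only a minimal sketch of the general shape such a workflow might take: iterate over documents, call an extractor, and store the returned records. It uses plain Python and SQLite rather than Apache Airflow, and every name in it (`extract_stub`, `run_pipeline`, the `entries` table, the file names, and the canned record) is a hypothetical placeholder, not the authors' implementation or schema.

```python
import sqlite3
from typing import Callable, Iterable, List, Tuple

# Hypothetical record shape: (source document, material, property, value).
Record = Tuple[str, str, str, str]


def extract_stub(document: str) -> List[Record]:
    """Stand-in for an extraction service; returns canned, illustrative records."""
    canned = {
        "paper_a.pdf": [("paper_a.pdf", "MgB2", "Tc", "39 K")],
        "paper_b.pdf": [],  # a document with no extractable entries
    }
    return canned.get(document, [])


def run_pipeline(
    documents: Iterable[str],
    extract: Callable[[str], List[Record]],
    connection: sqlite3.Connection,
) -> int:
    """Run the extract-then-store steps for each document; return rows stored."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS entries "
        "(source TEXT, material TEXT, property TEXT, value TEXT)"
    )
    stored = 0
    for document in documents:
        records = extract(document)  # extraction step
        connection.executemany(
            "INSERT INTO entries VALUES (?, ?, ?, ?)", records
        )  # storage step
        stored += len(records)
    connection.commit()
    return stored


connection = sqlite3.connect(":memory:")
count = run_pipeline(["paper_a.pdf", "paper_b.pdf"], extract_stub, connection)
rows = connection.execute(
    "SELECT material, property, value FROM entries"
).fetchall()
print(count, rows)
```

In a workflow engine, each step would typically be a separate task, so that failures on individual documents can be retried without rerunning the whole batch; the single loop above only illustrates the data flow.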