Toward full automatic identification of superconducting materials and their properties in original papers: ambitious scope and current status

Luca Foppiano; Sae Dieb; Akira Suzuki; Pedro Baptista de Castro; Yan Meng; Kensei Terashima; Yoshihiko Takano; Masashi Ishii

4:45 PM - 5:00 PM

[9p-Z09-14] Toward full automatic identification of superconducting materials and their properties in original papers: ambitious scope and current status

〇Luca Foppiano¹, Sae Dieb¹, Akira Suzuki¹, Pedro Baptista de Castro², Yan Meng², Kensei Terashima², Yoshihiko Takano², Masashi Ishii¹ (1. Material Database Group, MaDIS, NIMS; 2. Nano Frontier Superconducting Materials Group, MANA, NIMS)

Keywords: materials informatics, machine learning, text mining

The National Institute for Materials Science (NIMS) is developing Text and Data Mining (TDM) processes toward fully automatic database construction. Ideally, such a system would extract a dataset of materials and their related properties from a collection of scientific articles. In this presentation, we discuss the current status and the challenges of this ambitious work as applied to the superconductors domain. Our pipeline is composed of two steps: "Extraction", a machine-learning-based entity recognition step, and "Linking", a relationship identification step implemented by combining heuristics and sequence labelling. We processed a dataset of 500 articles on superconductivity and extracted 600 links, each pairing a material with its superconducting critical temperature. Manual correction of the output showed that the system achieves an F1-score of 70% (Precision: 73%, Recall: 67%). We also applied the system to articles in Journal of Superconductivity (IOP) published from 2015 to 2018 and obtained 850 links from 1088 papers. We conclude by discussing these results and the roadmap for future development.
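As a quick consistency check on the figures reported above, the F1-score is the harmonic mean of precision and recall. The following is a minimal sketch, not part of the authors' system; the function name `f1_score` is an illustrative assumption, and only the precision and recall values come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall.

    Hypothetical helper for illustration; not taken from the NIMS pipeline.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract.
precision = 0.73
recall = 0.67

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.699", i.e. about 70%
```

The recomputed value rounds to the 70% stated in the abstract, so the three reported metrics are mutually consistent.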

Presentation information

[9p-Z09-1~18] 23.1 Joint Session N "Informatics"
