The 81st JSAP Autumn Meeting, 2020

Presentation information

Oral presentation

[9p-Z09-1~18] 23.1 Joint Session N "Informatics"

Wed. Sep 9, 2020 1:00 PM - 6:00 PM Z09

Kiyou Shibata (the University of Tokyo), Masato Kotsugi (Tokyo Univ. of Sci.), Shigetaka Tomiya (SONY Corp.)

4:45 PM - 5:00 PM

[9p-Z09-14] Toward fully automatic identification of superconducting materials and their properties in original papers: ambitious scope and current status

Luca Foppiano1, Sae Dieb1, Akira Suzuki1, Pedro Baptista de Castro2, Yan Meng2, Kensei Terashima2, Yoshihiko Takano2, Masashi Ishii1 (1. Material Database Group, MaDIS, NIMS; 2. Nano Frontier Superconducting Materials Group, MANA, NIMS)

Keywords: materials informatics, machine learning, text mining

The National Institute for Materials Science (NIMS) is developing Text and Data Mining (TDM) processes toward fully automatic database construction. Ideally, the system takes a set of scientific articles as input and outputs a dataset of materials and their related properties. In this presentation, we discuss the current status and challenges of this ambitious work, applied to the superconductor domain. Our pipeline consists of two steps: "Extraction", a machine-learning-based entity recognizer, and "Linking", a relation identifier implemented by combining heuristics and sequence labelling. We processed a dataset of 500 articles on superconductivity and extracted 600 links (pairs of a material and its superconducting critical temperature). Manual correction of the extracted links showed that the system achieves an F1-score of 70% (precision: 73%, recall: 67%). We then applied the system to articles in Journal of Superconductivity (IOP) published from 2015 to 2018, obtaining 850 links from 1,088 papers. Finally, we discuss these results and the roadmap for future development.
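As an illustration only, the "Linking" step and the reported scores can be sketched in Python. The sketch assumes the "Extraction" step has already produced entity mentions, represented here by a hypothetical `Entity` record with hypothetical labels `"material"` and `"tc"`. The hypothetical `link_nearest` function is a heuristic-only stand-in (the system described above also uses sequence labelling): it pairs each critical-temperature mention with the closest material mention in the same sentence. `precision_recall_f1` scores predicted links against a manually corrected gold set, and `f1_from` checks that precision 73% and recall 67% give an F1-score of about 70%.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A mention produced by the (assumed) Extraction step. Hypothetical format."""
    text: str
    label: str      # hypothetical labels: "material" or "tc"
    sentence: int   # index of the sentence containing the mention
    start: int      # character offset of the mention within that sentence


def link_nearest(entities):
    """Heuristic-only linking: pair each Tc mention with the closest
    material mention in the same sentence (a simplification, not the
    authors' method)."""
    materials = [e for e in entities if e.label == "material"]
    links = []
    for tc in (e for e in entities if e.label == "tc"):
        candidates = [m for m in materials if m.sentence == tc.sentence]
        if candidates:
            best = min(candidates, key=lambda m: abs(m.start - tc.start))
            links.append((best.text, tc.text))
    return links


def precision_recall_f1(predicted, gold):
    """Score predicted (material, Tc) links against a gold set of links."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy mentions from three sentences (illustrative, not from the dataset).
    entities = [
        Entity("MgB2", "material", 0, 0),
        Entity("39 K", "tc", 0, 30),
        Entity("LaFeAsO1-xFx", "material", 1, 0),
        Entity("26 K", "tc", 1, 40),
    ]
    links = link_nearest(entities)
    print(links)
    print(round(f1_from(0.73, 0.67), 2))
```

The nearest-mention rule is chosen only because it is the simplest heuristic to read; it fails when a sentence lists several materials and temperatures in a different order, which is one reason a learned sequence-labelling component is useful.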