The 81st JSAP Autumn Meeting 2020

Presentation information

Regular session (Oral presentation)

[9p-Z09-1~18] 23.1 Joint Session N "Informatics Applications"

Wed., September 9, 2020, 13:00 〜 18:00, Z09

柴田 基洋 (Univ. of Tokyo), 小嗣 真人 (Tokyo Univ. of Science), 冨谷 茂隆 (Sony)

16:45 〜 17:00

[9p-Z09-14] Toward full automatic identification of superconducting materials and their properties in original papers: ambitious scope and current status

Luca Foppiano1, Sae Dieb1, Akira Suzuki1, Pedro Baptista de Castro2, Yan Meng2, Kensei Terashima2, Yoshihiko Takano2, Masashi Ishii1 (1. Material Database Group, MaDIS, NIMS; 2. Nano Frontier Superconducting Materials Group, MANA, NIMS)

Keywords: materials informatics, machine learning, text mining

The National Institute for Materials Science (NIMS) is developing text and data mining (TDM) processes toward fully automatic database construction. Ideally, the system takes a set of scientific articles as input and outputs a dataset of materials and their related properties. In this presentation, we discuss the current status and the challenges of this ambitious work, applied to the superconductor domain. Our pipeline is composed of two steps: "Extraction", a machine-learning-based entity recognizer, and "Linking", a relationship identifier implemented by combining heuristics and sequence labelling. We processed a dataset of 500 articles on superconductivity and extracted 600 links (material - superconducting critical temperature). Manual correction of the extracted links showed that the system achieves an F1-score of 70% (Precision: 73%, Recall: 67%). We then applied the system to articles in Journal of Superconductivity (IOP) published from 2015 to 2018 and obtained 850 links from 1088 papers. Finally, we discuss our results and the roadmap for future development.
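As a quick consistency check on the reported figures, the following minimal sketch recomputes the F1-score from the stated precision and recall, assuming the standard definition of F1 as the harmonic mean of the two. The function name `f1_score` is illustrative only and is not taken from the authors' system.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F1)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both values are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
precision = 0.73
recall = 0.67

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.3f}")
```

Under this assumption the result rounds to 0.70, which agrees with the 70% F1-score stated in the abstract.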