4:45 PM - 5:00 PM
▲ [9p-Z09-14] Toward full automatic identification of superconducting materials and their properties in original papers: ambitious scope and current status
Keywords: materials informatics, machine learning, text mining
The National Institute for Materials Science (NIMS) is developing text and data mining (TDM) processes toward fully automatic database construction. Ideally, the system extracts a dataset of materials and their related properties from a collection of scientific articles. In this presentation, we discuss the current status and the challenges of this ambitious work as applied to the superconductor domain. Our pipeline is composed of two steps: "Extraction", a machine-learning-based entity recognition step, and "Linking", a relationship identification step implemented by combining heuristics and sequence labelling. We processed a dataset of 500 articles on superconductivity and extracted 600 links (material to superconducting critical temperature). Manual correction showed that the system achieves an F1-score of 70% (precision: 73%, recall: 67%). We then applied the system to articles in Journal of Superconductivity (IOP) published from 2015 to 2018 and obtained 850 links from 1088 papers. We discuss these results and the roadmap for future development.
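
As a quick consistency check on the reported evaluation figures, the sketch below recomputes the F1-score from the stated precision and recall using the standard harmonic-mean definition. The function name `f1_score` is a hypothetical helper chosen for illustration and is not part of the system described in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
reported_f1 = f1_score(0.73, 0.67)
print(f"F1 = {reported_f1:.1%}")  # prints "F1 = 69.9%"
```

The result rounds to the 70% quoted in the abstract.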