11:15 AM - 11:30 AM
[22a-M206-9] Construction of Superconductivity Database by Text Data Mining and Machine Learning Ⅱ
Keywords: superconductivity, machine learning, data mining
Machine learning is one of the methods used to search more efficiently for superconductors with high superconducting critical temperatures (Tc). The NIMS superconductivity database SuperCon covers superconductors under ambient pressure, but for superconductivity under pressure the work must begin with collecting the training data for machine learning. We have been developing grobid-superconductors, a tool for mining the composition, Tc, and pressure of superconducting materials from large volumes of documents. However, when we checked the mined data, we found that a considerable portion was not suitable for use in machine learning. The errors that the current grobid-superconductors tends to make can be roughly divided into three categories: (1) Linking: the composition, Tc, and pressure are extracted but incorrectly associated with one another, or the composition is not extracted as an appropriate chemical formula. (2) Extraction: a composition, Tc, or pressure that should have been extracted is missed. (3) Tc classification: Néel temperatures, Curie temperatures, or synthesis temperatures are extracted as Tc when they should not have been. We prepared datasets of linked composition, Tc, and pressure both before and after cleansing, and performed a single regression analysis with a random forest using composition descriptors to create a Tc prediction model. By comparing the results before and after cleansing, we will discuss the importance of cleansing when text-mined data are used for machine learning of materials systems. This presentation will also describe the characteristics of texts in which errors requiring cleansing tend to occur.
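As a rough illustration of how the three error categories could be turned into cleansing rules, the following minimal Python sketch filters mined records. Everything in it is an assumption made for illustration: the `MinedRecord` fields, the `temperature_kind` labels, the formula pattern, and the toy records are hypothetical and do not reflect the actual output format of grobid-superconductors or the cleansing procedure used in this work. In particular, a pattern check can only catch malformed compositions; a wrongly associated but well-formed composition, Tc, and pressure still requires checking against the source text.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MinedRecord:
    """One linked (composition, temperature, pressure) entry; hypothetical schema."""
    composition: Optional[str]
    temperature_k: Optional[float]
    temperature_kind: str  # assumed labels: "tc", "neel", "curie", "synthesis"
    pressure_gpa: Optional[float]


# Loose shape check: a sequence of element-like tokens with optional amounts.
# It does not verify that each token is a real element symbol.
FORMULA_RE = re.compile(r"^(?:[A-Z][a-z]?(?:\d+(?:\.\d+)?)?)+$")


def rejection_reason(rec: MinedRecord) -> Optional[str]:
    """Return the error category for a record, or None if it passes all checks."""
    # (2) Extraction: a required value is missing from the record.
    if rec.composition is None or rec.temperature_k is None or rec.pressure_gpa is None:
        return "extraction"
    # (1) Linking: the composition does not look like a chemical formula.
    if not FORMULA_RE.match(rec.composition):
        return "linking"
    # (3) Tc classification: the temperature is not a superconducting Tc.
    if rec.temperature_kind != "tc":
        return "tc_classification"
    return None


def cleanse(records: List[MinedRecord]) -> List[MinedRecord]:
    """Keep only the records that pass every check."""
    return [r for r in records if rejection_reason(r) is None]


# Toy records for illustration only; these are not mined data.
records = [
    MinedRecord("MgB2", 39.0, "tc", 0.0),
    MinedRecord("H3S", 203.0, "tc", 155.0),
    MinedRecord("the sample", 12.0, "tc", 1.0),
    MinedRecord("MnO", 118.0, "neel", 0.0),
    MinedRecord("LaH10", 250.0, "tc", None),
]

reasons = [rejection_reason(r) for r in records]
cleaned = cleanse(records)
```

The records kept in `cleaned` would form the "after cleansing" dataset, while the full `records` list corresponds to the "before cleansing" dataset; the same regression model can then be fitted to each for comparison.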