Keywords: Materials Informatics, Text Mining, Named Entity Recognition
In the field of inorganic materials, efforts are being made to discover high-performance materials quickly by referring to statistical data that link synthetic material names with their property values. However, few large-scale databases provide such links. In this study, we therefore focus on extracting this information from academic papers. We first extend an existing annotation scheme, designed for extracting the names of synthetic battery materials described in natural language in papers, by adding a new property value label so that material names and their property values can be extracted simultaneously. Using our annotation scheme, we built a corpus of 836 paragraphs extracted from 301 papers and used it to train a named entity extraction model. The evaluation results show that our named entity extraction model achieves high extraction performance. In addition, we extracted pairs of synthetic material names and property values from 24,415 materials articles using the model. Finally, we visualize the extraction results in a simple form and discuss the trends of the materials in each period, demonstrating the usefulness of a large-scale database consisting of pairs of synthetic processes and their property values.
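To make the idea of extracting material names and property values together more concrete, the following is a minimal sketch, not the model or pipeline used in this study. It assumes that a named entity extraction model has already produced BIO tags for each token; the label names MATERIAL and PROPERTY, the helper functions, the example sentence, and the rule of pairing each property value with the nearest preceding material name are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple


def decode_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Convert BIO tags into (label, text) spans.

    An I- tag that does not continue the current span is treated like O.
    """
    spans: List[Tuple[str, str]] = []
    label: Optional[str] = None
    words: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any open span first.
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            # Continue the current entity.
            words.append(token)
        else:
            # Outside any entity (or an inconsistent I- tag).
            if label is not None:
                spans.append((label, " ".join(words)))
            label, words = None, []
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans


def pair_entities(spans: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Pair each PROPERTY span with the nearest preceding MATERIAL span.

    This pairing rule is an assumption for illustration only.
    """
    pairs: List[Tuple[str, str]] = []
    last_material: Optional[str] = None
    for label, text in spans:
        if label == "MATERIAL":
            last_material = text
        elif label == "PROPERTY" and last_material is not None:
            pairs.append((last_material, text))
    return pairs


# Hypothetical tagged sentence, invented for this example.
tokens = ["LiCoO2", "delivered", "a", "capacity", "of", "140", "mAh/g", "."]
tags = ["B-MATERIAL", "O", "O", "O", "O", "B-PROPERTY", "I-PROPERTY", "O"]

spans = decode_spans(tokens, tags)
pairs = pair_entities(spans)
print(pairs)
```

Collecting such pairs over many articles would give the kind of table that can then be grouped by publication period to examine trends.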