The 67th JSAP Spring Meeting 2020

Presentation information

Oral presentation

[15a-A205-1~10] 23.1 Joint Session N "Informatics"

Sun. Mar 15, 2020 9:30 AM - 12:15 PM A205 (6-205)

Yukari Katsura (Univ. of Tokyo), Toyohiro Chikyo (NIMS)

10:30 AM - 10:45 AM

[15a-A205-5] SuperMat: Corpus for Extraction of Superconductor Materials Data

Luca Foppiano¹, Sae Dieb¹, Akira Suzuki¹, Kensei Terashima², Pedro Baptista de Castro², Iwasaki Suguru², Yoshihiko Takano², Masashi Ishii¹ (1. Material Database Group, MaDIS, NIMS; 2. Nano Frontier Superconducting Materials Group, NIMS)

Keywords: superconductors, corpus construction, text mining

The automatic collection of material information from research papers using Machine Learning (ML) and Natural Language Processing (NLP) is a milestone toward establishing a sustainable approach to creating or enriching domain-specific databases.
In the field of superconductor materials, the manual data collection used to populate SuperCon cannot keep pace with the large volume of new information from the growing number of articles published every year. For this reason, an interdisciplinary project is under way that aims to develop a system for automatically extracting superconductor materials and their related properties from the scientific literature (Foppiano et al., 2019).
Unfortunately, in this unexplored terrain, there is no record of previous attempts in the scientific literature, nor are there existing datasets in the public domain. In this submission, we present our work and the methodology used to create SuperMat, a dataset of superconductor materials, in collaboration with the Nano Frontier Superconducting Materials Group.
Currently, we have annotated and validated 60 papers with entities and the relationships (links) between them. This corpus is designed for training statistical sequence labelling models and can be used to develop domain-specific systems for entity extraction, entity-relationship extraction, and clustering.
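To illustrate how a corpus of annotated entities and links can feed a sequence labelling model, the following is a minimal sketch in Python. It is not the SuperMat format or tooling: the `Entity` and `Link` classes, the label names `material` and `tc`, the whitespace tokenizer, and the example sentence are all assumptions made for illustration. It converts character-offset entity spans into per-token BIO tags, a common input representation for sequence labelling, and resolves links into pairs of entity texts.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    id: str
    label: str  # hypothetical label name, e.g. "material" or "tc"
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


@dataclass(frozen=True)
class Link:
    source: str  # id of the source entity
    target: str  # id of the target entity


def tokenize(text):
    """Split on whitespace, keeping the character offsets of each token."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def to_bio(text, entities):
    """Assign a BIO tag to every token that falls inside an entity span."""
    rows = []
    for token, start, end in tokenize(text):
        tag = "O"
        for ent in entities:
            if start >= ent.start and end <= ent.end:
                prefix = "B" if start == ent.start else "I"
                tag = f"{prefix}-{ent.label}"
                break
        rows.append((token, tag))
    return rows


def linked_pairs(text, entities, links):
    """Resolve each link into a (source text, target text) pair."""
    by_id = {ent.id: ent for ent in entities}
    pairs = []
    for link in links:
        src, tgt = by_id[link.source], by_id[link.target]
        pairs.append((text[src.start:src.end], text[tgt.start:tgt.end]))
    return pairs


# Illustrative sentence and annotations (not taken from the corpus).
text = "MgB2 becomes superconducting below 39 K"
entities = [
    Entity("e1", "material", 0, 4),
    Entity("e2", "tc", 35, 39),
]
links = [Link("e2", "e1")]  # the temperature is linked to the material

bio = to_bio(text, entities)
pairs = linked_pairs(text, entities, links)
```

Here `bio` holds one `(token, tag)` pair per token, which is the shape a sequence labelling model would typically be trained on, while `pairs` keeps the relationship information separate so it can serve an entity-relationship step.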