SuperMat: Corpus for Extraction of Superconductor Materials Data

Luca Foppiano; Sae Dieb; Akira Suzuki; Kensei Terashima; Pedro Baptista de Castro; Iwasaki Suguru; Yoshihiko Takano; Masashi Ishii

10:30 - 10:45

▲ [15a-A205-5] SuperMat: Corpus for Extraction of Superconductor Materials Data

〇Luca Foppiano¹, Sae Dieb¹, Akira Suzuki¹, Kensei Terashima², Pedro Baptista de Castro², Iwasaki Suguru², Yoshihiko Takano², Masashi Ishii¹ (1. Material Database Group, MaDIS, NIMS; 2. Nano Frontier Superconducting Materials Group, NIMS)

Keywords: superconductors, corpus construction, text mining

The automatic collection of material information from research papers using Machine Learning (ML) and Natural Language Processing (NLP) is a milestone toward a sustainable approach for creating or enriching domain-specific databases.
In the field of superconductor materials, the manual data collection used to populate SuperCon cannot keep up with the volume of new information from the growing number of articles published every year. For this reason, an interdisciplinary project is under way that aims to develop a system for automatically extracting superconductor materials and their related properties from the scientific literature (Foppiano et al., 2019).
Unfortunately, in this unexplored terrain, there is no record of previous attempts in the scientific literature, nor are there existing datasets in the public domain. In this submission, we present our work and the methodology used to create SuperMat, a superconductor material dataset, in collaboration with the Nano Frontier Superconducting Materials Group.
So far, we have annotated and validated 60 papers with entities and the relationships (links) between them. The corpus is designed for training statistical sequence labelling models and can be used to develop domain-specific systems for entity extraction, entity-relationship extraction, and clustering.
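
To illustrate how span annotations of this kind are commonly prepared for a sequence labelling model, the following minimal Python sketch converts character-offset entity spans into token-level BIO tags. It is only a sketch under stated assumptions: the label names `material` and `tc`, the whitespace tokenisation, and the function name `spans_to_bio` are hypothetical and do not describe the actual SuperMat annotation schema or file format.

```python
import re


def spans_to_bio(text, spans):
    """Convert character-offset spans into (token, BIO tag) pairs.

    `spans` is a list of (start, end, label) tuples with `end` exclusive.
    Tokenisation is plain whitespace splitting, an assumption made only
    to keep the example short.
    """
    tagged = []
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.start(), match.end()
        tag = "O"
        for start, end, label in spans:
            # A token belongs to a span if the two character ranges overlap.
            if tok_start < end and tok_end > start:
                # The token covering the span's first character opens the entity.
                prefix = "B-" if tok_start <= start else "I-"
                tag = prefix + label
                break
        tagged.append((match.group(), tag))
    return tagged


text = "MgB2 becomes superconducting below 39 K"

# Hypothetical annotations: offsets are computed from the text itself
# so that they cannot drift out of sync with the sentence.
mat_start = text.index("MgB2")
tc_start = text.index("39 K")
spans = [
    (mat_start, mat_start + len("MgB2"), "material"),
    (tc_start, tc_start + len("39 K"), "tc"),
]

# A link is represented here as a pair of indices into `spans`
# (material -> critical temperature); again a hypothetical layout.
links = [(0, 1)]

example = spans_to_bio(text, spans)
for token, tag in example:
    print(f"{token}\t{tag}")
```

In this layout the entity tags feed a sequence labeller, while the links are kept separately as pairs of span indices for a later relationship-extraction step.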

Presentation Information

[15a-A205-1~10] 23.1 Joint Session N "Informatics Applications"
