The 67th JSAP Spring Meeting 2020

Presentation information

General session (oral presentation)

[15a-A205-1~10] 23.1 Joint Session N "Informatics Applications"

Sunday, March 15, 2020, 09:30-12:15, A205 (6-205)

Yukari Katsura (Univ. of Tokyo), Toyohiro Chikyow (NIMS)

10:30-10:45

[15a-A205-5] SuperMat: Corpus for Extraction of Superconductor Materials Data

Luca Foppiano (1), Sae Dieb (1), Akira Suzuki (1), Kensei Terashima (2), Pedro Baptista de Castro (2), Iwasaki Suguru (2), Yoshihiko Takano (2), Masashi Ishii (1) (1: Material Database Group, MaDIS, NIMS; 2: Nano Frontier Superconducting Materials Group, NIMS)

Keywords: superconductors, corpus construction, text mining

The automatic collection of materials information from research papers using Machine Learning (ML) and Natural Language Processing (NLP) is a milestone towards a sustainable approach to creating or enriching domain-specific databases.
In the field of superconductor materials, the manual data collection used to populate SuperCon cannot keep pace with the large volume of new information from the growing number of articles published every year. For this reason, an interdisciplinary project is under way to develop a system that automatically extracts superconductor materials and their related properties from the scientific literature (Foppiano et al., 2019).
Unfortunately, in this unexplored terrain, there is no record of previous attempts in the scientific literature, nor are there existing datasets in the public domain. In this submission, we present our work and the methodology used to create a superconductor materials dataset, SuperMat, in collaboration with the Nano Frontier Superconducting Materials Group.
Currently, we have annotated and validated 60 papers with entities and relationship information (links). The corpus is designed for training statistical sequence-labelling models and can be used to develop domain-specific systems for entity extraction, entity-relationship extraction, and clustering.
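
As an illustration of how annotations of this kind could feed a sequence-labelling model, the following minimal sketch converts entity spans into BIO tags, a common input encoding for such models. It does not reflect the actual SuperMat format: the `Entity` and `Link` classes, the label names `material` and `tc_value`, and the example sentence are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entity:
    """A hypothetical annotated span over a token list."""
    start: int   # index of the first token (inclusive)
    end: int     # index after the last token (exclusive)
    label: str   # hypothetical label name, e.g. "material"


@dataclass(frozen=True)
class Link:
    """A hypothetical relationship between two entities, by list index."""
    source: int
    target: int


def to_bio(tokens: List[str], entities: List[Entity]) -> List[str]:
    """Encode non-overlapping entity spans as one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for ent in entities:
        tags[ent.start] = f"B-{ent.label}"          # first token of the span
        for i in range(ent.start + 1, ent.end):
            tags[i] = f"I-{ent.label}"              # remaining tokens
    return tags


# Illustrative sentence (not taken from the corpus).
tokens = ["MgB2", "superconducts", "below", "39", "K", "."]
entities = [Entity(0, 1, "material"), Entity(3, 5, "tc_value")]
links = [Link(source=0, target=1)]  # material -> its critical temperature

tags = to_bio(tokens, entities)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

The links are kept separate from the tags in this sketch because BIO encoding only captures entity boundaries; a relationship-extraction step would consume the entity pairs listed in `links`.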