The 80th JSAP Autumn Meeting, 2019

Presentation Information

General Session (Oral Presentation)

[19a-B01-1~10] 23.1 Joint Session N "Informatics Applications"

Thursday, September 19, 2019, 09:00-11:45, Room B01 (Open Hall)

Session chairs: 寺崎 正 (AIST), 知京 豊裕 (NIMS)

11:00-11:15

[19a-B01-8] Leveraging Segmentation of Physical Units through a Newly Open Source Corpus

Luca Foppiano1, Akira Suzuki1, Thaer M. Dieb1, Masashi Ishii1, Mikiko Tanifuji1 (1. MaDIS, NIMS)

Keywords: units of measurement, physical quantities, corpora

The identification of physical measurements is a recurrent task in materials informatics (MI).
When designing automatic systems that extract information from scientific literature, identifying the raw measurement alone is not sufficient. Quantity transformations such as normalisation require an understanding of both the values and the units, which appear in unstructured text written with ad-hoc conventions.
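To illustrate why normalisation needs the unit broken into its parts, here is a minimal sketch that converts a value to SI once its unit has already been segmented into (prefix, base, power) triples. The names `normalise`, `PREFIX_FACTOR` and `BASE_TO_SI` and the tiny conversion tables are hypothetical assumptions for this example; they are not taken from Grobid-quantities.

```python
# Hypothetical, illustrative subset of conversion factors (not from Grobid-quantities).
PREFIX_FACTOR = {"": 1.0, "k": 1e3, "c": 1e-2, "m": 1e-3, "µ": 1e-6}
# base symbol -> (multiplicative factor to SI, SI base symbol)
BASE_TO_SI = {
    "m": (1.0, "m"),
    "s": (1.0, "s"),
    "g": (1e-3, "kg"),
    "min": (60.0, "s"),
    "h": (3600.0, "s"),
}


def normalise(value, unit_blocks):
    """Convert `value`, expressed in [(prefix, base, power), ...], to SI base units."""
    factor = 1.0
    si_powers = {}
    for prefix, base, power in unit_blocks:
        base_factor, si_base = BASE_TO_SI[base]
        # The prefix binds to the unit before the exponent: km^2 = (1e3 m)^2.
        factor *= (PREFIX_FACTOR[prefix] * base_factor) ** power
        si_powers[si_base] = si_powers.get(si_base, 0) + power
    # Build a readable SI unit string, dropping bases whose powers cancel out.
    si_unit = "·".join(
        b if p == 1 else f"{b}^{p}" for b, p in si_powers.items() if p != 0
    )
    return value * factor, si_unit


if __name__ == "__main__":
    print(normalise(36, [("k", "m", 1), ("", "h", -1)]))  # 36 km/h expressed in m·s^-1
```

Only multiplicative conversions are covered; affine units such as degrees Celsius would need an offset as well as a factor.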
This contribution is part of a larger project, Grobid-quantities, an open-source, machine learning (ML)-based system for extracting and normalising physical measurements from scientific and patent literature.
In this submission, we present a general approach to unit representation and announce the public release, under a Creative Commons licence, of a corpus of segmented physical units comprising about 2,000 entries. The corpus is available in XML format and is suitable for evaluating and comparing different unit-of-measurement segmentation systems.
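The abstract does not specify the representation used by Grobid-quantities or the corpus schema. Purely as an illustration of what "segmenting" a unit expression can mean, the sketch below splits a string such as `km/s^2` into (prefix, base, power) blocks. The `UnitBlock` class, the `segment` function and the small prefix and base vocabularies are hypothetical assumptions, and the exact-match-first rule for ambiguous symbols (is `m` metre or milli-?) is one simple heuristic, not the authors' method.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately tiny vocabularies; a real system needs far larger lists
# and context to resolve ambiguous symbols.
PREFIXES = ("k", "c", "m", "µ", "n")
BASES = {"m", "s", "g", "K", "A", "mol", "cd", "Pa", "Hz", "V", "W", "J", "N"}


@dataclass(frozen=True)
class UnitBlock:
    prefix: str  # "" when the symbol has no prefix
    base: str
    power: int


def split_prefix(symbol: str) -> tuple[str, str]:
    # Try an exact base match first, so "m" reads as metre and "cd" as candela.
    if symbol in BASES:
        return "", symbol
    for p in PREFIXES:
        if symbol.startswith(p) and symbol[len(p):] in BASES:
            return p, symbol[len(p):]
    raise ValueError(f"unknown unit symbol: {symbol!r}")


# One factor: a symbol, optionally followed by "^" and/or a signed integer exponent.
FACTOR = re.compile(r"([^\d^-]+)\^?(-?\d+)?")


def segment(expr: str) -> list[UnitBlock]:
    blocks = []
    # Everything after a "/" is in the denominator, so its exponents are negated.
    for i, part in enumerate(expr.split("/")):
        sign = 1 if i == 0 else -1
        for factor in re.split(r"[·*\s]+", part.strip()):
            if not factor:
                continue
            match = FACTOR.fullmatch(factor)
            if match is None:
                raise ValueError(f"cannot parse factor: {factor!r}")
            prefix, base = split_prefix(match.group(1))
            exponent = int(match.group(2)) if match.group(2) else 1
            blocks.append(UnitBlock(prefix, base, sign * exponent))
    return blocks


if __name__ == "__main__":
    print(segment("km/s^2"))
    print(segment("kg·m^2·s^-2"))
```

For example, `segment("km/s^2")` yields a kilo-metre block with power 1 and a second block with power -2. A gold-standard corpus of segmented units is what allows heuristics like this one to be measured against human annotation.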