The 80th JSAP Autumn Meeting 2019

Presentation information

Oral presentation

[19a-B01-1~10] 23.1 Joint Session N "Informatics"

Thu. Sep 19, 2019 9:00 AM - 11:45 AM B01 (B01)

Nao Terasaki (AIST), Toyohiro Chikyo (NIMS)

11:00 AM - 11:15 AM

[19a-B01-8] Leveraging Segmentation of Physical Units through a Newly Open Source Corpus

Luca Foppiano¹, Akira Suzuki¹, Thaer M. Dieb¹, Masashi Ishii¹, Mikiko Tanifuji¹ (1. MaDIS, NIMS)

Keywords: units of measurement, physical quantities, corpora

The identification of physical measurements is a recurrent task in materials informatics (MI).
When designing automatic systems for information extraction from scientific literature, identifying the raw measurement alone is not sufficient. Quantity transformations, such as normalisation, require an understanding of both values and units, which appear in unstructured text and follow ad hoc conventions.
This contribution is part of a larger project called Grobid-quantities, an open-source, machine learning (ML)-based system for extracting and normalising physical measurements from scientific and patent literature.
In this submission, we present a general approach to unit representation and announce the public release, under a Creative Commons licence, of a corpus of segmented physical units. The corpus comprises about 2000 entries in XML format and is suitable for evaluating and comparing systems that segment units of measurement.
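
To make the notion of unit segmentation concrete, the following is a minimal rule-based sketch in Python. It is not the Grobid-quantities model (which is ML-based) and does not reflect the corpus XML schema; the names `UnitPart`, `PREFIXES`, `BASES`, `split_prefix` and `segment`, as well as the tiny vocabularies, are assumptions made purely for illustration. It splits a raw unit string into (prefix, base, power) triples, which is the kind of structure a normalisation step could then consume.

```python
import re
from typing import List, NamedTuple, Tuple


class UnitPart(NamedTuple):
    """One segmented component of a compound unit (hypothetical structure)."""
    prefix: str  # SI prefix symbol, or "" when absent
    base: str    # base unit symbol
    power: int   # signed exponent


# Deliberately tiny, illustrative vocabularies (assumption, not exhaustive).
PREFIXES = {"k", "m", "c", "μ", "n", "M", "G"}
BASES = {"m", "g", "s", "K", "A", "mol", "W", "Pa", "V", "Hz", "J", "N"}

# A symbol made of letters, optionally followed by an exponent: m2, s^-2, cm-2.
TOKEN = re.compile(r"([A-Za-zμ]+)(?:\^?(-?\d+))?")


def split_prefix(symbol: str) -> Tuple[str, str]:
    """Split a symbol such as 'kg' into ('k', 'g'); bare bases win over prefixes."""
    if symbol in BASES:
        return "", symbol
    if len(symbol) > 1 and symbol[0] in PREFIXES and symbol[1:] in BASES:
        return symbol[0], symbol[1:]
    raise ValueError(f"unknown unit symbol: {symbol!r}")


def segment(unit: str) -> List[UnitPart]:
    """Segment a raw unit string into (prefix, base, power) parts."""
    parts: List[UnitPart] = []
    # Everything after a '/' is treated as a denominator: its powers are negated.
    for index, chunk in enumerate(unit.split("/")):
        sign = 1 if index == 0 else -1
        for match in TOKEN.finditer(chunk):
            prefix, base = split_prefix(match.group(1))
            power = int(match.group(2)) if match.group(2) else 1
            parts.append(UnitPart(prefix, base, sign * power))
    return parts


print(segment("kg·m/s^2"))
print(segment("mW cm-2"))
```

Under these assumptions, `segment("kg·m/s^2")` yields parts for `kg` (power 1), `m` (power 1) and `s` (power -2), and `segment("mW cm-2")` yields `mW` (power 1) and `cm` (power -2). Hand-written rules like these break down quickly on the ad hoc conventions found in real text, which is why a corpus of segmented units is useful for evaluating more robust systems.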