JSAI2019

Presentation information

General Session

[1N4-J-9] Natural language processing, information retrieval: domain knowledge analysis

Tue. Jun 4, 2019 5:20 PM - 6:40 PM Room N (Front-right room of 1F Exhibition hall)

Chair: Tomoko Okuma, Reviewer: Kugatsu Sadamitsu

6:00 PM - 6:20 PM

[1N4-J-9-03] Using Subword Sequence BiLSTM-CRF Model for Compound Name Extraction

〇Go Urasawa (1,3), Hiroto Sekine (1,3), Takashi Inui (1,3), Tomoya Iwakura (2,3) (1. University of Tsukuba, 2. Fujitsu Laboratories, 3. RIKEN AIP-FUJITSU Collaboration Center)

Keywords: Compound Name Extraction, Subword

In this paper, we investigate the use of subword sequences for the compound name extraction problem. Five subword sequence generators (SYMBOL, SP, BPE, BPE-DICT, and BPE-PMI) were used in the investigation. The last two, BPE-DICT and BPE-PMI, are newly proposed in this work. BPE-DICT is a variant of BPE with a dictionary-based restriction. BPE-PMI uses the PMI measure in place of the word frequency count. The experimental results showed that subword sequence information improved extraction performance. BPE-DICT achieved an F-measure of 86.74, the best score across all conditions in our experiments.
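
To make the difference between frequency-based and PMI-based merging concrete, the following is a minimal sketch of a BPE-style merge learner with a switchable scoring function. It is not the authors' implementation: the function names (`count_pairs`, `best_pair`, `merge_pair`, `learn_merges`), the exact PMI formulation, the tie-breaking rule, and the toy inputs are all assumptions made for illustration, and the dictionary-based restriction of BPE-DICT is not modeled because the abstract does not describe it.

```python
import math
from collections import Counter


def count_pairs(sequences):
    """Count symbols and adjacent symbol pairs over all sequences."""
    unigrams, pairs = Counter(), Counter()
    for seq in sequences:
        unigrams.update(seq)
        pairs.update(zip(seq, seq[1:]))
    return unigrams, pairs


def best_pair(sequences, score="freq"):
    """Return the adjacent pair with the highest score, or None if no pair exists.

    score="freq" ranks pairs by raw count (standard BPE).
    score="pmi" ranks pairs by pointwise mutual information
    (an assumed formulation, not necessarily the one used in the paper).
    """
    unigrams, pairs = count_pairs(sequences)
    if not pairs:
        return None
    n_uni = sum(unigrams.values())
    n_pair = sum(pairs.values())

    def pmi(pair):
        a, b = pair
        p_ab = pairs[pair] / n_pair
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        return math.log(p_ab / (p_a * p_b))

    def freq(pair):
        return pairs[pair]

    key = pmi if score == "pmi" else freq
    # Sorting first makes tie-breaking deterministic (hypothetical choice).
    return max(sorted(pairs), key=key)


def merge_pair(seq, pair):
    """Replace every non-overlapping occurrence of `pair` in `seq` with one symbol."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_merges(words, num_merges, score="freq"):
    """Learn up to `num_merges` merges, starting from character sequences."""
    sequences = [list(w) for w in words]
    merges = []
    for _ in range(num_merges):
        pair = best_pair(sequences, score)
        if pair is None:
            break
        merges.append(pair)
        sequences = [merge_pair(s, pair) for s in sequences]
    return merges, sequences


if __name__ == "__main__":
    toy_words = ["abab", "abab", "xy"]  # toy data, not from the paper
    print(learn_merges(toy_words, 1, score="freq")[0])
    print(learn_merges(toy_words, 1, score="pmi")[0])
```

On the toy input, the frequency criterion favors the most common pair, while the PMI criterion favors a rare pair whose symbols occur almost only together. This illustrates why the two criteria can produce different subword vocabularies.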