Using Subword Sequence BiLSTM-CRF Model for Compound Name Extraction

Go Urasawa

6:00 PM - 6:20 PM

[1N4-J-9-03] Using Subword Sequence BiLSTM-CRF Model for Compound Name Extraction

〇Go Urasawa^1,3, Hiroto Sekine^1,3, Takashi Inui^1,3, Tomoya Iwakura^2,3 (1. University of Tsukuba, 2. Fujitsu Laboratories, 3. RIKEN AIP-FUJITSU Collaboration Center)

Keywords: Compound Name Extraction, Subword

In this paper, we investigate the use of subword sequences for the compound name extraction problem. Five subword sequence generators (SYMBOL, SP, BPE, BPE-DICT, and BPE-PMI) were used in the investigation. The last two, BPE-DICT and BPE-PMI, are newly proposed in this work. BPE-DICT is a variation of BPE with a dictionary-based restriction. BPE-PMI uses the PMI measure in place of the word frequency count. The experimental results showed that subword sequence information improved extraction performance. BPE-DICT achieved an F-measure of 86.74, the best score across all conditions in our experiments.
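To make the contrast between frequency-based and PMI-based merging concrete, the following is a minimal sketch of a BPE-style merge loop with a swappable pair scorer. It is not the authors' implementation: the function names (`pair_stats`, `best_pair`, `merge`, `learn_merges`), the `scorer` parameter, and the count-based PMI estimate are all assumptions made for illustration, since the abstract does not give the exact definition used for BPE-PMI. The dictionary-based restriction of BPE-DICT is not modeled here because the abstract does not specify it.

```python
import math
from collections import Counter


def pair_stats(corpus):
    """Count single symbols and adjacent symbol pairs over all sequences."""
    pairs, symbols = Counter(), Counter()
    for seq in corpus:
        symbols.update(seq)
        pairs.update(zip(seq, seq[1:]))
    return pairs, symbols


def best_pair(corpus, scorer="freq"):
    """Pick the adjacent pair to merge next.

    scorer="freq": plain BPE, highest pair count wins.
    scorer="pmi":  ASSUMED variant, highest pointwise mutual information wins,
                   with probabilities estimated from raw counts.
    """
    pairs, symbols = pair_stats(corpus)
    if not pairs:
        return None
    n_pairs = sum(pairs.values())
    n_syms = sum(symbols.values())

    def score(pair):
        if scorer == "freq":
            return pairs[pair]
        a, b = pair
        p_ab = pairs[pair] / n_pairs
        p_a = symbols[a] / n_syms
        p_b = symbols[b] / n_syms
        return math.log(p_ab / (p_a * p_b))

    # Sorting first makes tie-breaking deterministic.
    return max(sorted(pairs), key=score)


def merge(seq, pair):
    """Replace every non-overlapping occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_merges(words, num_merges, scorer="freq"):
    """Start from characters and apply up to `num_merges` merge operations."""
    corpus = [list(w) for w in words]
    merges = []
    for _ in range(num_merges):
        pair = best_pair(corpus, scorer)
        if pair is None:
            break
        merges.append(pair)
        corpus = [merge(seq, pair) for seq in corpus]
    return merges, corpus


if __name__ == "__main__":
    words = ["ethanol", "methanol", "ethane"]  # toy input, not from the paper
    for scorer in ("freq", "pmi"):
        merges, segmented = learn_merges(words, 3, scorer)
        print(scorer, merges, segmented)
```

Running the script on the toy word list prints the learned merges and the resulting segmentations for both scorers, which shows how the choice of scoring function changes the subword units that would be fed to a sequence labeler such as a BiLSTM-CRF.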

Presentation information

[1N4-J-9] Natural language processing, information retrieval: domain knowledge analysis
