JSAI2019

General Session » [GS] J-9 Natural language processing, information retrieval

[1N4-J-9] Natural language processing, information retrieval: domain knowledge analysis

Tue. Jun 4, 2019 5:20 PM - 6:40 PM Room N (Front-right room of 1F Exhibition hall)

Chair: Tomoko Okuma, Reviewer: Kugatsu Sadamitsu

5:40 PM - 6:00 PM

[1N4-J-9-02] Adding Multiple Subword Sequences to BiLSTM-CRF Model for Compound Name Extraction

〇Hiroto Sekine¹, Go Urasawa¹, Takashi Inui¹, Tomoya Iwakura² (1. Tsukuba Univ. / Riken AIP Fujitsu center, 2. Riken AIP Fujitsu center)

Keywords: Named Entity Recognition, Deep Learning, Subword

In this paper, we propose a BiLSTM-CRF model for extracting compound names from documents in the chemical domain. The proposed model takes multiple subword sequences as input in order to obtain sufficient features for long-span or unknown tokens. Subword LSTM units with contextual information are introduced in the input layer of the model. We conducted experiments based on the CHEMDNER challenge to investigate the effectiveness of the model. The proposed model achieved higher extraction accuracy than a standard BiLSTM-CRF model, and experimental results on unknown words also showed that the proposed method works better.
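
The idea of feeding several subword sequences per token can be illustrated with a small sketch of the input preparation step only. The abstract does not state which segmentation method is used, so the greedy longest-match segmenter, the function names `segment` and `multi_subword_inputs`, and the two toy vocabularies below are all assumptions made for illustration. The BiLSTM-CRF model itself is not shown.

```python
def segment(token, vocab):
    """Greedy longest-match segmentation (an assumed stand-in segmenter).

    Pieces not covered by the vocabulary fall back to single characters,
    so any token, including an unknown one, can always be segmented.
    """
    pieces = []
    i = 0
    while i < len(token):
        # Try the longest candidate first; a single character always matches.
        for j in range(len(token), i, -1):
            piece = token[i:j]
            if piece in vocab or j == i + 1:
                pieces.append(piece)
                i = j
                break
    return pieces


def multi_subword_inputs(tokens, vocabs):
    """Return, for each token, one subword sequence per vocabulary."""
    return [[segment(tok, vocab) for vocab in vocabs] for tok in tokens]


# Hypothetical toy vocabularies at two granularities (not from the paper).
COARSE = {"methyl", "benzene", "chloro"}
FINE = {"meth", "yl", "benz", "ene", "chlor", "o"}

tokens = ["chloromethylbenzene", "was", "added"]
inputs = multi_subword_inputs(tokens, [COARSE, FINE])

for tok, seqs in zip(tokens, inputs):
    print(tok, seqs)
```

In this sketch each token carries one subword sequence per vocabulary, so a long compound name is seen both as a few coarse pieces and as many fine pieces. In a model along the lines described above, each such sequence would presumably be encoded by its own subword LSTM and combined with the token representation before the BiLSTM-CRF layers; that part is left out here.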