JSAI2020


General Session » J-9 Natural language processing, information retrieval

[1E3-GS-9] Natural language processing, information retrieval: Machine learning

Tue. Jun 9, 2020 1:20 PM - 3:00 PM Room E (jsai2020online-5)

Chair: Masakazu Ishihata (NTT)

2:00 PM - 2:20 PM

[1E3-GS-9-03] Automatic punctuation completion using LSTM

〇Katsuhiko Utsubo¹ (¹Kyodo News)

Keywords: NLP, LSTM

In Japanese, where punctuation is inserted can change the meaning of a sentence, so the position of punctuation marks is very important.
In this research, we develop a general method that uses deep learning to insert punctuation automatically from text information alone.
In the proposed method, the corpus is segmented into words by morphological analysis, infrequent words are replaced with their parts of speech, and an LSTM classifies whether a period or a comma occurs at a target position from the word sequences before and after that position.
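The classification step can be sketched as follows. This is a minimal, untrained illustration built on assumptions the abstract does not state: one LSTM reads the words before the target position left to right, a second LSTM reads the words after it right to left, and a linear layer with softmax maps their final hidden states to three hypothetical labels (none, comma, period). The class names, dimensions, and random weights are all hypothetical; a real system would use a deep learning framework and learn the weights from the corpus.

```python
import math
import random

# Hypothetical label set: no mark, comma ("、"), or period ("。").
LABELS = ("none", "comma", "period")


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTM:
    """Single-layer LSTM; the four gates share one stacked weight matrix."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.h = hidden_dim
        # Row blocks: input gate, forget gate, candidate, output gate.
        self.w = rand_matrix(4 * hidden_dim, input_dim + hidden_dim, rng)
        self.b = [0.0] * (4 * hidden_dim)

    def run(self, inputs):
        n = self.h
        h = [0.0] * n
        c = [0.0] * n
        for x in inputs:
            z = [a + b for a, b in zip(matvec(self.w, x + h), self.b)]
            i = [sigmoid(v) for v in z[0:n]]
            f = [sigmoid(v) for v in z[n:2 * n]]
            g = [math.tanh(v) for v in z[2 * n:3 * n]]
            o = [sigmoid(v) for v in z[3 * n:4 * n]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h  # final hidden state summarizes the whole sequence


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class PunctuationClassifier:
    """Hypothetical gap classifier with random (untrained) weights.

    The forward LSTM reads the words before the gap; the backward LSTM
    reads the words after it in reverse, so both end next to the gap.
    """

    def __init__(self, vocab, emb_dim=8, hidden_dim=16, seed=0):
        rng = random.Random(seed)
        self.index = {w: k for k, w in enumerate(vocab)}
        self.unk = len(vocab)  # extra embedding row for unknown words
        self.emb = rand_matrix(len(vocab) + 1, emb_dim, rng)
        self.fwd = LSTM(emb_dim, hidden_dim, rng)
        self.bwd = LSTM(emb_dim, hidden_dim, rng)
        self.out = rand_matrix(len(LABELS), 2 * hidden_dim, rng)

    def embed(self, words):
        return [self.emb[self.index.get(w, self.unk)] for w in words]

    def predict_proba(self, before, after):
        h_left = self.fwd.run(self.embed(before))
        h_right = self.bwd.run(self.embed(list(reversed(after))))
        return softmax(matvec(self.out, h_left + h_right))


if __name__ == "__main__":
    model = PunctuationClassifier(["今日", "は", "晴れ", "です"])
    probs = model.predict_proba(["今日", "は"], ["晴れ", "です"])
    print(dict(zip(LABELS, probs)))  # untrained, so the values are arbitrary
```

Reading the right-hand context in reverse means both LSTMs finish at the words adjacent to the target position, which is where punctuation cues are usually strongest; this is a design choice of the sketch, not something the abstract specifies.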
Classification accuracy was improved by applying a threshold to the probabilities output by the model.
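One plausible reading of the thresholding step is sketched below. The abstract does not say exactly how the threshold is applied, so the rule and the default value 0.7 are assumptions: a comma or period is inserted only when the model's probability for that mark reaches the threshold, and otherwise the position is left without punctuation. The function names `decide` and `punctuate` are hypothetical.

```python
def decide(probs, threshold=0.7):
    """Return the label for one gap from a dict of model probabilities.

    probs maps "none", "comma" and "period" to probabilities. The most
    likely mark is inserted only if its probability reaches the
    (hypothetical) threshold; otherwise the gap stays empty.
    """
    mark, p = max(
        ((label, v) for label, v in probs.items() if label != "none"),
        key=lambda kv: kv[1],
    )
    return mark if p >= threshold else "none"


def punctuate(tokens, gap_probs, threshold=0.7):
    """Rebuild text; gap_probs[i] holds the probabilities for the gap after tokens[i]."""
    symbols = {"none": "", "comma": "、", "period": "。"}
    out = []
    for token, probs in zip(tokens, gap_probs):
        out.append(token)
        out.append(symbols[decide(probs, threshold)])
    return "".join(out)


# The argmax here is "comma", but 0.55 is below the threshold.
print(decide({"none": 0.30, "comma": 0.55, "period": 0.15}))  # none
print(decide({"none": 0.05, "comma": 0.10, "period": 0.85}))  # period

gaps = [
    {"none": 0.90, "comma": 0.05, "period": 0.05},
    {"none": 0.20, "comma": 0.75, "period": 0.05},
    {"none": 0.95, "comma": 0.03, "period": 0.02},
    {"none": 0.02, "comma": 0.03, "period": 0.95},
]
print(punctuate(["今日", "は", "晴れ", "です"], gaps))  # 今日は、晴れです。
```

Raising the threshold trades recall for precision, since uncertain positions are left unpunctuated; in practice the value would be tuned on held-out data.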
Furthermore, limiting the number of input words and replacing infrequent words with their parts of speech reduces computation time without reducing accuracy.
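The two input reductions can be sketched as follows, assuming each sentence has already been split by a morphological analyzer (for example MeCab, not shown) into (surface, part-of-speech) pairs. The frequency cutoff `min_count`, the window size `k`, and the helper names are hypothetical; the abstract does not give their values. Each position is represented by at most k words on each side, so the per-position cost no longer grows with sentence length, and mapping rare words to a small set of part-of-speech tags keeps the vocabulary, and hence the embedding table, small.

```python
from collections import Counter


def build_vocab(corpus, min_count=2):
    """Keep words that occur at least min_count times (hypothetical cutoff).

    corpus is a list of sentences, each a list of (surface, pos) pairs.
    """
    counts = Counter(surface for sentence in corpus for surface, _ in sentence)
    return {word for word, n in counts.items() if n >= min_count}


def normalize(sentence, vocab):
    """Replace each infrequent word with its part-of-speech tag."""
    return [surface if surface in vocab else pos for surface, pos in sentence]


def context_window(words, gap, k=2):
    """Return up to k words before and after the gap that follows words[gap]."""
    before = words[max(0, gap - k + 1):gap + 1]
    after = words[gap + 1:gap + 1 + k]
    return before, after


corpus = [
    [("今日", "名詞"), ("は", "助詞"), ("晴れ", "名詞"), ("です", "助動詞")],
    [("明日", "名詞"), ("は", "助詞"), ("雨", "名詞"), ("です", "助動詞")],
]
vocab = build_vocab(corpus)           # only "は" and "です" occur twice
words = normalize(corpus[0], vocab)
print(words)                          # ['名詞', 'は', '名詞', 'です']
print(context_window(words, gap=1))   # (['名詞', 'は'], ['名詞', 'です'])
```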
Experiments using broadcast manuscripts as the text corpus confirmed the effectiveness of this method.
