Presentation information

[3D1-OS-22a] OS-22 (1)

Thu. Jun 11, 2020 9:00 AM - 10:40 AM Room D (jsai2020online-4)

Miki Ueno (Osaka Institute of Technology), Naoki Mori (Osaka Prefecture University), Taichi Hatanaka (Creators in Pack Inc.)

9:20 AM - 9:40 AM

[3D1-OS-22a-02] Paragraph Segmentation for Novels using BERT with Focal Loss

〇Riku Iikura (1), Makoto Okada (2), Naoki Mori (2) (1. Osaka Prefecture University, 2. Graduate School of Engineering, Osaka Prefecture University)

Keywords: Natural Language Processing, Text Segmentation, Imbalanced Classification, BERT, Focal Loss

We address paragraph segmentation as a step toward understanding the content of novels. Estimating the paragraph structure of a text can be cast as a binary classification problem: deciding whether the two sentences concerned belong to the same paragraph. In this setting, the number of paragraphs is small relative to the number of sentences, so the resulting class imbalance in the data must be taken into account. We applied Bidirectional Encoder Representations from Transformers (BERT), which has shown high accuracy on various natural language processing tasks, to the paragraph segmentation problem, and improved the model's performance by using focal loss as the loss function of the classifier. The effectiveness of the proposed model was confirmed on datasets built for this work. In addition, every evaluation metric improved when the range of input sentences given to the model was expanded.
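To make the formulation concrete, the following is a minimal sketch rather than the authors' implementation. It shows how adjacent sentence pairs could be labeled for the binary task, and how a binary focal loss down-weights easy examples. The function names, the convention that label 1 marks a paragraph boundary, and the default values gamma=2.0 and alpha=0.25 are all assumptions for illustration; the abstract does not state the settings actually used, and the BERT encoder that would produce the probability p is omitted.

```python
import math


def adjacent_pair_labels(paragraphs):
    """Build (sentence_a, sentence_b, label) triples from a list of paragraphs.

    Each paragraph is a list of sentences. Assumed convention (not from the
    abstract): label 1 means the two sentences lie in different paragraphs,
    label 0 means they belong to the same paragraph.
    """
    sentences, para_ids = [], []
    for idx, para in enumerate(paragraphs):
        for sentence in para:
            sentences.append(sentence)
            para_ids.append(idx)

    pairs = []
    for i in range(len(sentences) - 1):
        label = 1 if para_ids[i] != para_ids[i + 1] else 0
        pairs.append((sentences[i], sentences[i + 1], label))
    return pairs


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one example: -alpha_t * (1 - p_t)**gamma * log(p_t).

    p is the predicted probability of the positive class (label 1) and
    y is the true label (0 or 1). gamma and alpha are illustrative defaults.
    """
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-dependent weight
    p_t = min(max(p_t, eps), 1.0)               # keep log() finite
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    toy = [["S1.", "S2.", "S3."], ["S4."]]
    for a, b, label in adjacent_pair_labels(toy):
        print(a, b, label)
    # A confidently correct prediction contributes far less than a wrong one.
    print(binary_focal_loss(0.95, 1), binary_focal_loss(0.05, 1))
```

With gamma set to 0 the modulating factor (1 - p_t)**gamma equals 1, so the loss reduces to class-weighted cross-entropy. Larger gamma values shrink the contribution of well-classified pairs, which is what makes the loss suited to data where one class dominates.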
