JSAI2022

Presentation information

Interactive Session

[3Yin2] Interactive session 1

Thu. Jun 16, 2022 11:30 AM - 1:10 PM Room Y (Event Hall)

[3Yin2-17] Unsupervised Text Segmentation by Invariant Information Clustering

〇Kento Kawasaki¹ (1. Leading Edge Co.,Ltd.)

Keywords: Text Segmentation, Unsupervised Learning, Deep Learning, Invariant Information Clustering, Transformer

Text segmentation is a technique for dividing a text into segments according to topic. It supports natural language processing tasks such as document retrieval, summarization, and extraction, and is expected to be useful for unstructured data. Early work focused on unsupervised methods, most of which were heuristic; as a result, segmentation that relies on domain-specific knowledge and segmentation at varying granularities were recognized as challenges. In recent years, supervised methods based on deep learning have been proposed that achieve highly accurate segmentation by using context-aware features, but their applicability is limited by the high cost of annotation. In this study, we propose an unsupervised method based on deep learning. Specifically, we introduce Invariant Information Clustering (IIC), which has been reported to be successful in the image domain, into a Transformer-based network. The resulting clustering-based approach enables text segmentation at various granularities. On the segmentation of email documents containing job information, the method achieves a lower error rate than conventional unsupervised methods.
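As a rough illustration of the clustering objective named above, the following minimal sketch computes the quantity that Invariant Information Clustering maximizes: the mutual information between the soft cluster assignments of a sample and those of a transformed counterpart. The abstract does not describe the network, the way text pairs are formed, or how boundaries are obtained, so the function names, the NumPy formulation, and the boundary rule (a boundary wherever the most likely cluster changes between consecutive sentences) are assumptions for illustration only, not the author's implementation.

```python
import numpy as np


def iic_mutual_information(p_a, p_b, eps=1e-12):
    """Mutual information between paired soft cluster assignments.

    p_a, p_b: arrays of shape (n, k). Row i of each array is a probability
    distribution over k clusters (e.g. a softmax output) for sample i and
    for its transformed counterpart. Training would maximize this value,
    i.e. minimize its negative.
    """
    p_a = np.asarray(p_a, dtype=float)
    p_b = np.asarray(p_b, dtype=float)
    n = p_a.shape[0]

    # Empirical joint distribution over (cluster of a, cluster of b).
    joint = p_a.T @ p_b / n
    # Symmetrize, since the order within a pair carries no meaning.
    joint = (joint + joint.T) / 2.0
    # Avoid log(0), then renormalize so the entries sum to 1.
    joint = np.clip(joint, eps, None)
    joint /= joint.sum()

    marg_a = joint.sum(axis=1, keepdims=True)  # shape (k, 1)
    marg_b = joint.sum(axis=0, keepdims=True)  # shape (1, k)
    mi = np.sum(joint * (np.log(joint) - np.log(marg_a) - np.log(marg_b)))
    return float(mi)


def boundaries_from_clusters(probs):
    """Hypothetical boundary rule: split where the argmax cluster changes.

    probs: array of shape (n_sentences, k) of cluster probabilities.
    Returns the indices of sentences that start a new segment.
    """
    labels = np.argmax(np.asarray(probs), axis=1)
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
```

Under this sketch, the number of clusters k would be one knob for segmentation granularity, though the abstract does not state how granularity is actually controlled.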
