Scandalous Article Classification with Contrastive Learning BERT and Study of Sentence Embedded Representation

Yuichiro Takasu

9:20 AM - 9:40 AM

[3M1-GS-10-02] Scandalous Article Classification with Contrastive Learning BERT and Study of Sentence Embedded Representation

〇Yuichiro Takasu¹, Seiichi Ozawa¹, Takehide Hirose², Yoshihiro Ikeda², Noriyasu Nakagawa², Masaaki Iizuka², Daisuke Nishida² (1. Kobe University, 2. Sumitomo Mitsui DS Asset Management Company, Limited)

Keywords: Deep Learning, Document Analysis, Scandals Article Classification

This study addresses the task of determining whether an economic news article reports a scandal, formulated as a binary classification problem. Because scandals can severely affect the management of a company or organization, such articles must be detected as early as possible and missed detections must be kept to a minimum, which calls for high recall. To reduce the number of missed scandal articles, we applied SimCSE, a contrastive learning method that mitigates the anisotropy of BERT's sentence embedding space. Experiments on Reuters articles showed that BERT with SimCSE achieved higher recall than BERT without it. The uniformity measure of the sentence embedding space also improved, suggesting that the more isotropic space contributed to the gain in recall. This high uniformity was also found to be preserved after fine-tuning.
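The two quantities discussed in the abstract, recall and the uniformity of the sentence embedding space, can be illustrated with a minimal sketch. The abstract does not state which uniformity measure was used; the code below assumes the pairwise Gaussian-potential uniformity that is commonly reported alongside SimCSE (lower values mean a more uniform, more isotropic space). The function names `uniformity` and `recall` and the parameter `t` are hypothetical and chosen for illustration only; they are not taken from the authors' implementation.

```python
import numpy as np


def uniformity(embeddings: np.ndarray, t: float = 2.0) -> float:
    """Assumed uniformity measure: log of the mean Gaussian potential
    exp(-t * ||x - y||^2) over all distinct pairs of L2-normalized
    embeddings. Lower (more negative) means more uniformly spread."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # For unit vectors, ||a - b||^2 = 2 - 2 * (a . b).
    sq_dist = 2.0 - 2.0 * (x @ x.T)
    # Keep each unordered pair once, excluding the diagonal.
    i, j = np.triu_indices(len(x), k=1)
    return float(np.log(np.mean(np.exp(-t * sq_dist[i, j]))))


def recall(y_true, y_pred) -> float:
    """Recall for the positive class (label 1 = scandal article):
    true positives / (true positives + false negatives)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return float(tp / (tp + fn)) if (tp + fn) > 0 else 0.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for sentence embeddings (not real BERT outputs).
    isotropic = rng.standard_normal((200, 16))
    # A large shared offset pushes all vectors toward one direction,
    # imitating an anisotropic embedding space.
    anisotropic = isotropic + 5.0
    print("uniformity (isotropic):  ", uniformity(isotropic))
    print("uniformity (anisotropic):", uniformity(anisotropic))
    # Toy labels: 3 scandal articles, one of which is missed.
    print("recall:", recall([1, 1, 1, 0, 0], [1, 0, 1, 0, 1]))
```

In this toy setup the synthetic isotropic vectors yield a lower uniformity value than the offset ones, and a single missed scandal article out of three lowers recall to two thirds, which is the kind of oversight the study aims to suppress.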

Presentation information

[3M1-GS-10] AI application
