JSAI2024

Presentation information

General Session

[4G1-GS-4] Web intelligence

Fri. May 31, 2024 9:00 AM - 10:40 AM Room G (Room 22+23)

Chair: Takeshi Sakaki (Hottolink, Inc.)

9:20 AM - 9:40 AM

[4G1-GS-4-02] Extracting Named Entities from Press Releases Using Anomaly Detection Techniques

〇Reiji Kifune¹, Ayahiko Niimi² (1. Graduate School of Future University Hakodate, 2. Future University Hakodate)

[Online]

Keywords: Web Mining, Text Mining, Anomaly Detection, Press Release

This paper introduces a novel approach for extracting named entities as outliers from press-release texts using anomaly detection techniques, and validates its effectiveness and potential applicability to company research. The study uses the local outlier factor (LOF), a density-based anomaly detection technique known for its robust performance even in high-dimensional spaces. Specifically, the approach first applies a pretrained FastText model to the full press-release texts to convert nouns into vectors, leveraging FastText’s adaptability to unknown words. These vectors are then fed into LOF to detect outliers. In the experiments, the proposed method successfully extracted, as outliers, named entities of the eight types defined by IREX. However, the identified outliers also included several words that do not meet the defined criteria for named entities, so noise was present in the output.
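The LOF step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 8-dimensional vectors stand in for FastText noun embeddings, the word labels (`noun_0`, `ENTITY_X`) and the neighbourhood size `k=5` are hypothetical, and the function uses a simplified LOF that takes exactly the k nearest neighbours and ignores distance ties.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def lof_scores(vectors, k=5):
    """Return a simplified local outlier factor for each vector.

    Scores near 1 mean the point is about as dense as its neighbours;
    scores well above 1 mean it sits in a much sparser region.
    Assumes no k+1 points coincide (otherwise a density is undefined).
    """
    n = len(vectors)
    dist = [[euclidean(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]

    # k nearest neighbours of each point and the distance to the k-th one.
    neighbors, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: dist[i][j])
        knn = order[:k]
        neighbors.append(knn)
        k_distance.append(dist[i][knn[-1]])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
            for i in range(n)]


# Toy data (hypothetical): 30 "common noun" vectors in one dense cluster,
# plus one vector placed far away to play the role of a named entity.
random.seed(0)
dim = 8
words = [f"noun_{i}" for i in range(30)] + ["ENTITY_X"]
vectors = [[random.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(30)]
vectors.append([3.0] * dim)

scores = lof_scores(vectors, k=5)
ranked = sorted(zip(words, scores), key=lambda pair: pair[1], reverse=True)
print(ranked[:3])  # highest-scoring candidates first
```

In practice the vectors would come from the pretrained FastText model, and a library implementation such as scikit-learn's `LocalOutlierFactor` would likely replace the hand-written function. How many top-ranked words to treat as outliers is a separate design choice that the abstract does not specify.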
