JSAI2024

Presentation information

General Session

[2N1-GS-4] Web intelligence

Wed. May 29, 2024 9:00 AM - 10:40 AM Room N (Room 54)

Chair: Katsuhiko Hayashi (The University of Tokyo) [Online]

10:20 AM - 10:40 AM

[2N1-GS-4-05] Classification of author affiliations extracted from scholarly PDF documents

〇Kazuhiro Yamauchi¹, Marie Katsurai¹ (1. Doshisha University)

Keywords: Institutional Classification, Bibliometric Analysis, Author Analysis

Author affiliation information in academic papers plays a crucial role in many scientometric analyses. Many previous studies have obtained this information from publisher databases or open databases. However, these databases do not necessarily store the affiliation information of the authors under analysis as metadata, which can reduce analysis coverage. Extracting affiliation information from raw PDF files could solve this problem. In this study, we propose a method that extracts strings directly related to author affiliations from academic paper PDFs and classifies whether each research institution belongs to academia or industry. Our results show that approximately 90% of research institutions were classified successfully. In practical applications, the proposed method reduced manual classification by about 63%.
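
The abstract does not describe how the academia/industry decision is made, so the following is only a minimal illustrative sketch, not the authors' method. It assumes a simple keyword rule applied to an affiliation string that has already been extracted from a PDF. The keyword patterns, the function name `classify_affiliation`, and the `"unknown"` fallback label (for cases left to manual review) are all hypothetical.

```python
import re

# Hypothetical keyword patterns; these are assumptions for illustration,
# not taken from the paper.
ACADEMIA_PATTERN = re.compile(
    r"\b(university|college|faculty|graduate school|academy)\b",
    re.IGNORECASE,
)
INDUSTRY_PATTERN = re.compile(
    r"\b(inc|corp|corporation|ltd|llc|gmbh|co)\b",
    re.IGNORECASE,
)


def classify_affiliation(affiliation: str) -> str:
    """Label an affiliation string as 'academia', 'industry', or 'unknown'.

    Strings that match neither pattern, or both, are returned as 'unknown'
    so that they can be passed to manual classification.
    """
    is_academia = bool(ACADEMIA_PATTERN.search(affiliation))
    is_industry = bool(INDUSTRY_PATTERN.search(affiliation))
    if is_academia and not is_industry:
        return "academia"
    if is_industry and not is_academia:
        return "industry"
    return "unknown"


if __name__ == "__main__":
    # Example strings are made up, apart from the authors' own affiliation.
    for text in ["Doshisha University", "Example Corp.", "Some Research Lab"]:
        print(text, "->", classify_affiliation(text))
```

In a sketch like this, only the strings labeled `"unknown"` would need manual inspection, which is one way an automatic classifier can reduce manual classification work.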
