10:20 AM - 10:40 AM
[2N1-GS-4-05] Classification of author affiliations extracted from scholarly PDF documents
Keywords: Institutional Classification, Bibliometric Analysis, Author Analysis
Author affiliation information in academic papers plays a crucial role in many scientometric analyses. To obtain it, many previous studies have relied on publisher databases or open databases. However, these databases do not necessarily store the affiliation information of the authors under analysis as metadata, which can reduce analysis coverage. Extracting affiliation information from the raw PDF files could solve this problem. In this study, we propose a method that extracts strings directly related to author affiliations from academic paper PDFs and classifies whether each research institution belongs to academia or industry. The method correctly classified approximately 90% of research institutions. In a practical application, it reduced manual classification work by about 63%.