[3Win5-41] Construction and Performance Analysis of a Japanese Sequential Sentence Classification Dataset for Medical Papers
Keywords: Sequential sentence classification, Sentence classification, Natural language processing, Scholarly document processing
Sequential sentence classification (SSC) of research paper abstracts has attracted attention as a fundamental technology for information retrieval and extractive summarization. However, previous studies have built their training datasets only from English abstracts, which makes it difficult to apply SSC to Japanese research paper abstracts. We therefore created a new SSC dataset comprising abstracts from Japanese medical research papers and trained a hierarchical bidirectional LSTM-based architecture on it. We also proposed two methods for exploiting existing English datasets: data augmentation using large language models, and training directly on both English and Japanese data. In addition, we introduced a method to improve recognition of expressions specific to research papers. As a result, we achieved approximately 92% accuracy and 88% macro-F1 in SSC for Japanese research papers.
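The two reported metrics can diverge when section labels are imbalanced, since macro-F1 weights every label equally regardless of how many sentences carry it. The following is a minimal sketch of how accuracy and macro-F1 are conventionally computed over per-sentence labels. It is not the authors' evaluation code, and the function names and the label names in the usage note are hypothetical, because the abstract does not state the label set.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of sentences whose predicted label equals the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 scores."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, with the hypothetical gold labels `["BACKGROUND", "METHODS", "METHODS", "RESULTS", "RESULTS", "CONCLUSIONS"]` and a prediction that mislabels the second `METHODS` sentence as `RESULTS`, `accuracy` counts five of six sentences as correct, while `macro_f1` averages the four per-label F1 scores, so the error lowers the scores of both `METHODS` and `RESULTS`.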