JSAI2025

Presentation information

Organized Session

[2L1-OS-25] OS-25

Wed. May 28, 2025 9:00 AM - 10:40 AM Room L (Room 1007)

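Organizers: Shuntaro Yada (University of Tsukuba), Eiji Aramaki (Nara Institute of Science and Technology), Yoshimasa Kawazoe (The University of Tokyo), Satoko Hori (Keio University), Hayato Kizaki (Keio University)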

10:20 AM - 10:40 AM

[2L1-OS-25-05] Challenges and Useful Tools for Efficiently Creating a Corpus on Rare and Intractable Diseases Using LLMs

〇Eisuke Dohi1, Jin-Dong Kim2, Itaru Hayakawa3, Tomoyasu Matsubara4, Terue Takatsuki2, Yuka Tateishi5, Toyofumi Fujiwara2, Yasunori Yamamoto2 (1. National Center of Neurology and Psychiatry, 2. Research Organization of Information and Systems, Database Center for Life Science, 3. National Center for Child Health and Development, Department of Neurology, 4. Tokushima University, Department of Neurology, 5. Japan Science and Technology Agency, Department of NBDC Program)

Keywords: Rare and intractable diseases, LLM, Corpus, Ontology, Annotation

There are approximately 10,000 rare and intractable diseases. Because each disease has few cases, healthcare professionals have limited opportunities to gain experience with it, and reaching a diagnosis reportedly takes seven to eight years on average. Artificial intelligence is being explored as a way to address this problem, which makes the development of high-quality case corpora an urgent need. We are creating a corpus of case texts annotated with disease and symptom names, and we improve efficiency by combining a large language model (LLM) with web-based tools for managing and editing annotations. For LLM-based annotation, we implemented three strategies: (1) normalizing the data to reduce token counts, (2) chunking the input into shorter segments to avoid processing interruptions, and (3) outputting the results in JSON format. Experts then evaluate and revise the annotations in the TextAE GUI (https://textae.pubannotation.org/), and PubAnnotation (https://pubannotation.org/) is used to evaluate and resolve differences among annotators. In this presentation, we will share our human-in-the-loop approach to building a case corpus with LLMs and discuss the features needed for an efficient workflow.
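
The three strategies named in the abstract (normalization, chunking, JSON output) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the `LEXICON` table, the `fake_llm` stand-in (a keyword lookup used in place of a real model call), the case sentence, and the `Symptom`/`Disease` labels are all hypothetical. The output dictionary is only intended to resemble the PubAnnotation JSON layout (`text` plus `denotations` with `span` and `obj`); the exact format should be checked against the PubAnnotation documentation.

```python
import json
import re
import unicodedata

# Hypothetical lexicon standing in for what a real LLM would recognize.
LEXICON = {"ataxia": "Symptom", "Niemann-Pick disease type C": "Disease"}


def normalize(text):
    """Strategy 1: NFKC-normalize and collapse whitespace runs to save tokens."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def chunk(text, max_chars=50):
    """Strategy 2: group whole sentences into (offset, chunk) pairs.

    A chunk is closed before it would exceed max_chars; a single sentence
    longer than max_chars is kept whole rather than split mid-sentence.
    """
    pieces = [p for p in re.findall(r"[^.!?。]*[.!?。]?\s*", text) if p]
    chunks, offset, current = [], 0, ""
    for piece in pieces:
        if current and len(current) + len(piece) > max_chars:
            chunks.append((offset, current))
            offset += len(current)
            current = ""
        current += piece
    if current:
        chunks.append((offset, current))
    return chunks


def fake_llm(chunk_text):
    """Stand-in for an LLM call (assumption): returns mentions as a JSON string."""
    hits = [{"text": t, "label": l} for t, l in LEXICON.items() if t in chunk_text]
    return json.dumps(hits)


def to_denotations(offset, chunk_text, llm_json):
    """Strategy 3: parse the JSON reply and map mentions to document offsets."""
    denotations = []
    for item in json.loads(llm_json):
        begin = chunk_text.find(item["text"])
        if begin < 0:
            continue  # skip mentions that do not occur verbatim in the chunk
        span = {"begin": offset + begin, "end": offset + begin + len(item["text"])}
        denotations.append({"span": span, "obj": item["label"]})
    return denotations


def annotate(raw_text, llm=fake_llm, max_chars=50):
    """Run normalize -> chunk -> annotate and collect one annotation document."""
    text = normalize(raw_text)
    denotations = []
    for offset, chunk_text in chunk(text, max_chars):
        denotations += to_denotations(offset, chunk_text, llm(chunk_text))
    for i, d in enumerate(denotations, start=1):
        d["id"] = f"T{i}"
    return {"text": text, "denotations": denotations}


if __name__ == "__main__":
    raw = ("A 5-year-old   boy presented with ataxia.  "
           "Genetic testing confirmed Niemann-Pick disease type C.")
    print(json.dumps(annotate(raw), indent=2))
```

Carrying each chunk's character offset through the pipeline is what lets per-chunk model output be merged into spans over the full normalized text, which a reviewer could then correct in an annotation editor.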
