10:20 AM - 10:40 AM
[2L1-OS-25-05] Challenges and Useful Tools for Efficiently Creating a Corpus on Rare and Intractable Diseases Using LLMs
Keywords: Rare and intractable diseases, LLM, Corpus, Ontology, Annotation
There are approximately 10,000 rare and intractable diseases. Because each has few cases, healthcare professionals have limited opportunities to gain experience with them, and reaching a diagnosis reportedly takes seven to eight years on average. Artificial intelligence is being explored as a way to address this problem, which makes the development of high-quality case corpora an urgent need. We are building a corpus of case texts tagged with disease and symptom names, and we improve efficiency by combining a large language model (LLM) with web-based annotation management and editing tools. For LLM-based annotation, we adopted three strategies: (1) normalizing the data to reduce token counts, (2) chunking the input into shorter segments to avoid processing interruptions, and (3) having the model output its annotations in JSON format. Experts then evaluate and revise the annotations in the TextAE GUI (https://textae.pubannotation.org/), and PubAnnotation (https://pubannotation.org/) is used to evaluate and resolve differences among annotators. In this presentation, we will share our human-in-the-loop approach to building a case corpus with LLMs and discuss the features needed for an efficient workflow.
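The three strategies named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the normalization rules, the sentence-based chunking, the `max_chars` limit, and the JSON schema with `mention` and `label` keys are all assumptions made for illustration, and the LLM call itself is omitted.

```python
import json
import re
import unicodedata


def normalize(text: str) -> str:
    """Assumed normalization: NFKC folding plus whitespace collapsing."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def chunk(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def parse_annotations(chunk_text: str, llm_output: str) -> list[dict]:
    """Parse a JSON reply (hypothetical schema) and locate each mention.

    Expected reply: [{"mention": "...", "label": "disease" | "symptom"}, ...]
    Malformed replies yield an empty list so the chunk can go to manual review.
    """
    try:
        items = json.loads(llm_output)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    spans = []
    for item in items:
        if not isinstance(item, dict):
            continue
        mention = item.get("mention", "")
        start = chunk_text.find(mention) if mention else -1
        if start >= 0:  # keep only mentions that occur verbatim in the chunk
            spans.append({
                "begin": start,
                "end": start + len(mention),
                "label": item.get("label", ""),
            })
    return spans
```

Requesting JSON keeps the model's reply machine-checkable: a mention that does not occur verbatim in its chunk is dropped here rather than passed on, so the expert review step only sees spans that can be anchored to character offsets.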