JSAI2024

Presentation information

Organized Session

[3S1-OS-7b] OS-7

Thu. May 30, 2024 9:00 AM - 10:40 AM Room S (Room 52)

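Organizers: Shuntaro Yada (Nara Institute of Science and Technology), Eiji Aramaki (Nara Institute of Science and Technology), Yoshimasa Kawazoe (The University of Tokyo), Satoko Hori (Keio University)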

10:00 AM - 10:20 AM

[3S1-OS-7b-04] Training Dataset for Japanese Simplification in Medical Domain

〇Koki Horiguchi¹, Tomoyuki Kajiwara¹, Takashi Ninomiya¹, Shoko Wakamiya², Eiji Aramaki² (1. Ehime University, 2. Nara Institute of Science and Technology)

Keywords: Medical NLP, Text Simplification, Parallel Corpus Mining

We release a large-scale parallel corpus for medical text simplification in Japanese. This corpus can be used to train a text simplification model that paraphrases medical terms into expressions that patients can understand without effort. To address the low-resource problem for this task in Japanese, we automatically extracted 17,300 sentence pairs that were semantically equivalent from both professional and consumer versions of articles in online medical dictionaries. We compared several sentence embedding models for Japanese and extracted simplified sentence pairs from article pairs by embedding-based bipartite graph matching. Experimental results on Japanese text simplification tasks in four domains revealed that models trained on our medical text simplification corpus achieved high performance in medical domains.
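The extraction step named in the abstract, embedding-based bipartite graph matching, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each sentence of a professional article and of its consumer counterpart has already been turned into a vector by some sentence embedding model (the vectors below are toy values), scores every cross-article sentence pair by cosine similarity, and keeps the one-to-one assignment with the highest total similarity. The function name `match_sentences` and the `threshold` filter are hypothetical additions for illustration; the abstract does not say how low-similarity pairs are discarded. The exhaustive search is only practical for a handful of sentences; for real articles a polynomial-time assignment solver such as SciPy's `scipy.optimize.linear_sum_assignment` would be the usual choice.

```python
from itertools import permutations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_sentences(pro_vecs, con_vecs, threshold=0.8):
    """Maximum-weight one-to-one matching between two sets of sentence vectors.

    Returns (professional_index, consumer_index, similarity) triples.
    `threshold` is a hypothetical cut-off, not a value from the paper.
    """
    sim = [[cosine(p, c) for c in con_vecs] for p in pro_vecs]
    n_pro, n_con = len(pro_vecs), len(con_vecs)

    # Exhaustive search: assign every sentence on the smaller side to a
    # distinct sentence on the larger side and keep the best total score.
    if n_pro <= n_con:
        best = max(
            permutations(range(n_con), n_pro),
            key=lambda perm: sum(sim[i][j] for i, j in enumerate(perm)),
        )
        pairs = list(enumerate(best))
    else:
        best = max(
            permutations(range(n_pro), n_con),
            key=lambda perm: sum(sim[i][j] for j, i in enumerate(perm)),
        )
        pairs = [(i, j) for j, i in enumerate(best)]

    # Drop matched pairs that are too dissimilar to count as paraphrases.
    return [(i, j, sim[i][j]) for i, j in sorted(pairs) if sim[i][j] >= threshold]


# Toy embeddings (assumed): three professional and two consumer sentences.
professional = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
consumer = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.0]]

matches = match_sentences(professional, consumer)
for pro_idx, con_idx, score in matches:
    print(f"professional[{pro_idx}] <-> consumer[{con_idx}] (cosine {score:.3f})")
```

In this toy input the third professional sentence has no close consumer counterpart, so it stays unmatched. A one-to-one matching differs from picking each sentence's nearest neighbour independently: no consumer sentence can be paired with two professional sentences.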
