6:10 PM - 6:30 PM
[2G6-GS-6-03] ChatGPT-based adaptive data augmentation for multi-label Japanese text classification in the medical domain
Keywords: Natural language processing, Data augmentation, ChatGPT
Multi-label text classification is a common task in the medical domain. However, preparing an annotated training dataset is costly because manual annotation is laborious and requires extensive domain-specific knowledge. Here we introduce an automated data augmentation method using ChatGPT, in which new training data are generated from the ground-truth data (NTCIR-13 MedWeb Japanese corpus). The method is adaptive because it uses a baseline BERT model, fine-tuned on the ground-truth dataset, to actively filter the generated training data. The final model, trained on the merged ground-truth and augmented data, showed a 2.4% improvement in F1 score over the baseline model. The proposed algorithms can help solve multi-label classification problems in the medical domain.
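The filtering step described above can be illustrated with a minimal sketch. The abstract does not state the actual filtering criterion, so this example assumes one plausible rule: a generated sentence is kept only when the baseline model's predicted label set matches the labels the sentence was generated for. The names `filter_generated` and `toy_predict_probs`, the 0.5 threshold, and the English labels are all hypothetical; the toy keyword scorer merely stands in for the fine-tuned BERT baseline.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# A candidate is (generated text, label set it was generated for).
Sample = Tuple[str, FrozenSet[str]]


def filter_generated(
    candidates: List[Sample],
    predict_probs: Callable[[str], Dict[str, float]],
    threshold: float = 0.5,  # assumed decision threshold, not from the abstract
) -> List[Sample]:
    """Keep candidates whose intended labels agree with the baseline model.

    ASSUMPTION: exact label-set agreement is the acceptance rule. The
    authors' real criterion may differ.
    """
    kept = []
    for text, intended in candidates:
        probs = predict_probs(text)
        predicted = frozenset(
            label for label, p in probs.items() if p >= threshold
        )
        if predicted == intended:
            kept.append((text, intended))
    return kept


def toy_predict_probs(text: str) -> Dict[str, float]:
    """Hypothetical stand-in for the baseline BERT model (keyword matching)."""
    labels = ("fever", "cough", "headache")  # illustrative labels only
    lowered = text.lower()
    return {label: (0.9 if label in lowered else 0.1) for label in labels}


if __name__ == "__main__":
    generated = [
        ("I have a fever and a cough", frozenset({"fever", "cough"})),
        ("My head hurts", frozenset({"headache"})),
        ("No symptoms today", frozenset()),
    ]
    for text, labels in filter_generated(generated, toy_predict_probs):
        print(sorted(labels), text)
```

In a real pipeline, the accepted samples would then be merged with the ground-truth data before training the final model, as the abstract describes.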