[4Xin2-114] Annotation of Sequence Labeling with Pre-labeling by Large Language Models
Keywords: Sequence Labeling, Named Entity Recognition, Annotation, Dataset, Large Language Model
Generating training data for sequence labeling tasks, such as named entity recognition (NER), incurs significant annotation costs. One approach to this efficiency problem is to present the outputs of pretrained sequence labeling models as references during annotation. However, most publicly available sequence labeling models use general label sets, with labels such as PERSON, LOCATION, and ORGANIZATION, and therefore cannot pre-label domain-specific entities. This study proposes a method to improve the efficiency of generating training data for domain-specific sequence labeling tasks by presenting pre-labeled results obtained from a large language model (LLM) as references during annotation. First, we apply a prompt that instructs the LLM to surround domain-specific entities in the target sentence with XML-format tags. We then parse the LLM's output and display the results as label candidates in the annotation tool. This approach makes pre-labeled results available even in domains with limited data. We report the impact of this method on annotation cost using this system.
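As an illustration of the parsing step, the following is a minimal sketch (not the authors' implementation) of converting an LLM's XML-tagged output into character-offset spans that an annotation tool could display as label candidates. The tag names DRUG and DISEASE are hypothetical domain labels used only for this example.

```python
import re

# Matches an XML-style tagged entity, e.g. <DISEASE>fever</DISEASE>.
TAG_RE = re.compile(r"<(?P<label>\w+)>(?P<text>.*?)</(?P=label)>")

def parse_prelabels(tagged: str):
    """Return (plain_text, spans), where spans are (start, end, label)
    character offsets into the untagged sentence."""
    spans = []
    plain_parts = []
    pos = 0   # offset into the reconstructed untagged text
    last = 0  # offset into the tagged LLM output
    for m in TAG_RE.finditer(tagged):
        before = tagged[last:m.start()]
        plain_parts.append(before)
        pos += len(before)
        entity = m.group("text")
        spans.append((pos, pos + len(entity), m.group("label")))
        plain_parts.append(entity)
        pos += len(entity)
        last = m.end()
    plain_parts.append(tagged[last:])
    return "".join(plain_parts), spans
```

For example, `parse_prelabels("Take <DRUG>aspirin</DRUG> for <DISEASE>fever</DISEASE>.")` yields the untagged sentence together with two candidate spans. A real system would also need to handle LLM output that is malformed or that rewrites the source sentence, e.g. by aligning the untagged output back to the original text.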