JSAI2024

Presentation information

Poster Session


[4Xin2] Poster session 2

Fri. May 31, 2024 12:00 PM - 1:40 PM Room X (Event hall 1)

[4Xin2-114] Annotation of Sequence Labeling with Pre-labeling by Large Language Models

〇Kanato Ishii1, Takuro Niitsuma1, Yuya Taguchi1, Yosuke Yamano1, Kaori Sugino1, Hideaki Tamori1 (1.The Asahi Shimbun Company)

Keywords:Sequence Labeling, Named Entity Recognition, Annotation, Dataset, Large Language Model

Generating training data incurs significant annotation costs in sequence labeling tasks, such as named entity recognition (NER). One approach to reducing this cost is to use the labeling results of pre-trained sequence labeling models as references during annotation. However, most publicly accessible sequence labeling models use general label sets, with labels such as PERSON, LOCATION, and ORGANIZATION, and cannot pre-label domain-specific entities. This study proposes a method to improve the efficiency of generating training data for domain-specific sequence labeling tasks by using pre-labeled results obtained from large language models as references during annotation. First, we construct a prompt that instructs the model to enclose domain-specific entities in the target sentence with XML-format tags, and submit it to a large language model (LLM). We then parse the LLM's output and display the results as label candidates in the annotation tool. This approach allows us to provide pre-labeled results even in domains with limited data. We report on the impact of this method on annotation cost using this system.
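The parsing step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the LLM returns the input sentence with entities wrapped in XML-style tags (the tag names `DRUG` and `SYMPTOM` here are hypothetical examples of a domain-specific label set) and converts them to character-offset spans that an annotation tool could show as candidates.

```python
import re

def parse_xml_labels(tagged: str) -> tuple[str, list[tuple[int, int, str]]]:
    """Strip XML-style entity tags from an LLM's tagged sentence.

    Returns (plain_text, spans), where each span is (start, end, label)
    with character offsets into the plain text.
    """
    pattern = re.compile(r"<(?P<label>\w+)>(?P<entity>.*?)</(?P=label)>")
    spans: list[tuple[int, int, str]] = []
    plain_parts: list[str] = []
    pos = 0      # cursor into the tagged string
    out_len = 0  # length of plain text emitted so far
    for m in pattern.finditer(tagged):
        # Copy untagged text preceding this entity.
        plain_parts.append(tagged[pos:m.start()])
        out_len += m.start() - pos
        # Record the entity span against the plain-text offsets.
        entity = m.group("entity")
        spans.append((out_len, out_len + len(entity), m.group("label")))
        plain_parts.append(entity)
        out_len += len(entity)
        pos = m.end()
    plain_parts.append(tagged[pos:])  # trailing untagged text
    return "".join(plain_parts), spans

# Hypothetical LLM output for a domain-specific label set:
tagged = "Take <DRUG>aspirin</DRUG> for <SYMPTOM>headache</SYMPTOM>."
text, spans = parse_xml_labels(tagged)
# text  -> "Take aspirin for headache."
# spans -> [(5, 12, "DRUG"), (17, 25, "SYMPTOM")]
```

In practice the LLM output may drift from the input sentence (dropped whitespace, paraphrase), so a production version would additionally align the stripped text back to the original sentence before trusting the offsets.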
