2:40 PM - 3:00 PM
[4F3-GS-3-03] Few-Shot Multi-Label Annotation of Causes for Incident Texts Using Large Language Models
Keywords: Large Language Models, GPT-4, Textual Entailment, Few-shot Learning
In corporate environments, reports of workplace accidents accumulate daily and offer valuable insights for root cause analysis, the formulation of preventive measures, and safety education. However, the unstructured nature of this textual data impedes efficient knowledge accumulation and reuse. This study aims to structure these documents, transforming them into a repository of reusable knowledge.
In this experiment, we performed multi-label annotation based on textual entailment on workplace accident reports from an electric power company, using GPT, a general-purpose LLM, with four different approaches. The abstract categories of accident causes used for the annotation task were first extracted from GPT-4 in a zero-shot manner and then checked by human experts to determine the final category labels.
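As a minimal sketch of how multi-label annotation can be framed as textual entailment, the code below asks one yes/no entailment question per candidate cause label and keeps the labels answered "yes". The prompt wording, the function names `build_prompt` and `annotate`, and the `ask` callable (a stand-in for any LLM call) are all assumptions made for illustration; they are not the prompts or the four approaches used in the study.

```python
from typing import Callable, List, Optional


def build_prompt(report: str, label: str, example: Optional[str] = None) -> str:
    """Frame one candidate cause label as an entailment question (hypothetical wording)."""
    parts = []
    if example:
        # Optional worked example, as in a one-shot setting.
        parts.append("Example:\n" + example)
    parts.append("Premise: " + report)
    parts.append("Hypothesis: The accident was caused by " + label + ".")
    parts.append("Does the premise entail the hypothesis? Answer yes or no.")
    return "\n\n".join(parts)


def annotate(
    report: str,
    labels: List[str],
    ask: Callable[[str], str],
    example: Optional[str] = None,
) -> List[str]:
    """Return every label whose hypothesis the model judges to be entailed.

    `ask` is a placeholder for an LLM call: it takes a prompt string and
    returns the model's text answer.
    """
    chosen = []
    for label in labels:
        answer = ask(build_prompt(report, label, example))
        if answer.strip().lower().startswith("yes"):
            chosen.append(label)
    return chosen
```

Because each label is judged independently, a single report can receive zero, one, or several cause labels, which is what makes the task multi-label.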
The results showed that GPT, particularly in a one-shot setting with prompt engineering, demonstrated strong generalization ability, with performance close to that of human annotators on some evaluation metrics. However, the results also suggest that cases that are highly specialized and involve multiple complex factors, as in this study, require careful model selection and prompt design.
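The abstract does not name the evaluation metrics, so the following is only an illustrative sketch of two metrics commonly used to compare multi-label predictions against human annotations: micro-averaged F1 and mean per-report Jaccard similarity. The function names and the choice of metrics are assumptions, not the study's evaluation protocol.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over all (report, label) decisions."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # labels in both sets
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # predicted but not gold
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # gold but not predicted
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def mean_jaccard(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Average per-report overlap between gold and predicted label sets."""
    scores = []
    for g, p in zip(gold, pred):
        union = g | p
        # Two empty label sets are treated as a perfect match.
        scores.append(len(g & p) / len(union) if union else 1.0)
    return sum(scores) / len(scores) if scores else 0.0
```

The same functions can be applied to a second human annotator's labels in place of the model's predictions, which gives a human baseline to compare against.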