JSAI2024

General Session » GS-5 Language media processing

[2G4-GS-6] Language media processing:

Wed. May 29, 2024 1:30 PM - 3:10 PM Room G (Room 22+23)

Chair: 河野 誠也 (RIKEN)

1:30 PM - 1:50 PM

[2G4-GS-6-01] Automated Evaluation of Question Answering by Using Large Language Models and Its Defense Against Prompt Injections

〇Takumi Kondo1, Koh Takeuchi1, Jiyi Li2, Shigeru Saito3, Hisashi Kashima1 (1. Kyoto University, 2. University of Yamanashi, 3. SIGNATE Inc.)

Keywords: Question answering, Semantic understanding, Automated evaluation, Large language models

In many question-answering tasks in natural language processing, evaluation relies on exact or partial matching between the candidate answer and pre-prepared reference answers, regardless of the question domain. In open-domain question answering, however, the range of topics is not restricted. Synonymous expressions and variations in notation then make accurate assessment through lexical matching difficult. Existing studies have proposed automated evaluation with large language models (LLMs) to address these challenges, but the vulnerability of such automated evaluation has received little discussion. In this study, we propose a new LLM-based framework for automated evaluation that addresses these issues, and we examine its performance and robustness. Our experiments show that LLM-based automated evaluation agrees with human evaluation in over 90% of cases and is resilient to attacks on the evaluation system.
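The limitation of lexical matching described above can be made concrete with a small sketch. It assumes SQuAD-style exact match and token-level F1 as representative "exact or partial matching" metrics. These are common choices but not necessarily the ones used in this study. The reference answer and candidates are hypothetical examples.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and
    English articles, collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(candidate: str, reference: str) -> bool:
    """Exact matching: True only if the normalized strings are identical."""
    return normalize(candidate) == normalize(reference)


def token_f1(candidate: str, reference: str) -> float:
    """Partial matching: F1 over the multiset of tokens shared by both strings."""
    cand = normalize(candidate).split()
    ref = normalize(reference).split()
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical reference answer and candidate answers.
    reference = "United States of America"
    for candidate in ["the United States of America", "USA", "America"]:
        print(f"{candidate!r}: EM={exact_match(candidate, reference)}, "
              f"F1={token_f1(candidate, reference):.2f}")
```

Under these assumptions, the synonymous answer "USA" scores 0 on both metrics. The less precise "America" still earns partial credit (F1 = 0.40). This kind of mismatch between lexical overlap and correctness motivates semantic, LLM-based evaluation.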
