1:30 PM - 1:50 PM
[2G4-GS-6-01] Automated Evaluation of Question Answering by Using Large Language Models and Its Defense Against Prompt Injections
Keywords:Question answering, Semantic understanding, Automated evaluation, Large language models
In many question-answering tasks of Natural Language Processing, its evaluation is based on exact or partial matching between the candidate text and pre-prepared reference answers, regardless of the question domain. However, evaluating open-domain question-answering, which does not limit the range of topics, becomes problematic due to issues such as synonymous expressions and variations in notation, making accurate assessment challenging through lexical matching. Existing studies have proposed automated evaluation using Large Language Models (LLM) to address these challenges, but discussions on the vulnerability of automated evaluation are lacking. In this study, we propose a new framework for automated evaluation using LLM to address these issues and discuss its performance and robustness. Our experiments show that LLM's automated evaluation aligns with human evaluation in over 90% of cases and demonstrates resilience against attacks on the evaluation system.
Authentication for paper PDF access
A password is required to view paper PDFs. If you are a registered participant, please log on the site from Participant Log In.
You could view the PDF with entering the PDF viewing password bellow.