A Proposal for an Evaluation Framework Using LLM-as-a-Judge and Continuous Learning

Shunta Ito; Sohei Kurita; Momoko Otake

[2Win5-102] A Proposal for an Evaluation Framework Using LLM-as-a-Judge and Continuous Learning

〇Shunta Ito¹, Sohei Kurita¹, Momoko Otake¹ (1.Microsoft Japan Co., Ltd.)

Keywords: Generative AI, Evaluation, Human-in-the-loop, Small Language Model, Eval-centric AI

LLM-as-a-Judge, a method of delegating evaluation tasks to large language models (LLMs), has gained significant attention. However, putting it into practice raises several challenges, including ensuring evaluation quality, reducing the workload on expert evaluators, and handling drift. In this study, we propose a framework that addresses these issues. Experimental results show that (1) larger models produce evaluations that align more closely with human evaluations, (2) few-shot methods are not effective, whereas fine-tuning is more suitable, and (3) increasing the amount of training data improves evaluation accuracy.

Presentation information

[2Win5] Poster session 2
