JSAI2025

Presentation information

Poster Session

[2Win5] Poster session 2

Wed. May 28, 2025 3:30 PM - 5:30 PM Room W (Event hall D-E)

[2Win5-102] A Proposal for an Evaluation Framework Using LLM-as-a-Judge and Continuous Learning

〇Shunta Ito1, Sohei Kurita1, Momoko Otake1 (1.Microsoft Japan Co., Ltd.)

Keywords: Generative AI, Evaluation, Human-in-the-loop, Small Language Model, Eval-centric AI

LLM-as-a-Judge, a method of delegating evaluation tasks to large language models (LLMs), has gained significant attention. However, practical implementation faces several challenges, including ensuring evaluation quality, reducing expert workload, and addressing drift. In this study, we propose a framework to address these issues. Experimental results show that (1) larger models produce evaluations more closely aligned with human evaluations, (2) few-shot methods are not effective, whereas fine-tuning is more suitable, and (3) increasing the amount of training data improves evaluation accuracy.
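As a rough illustration of what "alignment with human evaluations" can mean in practice, the sketch below compares labels produced by a judge model against human labels for the same items. The abstract does not state which metric the authors used, so the raw agreement rate and Cohen's kappa here are assumptions chosen for illustration, and the function names and sample labels are hypothetical.

```python
from collections import Counter


def agreement_rate(judge_labels, human_labels):
    """Fraction of items where the judge label equals the human label."""
    if len(judge_labels) != len(human_labels) or not judge_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(j == h for j, h in zip(judge_labels, human_labels))
    return matches / len(judge_labels)


def cohens_kappa(judge_labels, human_labels):
    """Agreement corrected for the agreement expected by chance."""
    n = len(judge_labels)
    observed = agreement_rate(judge_labels, human_labels)
    judge_counts = Counter(judge_labels)
    human_counts = Counter(human_labels)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(
        judge_counts[label] * human_counts[label]
        for label in set(judge_counts) | set(human_counts)
    ) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; treat as perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels, not data from the paper.
judge = ["pass", "fail", "pass", "pass"]
human = ["pass", "fail", "fail", "pass"]
print(agreement_rate(judge, human))
print(cohens_kappa(judge, human))
```

Under these assumptions, a larger or fine-tuned judge would be considered better aligned if it yields higher values on the same human-labeled set. Kappa is included because raw agreement can look high when one label dominates.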
