JSAI2024

Presentation information

Poster Session

[4Xin2] Poster session 2

Fri. May 31, 2024 12:00 PM - 1:40 PM Room X (Event hall 1)

[4Xin2-53] Verification of Using LLM for Automating Dialogue Data Evaluation

〇Yuki Kubo1, Tomoya Yamashita1, Masanori Yamada1 (1.NTT Social Informatics Laboratories)

Keywords: Dialogue System, Evaluation, Large Language Model

There are many methods for building dialogue systems, but evaluating dialogues remains challenging. Metrics such as dialogue quality are difficult to quantify and are often assessed by human judgement. Recently, methods that use LLMs to evaluate dialogue data have been proposed. LLM evaluations are relatively similar to human evaluations, but the agreement is not yet sufficient. The Elo rating system, which scores data through pairwise comparisons, is assumed not to depend on differences in standards among evaluators, and is therefore expected to improve accuracy. However, it may fail to do so in some cases, for example when the distribution of evaluation values is biased. In this study, we examine whether the Elo rating system improves evaluation accuracy under various distributions of evaluation values.
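For readers unfamiliar with the mechanism referenced in the abstract, the following is a minimal sketch of the standard Elo update applied to pairwise dialogue comparison. The parameter values (K = 32, initial rating 1500) and the dialogue names are conventional defaults chosen for illustration, not values from the study; in the authors' setting the pairwise outcomes would come from an LLM judging which of two dialogues is better.

```python
# Minimal sketch: Elo rating from pairwise comparisons of dialogue data.
# K-factor and initial rating are conventional defaults (assumptions).

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that item A beats item B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a: float, rating_b: float, a_wins: bool,
               k: float = 32.0) -> tuple[float, float]:
    """Return the updated ratings after one pairwise comparison."""
    e_a = expected_score(rating_a, rating_b)
    s_a = 1.0 if a_wins else 0.0
    new_a = rating_a + k * (s_a - e_a)
    new_b = rating_b + k * ((1.0 - s_a) - (1.0 - e_a))
    return new_a, new_b

# Hypothetical usage: (winner, loser) pairs as an LLM judge might produce them.
ratings = {"dialogue_1": 1500.0, "dialogue_2": 1500.0, "dialogue_3": 1500.0}
judgements = [("dialogue_1", "dialogue_2"),
              ("dialogue_1", "dialogue_3"),
              ("dialogue_2", "dialogue_3")]
for winner, loser in judgements:
    ratings[winner], ratings[loser] = elo_update(
        ratings[winner], ratings[loser], a_wins=True)
print(sorted(ratings.items(), key=lambda kv: -kv[1]))
```

Because each update depends only on which of the two items won, not on an absolute score assigned by the judge, the resulting ranking is insensitive to differences in rating scale or strictness among evaluators, which is the property the abstract appeals to.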
