[4Xin2-109] Character Setting Extraction for Enhanced Character-LLM Evaluation
Keywords: Character-LLM, Evaluation Metric, Virtual Human
Since Li et al.'s study, research on large language models (LLMs) that role-play characters, termed Character-LLMs, has progressed. Li et al. explored the reproducibility of 32 characters using two approaches: Retrieval-Augmented Generation (RAG) and fine-tuning. Wang et al. proposed a method to quantitatively evaluate the personality traits of characters role-played by LLMs using psychological metrics such as the Big Five and MBTI. Shao et al. evaluated role-playing proficiency along five axes using ChatGPT. However, Wang et al.'s method assesses only similarity of personality traits, while Shao et al.'s criteria lack clarity and can yield black-box evaluation results. Moreover, role-playing in Japanese demands precise reproduction of various linguistic elements. This study proposes a method that automatically evaluates character role-playing chatbots by extracting character settings from a character's past utterances and using them in automatic evaluation. In the experiment, we extracted 54 character settings from the tweet data of the virtual human imma and measured macro-averaged precision in the automatic evaluation.
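The abstract reports results as macro-averaged precision over the extracted character settings. As a minimal sketch of that metric, the function below computes per-category precision (true positives over all predictions in that category) and averages the scores across categories; the category labels in the usage example (e.g. "speech_style", "hobby") are hypothetical, not taken from the paper.

```python
from collections import Counter

def macro_precision(gold, pred):
    """Macro-averaged precision over setting categories.

    gold, pred: equal-length lists of category labels, where pred[i] is the
    predicted category for item i and gold[i] is its reference category.
    Per-category precision is TP / (TP + FP); the macro average is the
    unweighted mean over all categories that appear in pred.
    """
    assert len(gold) == len(pred)
    tp = Counter()                 # correct predictions per category
    predicted = Counter(pred)      # total predictions per category
    for g, p in zip(gold, pred):
        if g == p:
            tp[p] += 1
    classes = sorted(predicted)
    return sum(tp[c] / predicted[c] for c in classes) / len(classes)

# Hypothetical example: 4 extracted settings, 3 categories.
gold = ["speech_style", "hobby", "speech_style", "relation"]
pred = ["speech_style", "hobby", "hobby", "relation"]
score = macro_precision(gold, pred)  # (1.0 + 0.5 + 1.0) / 3
```

Macro averaging weights every category equally, so rare setting categories (e.g. a single relation-type setting) influence the score as much as frequent ones.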