Development of a Comprehensive Evaluation Leaderboard for Japanese Language LLMs

Yuya Yamamoto

10:00 AM - 10:20 AM

[2G1-GS-11-04] Development of a Comprehensive Evaluation Leaderboard for Japanese Language LLMs

〇Yuya Yamamoto¹, Keisuke Kamata¹, Akira Shibata¹ (1. Weights & Biases Japan)

Keywords: Machine Learning, LLM, Model Evaluation, Leaderboard

Nejumi LLM Leaderboard Neo aims to provide a comprehensive, multi-perspective evaluation of Japanese large language models (LLMs). The leaderboard assesses models on both language understanding and language generation by combining question-and-answer benchmark tests with Japanese text generation tasks. Insights gained from operating the leaderboard highlight the importance of model comparison and the need for transparent, uniform evaluation criteria. Models differed in conversational ability and in their responses to structured questions, and the results revealed a correlation between language understanding and generative ability in conversation. Among models of comparable parameter size, however, a trade-off between the two was observed. Nejumi LLM Leaderboard Neo offers a novel approach to evaluating Japanese LLMs and contributes to the further evolution and improvement of Japanese language models.

Presentation information

[2G1-GS-11] AI and Society:
