JSAI2024

Presentation information

General Session

[2G5-GS-6] Language media processing

Wed. May 29, 2024 3:30 PM - 5:10 PM Room G (Room 22+23)

Chair: 牧田光晴 (LY Corporation / SB Intuitions Corp.)

3:50 PM - 4:10 PM

[2G5-GS-6-02] Data Augmentation with ChatGPT for Efficient Evaluation of Large Language Models in Data-Scarce Environments

〇HANHUA ZHU1,2 (1. freee K.K., 2. Univ. of Tokyo)

Keywords: Large Language Models, Data Augmentation, Evaluation Model, ChatGPT

In recent years, the development of Large Language Models (LLMs) has rapidly progressed, playing a significant role in Natural Language Processing (NLP). However, there is currently no established standard for efficiently evaluating these LLMs, which often generate complex sentences. Existing evaluation methods using trained Language Models (LMs) are popular due to their cost-effectiveness, but they often fall short in accuracy when training data is scarce. I propose a data augmentation method using ChatGPT to improve the accuracy of LMs in situations of data scarcity. Results on the Japanese Question Answering (QA) task demonstrate that an LM, trained using questions and answers generated by the proposed method, surpassed ChatGPT3.5 and achieved 92% of the evaluation performance of ChatGPT4, even in scenarios where only documents were available.
