[4Xin2-48] Study of coherence evaluation for English texts
Keywords: AI, NLP, Automatic Scoring, Educational Application, Coherence
In this paper, we defined the task of coherence evaluation, where coherence indicates how naturally the logical development of a text unfolds, and created a rubric for assessing text quality. Using this rubric, we built a dataset of essays written by English language learners, each manually rated by experts; Fleiss' kappa across the three expert raters was 0.17. We also performed automatic coherence evaluation with both a specialized model and an LLM. Among the automatic methods, direct evaluation by GPT-4 agreed most closely with the manual coherence ratings, with a Pearson correlation coefficient of 0.381. Our original method based on Sentence Ordering outperformed the conventional MultiNLI model when a specific score index was used.
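As a minimal sketch of how the two reported agreement metrics could be computed (this is illustrative and not the authors' code; the ratings and automatic scores below are hypothetical placeholders):

```python
import numpy as np
from scipy.stats import pearsonr
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical ratings: one row per essay, one column per expert (rubric levels 1-4).
expert_ratings = np.array([
    [2, 3, 2],
    [4, 4, 3],
    [1, 2, 2],
    [3, 3, 3],
])

# Convert rater columns into a subjects-by-categories count table, then compute Fleiss' kappa.
counts, _ = aggregate_raters(expert_ratings)
kappa = fleiss_kappa(counts, method="fleiss")
print(f"Fleiss' kappa: {kappa:.3f}")

# Hypothetical automatic coherence scores (e.g., returned by an LLM prompt),
# correlated against the mean of the manual expert ratings.
manual_mean = expert_ratings.mean(axis=1)
auto_scores = np.array([2.5, 3.8, 1.7, 2.9])
r, p = pearsonr(auto_scores, manual_mean)
print(f"Pearson's r: {r:.3f} (p = {p:.3f})")
```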