Keywords: Natural Language Generation, Manual Evaluation, Annotation Assistant Tools
When evaluating ad creative texts automatically generated by NLG systems, manual evaluation by human evaluators is often valued more highly than automatic evaluation metrics such as ROUGE. Despite this, there is a lack of evaluation metrics dedicated to the advertising domain and of assistant tools reflecting best practices. In this paper, we review metrics for the manual evaluation of NLG systems. We also give an outlook on assistant tools for evaluating automatically generated ads with domain-specific evaluation metrics, as well as for measuring evaluators' agreement and performance.