[4Yin2-17] Gap between Semantic Textual Similarity Benchmark Task and Downstream Tasks
Keywords: Semantic Textual Similarity, Dataset, Benchmark
The Semantic Textual Similarity (STS) task measures the ability to evaluate the similarity between two sentences, an ability that is necessary for downstream tasks such as machine translation evaluation and retrieval of related passages. Many NLP researchers assess this ability on benchmark datasets. However, a system that scores highly on a benchmark dataset may not be correspondingly effective in actual downstream tasks. In this study, we examined this gap between the STS benchmark and downstream tasks, clarified which factors are important when evaluating the similarity of two sentences in downstream tasks, and discussed a policy for improving the benchmark dataset.
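For context, STS benchmark evaluation is typically done by correlating a system's predicted similarity scores with human-annotated gold scores, which is exactly the kind of aggregate metric that may not reflect downstream usefulness. The sketch below illustrates this standard protocol under stated assumptions: the `encode` function, the example sentence pairs, and the gold ratings are hypothetical stand-ins, not material from this paper.

```python
# Illustrative sketch of the usual STS evaluation protocol (assumptions noted below):
# predicted similarities for sentence pairs are correlated with human gold scores.
import numpy as np
from scipy.stats import spearmanr

def encode(sentence: str) -> np.ndarray:
    """Hypothetical sentence encoder; in practice, replace with a real model."""
    rng = np.random.default_rng(abs(hash(sentence)) % (2**32))
    return rng.standard_normal(384)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical sentence pairs with human similarity ratings on a 0-5 scale.
sentence_pairs = [
    ("A man is playing a guitar.", "A person plays a guitar."),
    ("A child is riding a bicycle.", "A kid rides a bike outside."),
    ("A dog runs in the park.", "Stock prices fell sharply today."),
]
gold_scores = [4.8, 4.2, 0.2]

pred_scores = [cosine(encode(s1), encode(s2)) for s1, s2 in sentence_pairs]

# Benchmarks typically report Spearman (and/or Pearson) correlation with the
# gold scores; a high correlation here does not guarantee downstream gains.
rho, _ = spearmanr(pred_scores, gold_scores)
print(f"Spearman correlation: {rho:.3f}")
```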