JSAI2021

Presentation information

General Session » GS-5 Language media processing

[4J3-GS-6f] Language Media Processing: Datasets and Their Use

Fri. Jun 11, 2021 1:40 PM - 3:20 PM Room J (GS room 5)

Chair: Hirotaka Kameko (Kyoto University)

2:00 PM - 2:20 PM

[4J3-GS-6f-02] JSICK: Japanese Sentences Involving Compositional Knowledge Dataset

〇Hitomi Yanaka (1), Koji Mineshima (2) (1. RIKEN, 2. Keio University)

Keywords: Natural Language Inference, Recognizing Textual Entailment, Semantic Similarity, Dataset, Crowdsourcing

This paper introduces JSICK, a Japanese dataset for Recognizing Textual Entailment (RTE) and Semantic Textual Similarity (STS). JSICK is a manual translation of the English dataset SICK, which focuses on compositional aspects of natural language inference. Each sentence in JSICK is annotated with semantic tags so that one can analyze whether models capture diverse semantic phenomena. We perform a baseline evaluation of BERT-based RTE and STS models on JSICK, and a stress test that scrambles word order in the JSICK test set. The results suggest that there is room for improvement in both performance on complex inferences and the generalization capacity of the models.
