JSAI2021

Presentation information

General Session


[4J3-GS-6f] Language media processing: Datasets and their use

Fri. Jun 11, 2021 1:40 PM - 3:20 PM Room J (GS room 5)

Chair: Hirotaka Kameko (Kyoto University)

1:40 PM - 2:00 PM

[4J3-GS-6f-01] Construction of a Semantic Textual Similarity Dataset Requiring the Understanding of Multi-Word Expressions

〇Takashi Kambe1, Sho Yokoi1,2, Masashi Yoshikawa1,2, Kentaro Inui1,2 (1. Tohoku Univ., 2. RIKEN)

Keywords: Paraphrase Identification, Multi-Word Expression, Semantic Textual Similarity

A broad range of applications in natural language processing and text mining, such as similarity-based text retrieval and automatic evaluation of generated text, require the computation of sentence similarity. However, studies of sentence similarity have largely ignored multi-word expressions (MWEs), an important component of natural language. MWEs are phrases whose meaning as a whole cannot be naturally inferred from the meanings of their constituent words, such as “hot dog.” When computing the meaning of a whole sentence, accurate processing of the meaning of MWEs is as important as that of each individual word. To introduce the perspective of MWEs into the study of textual similarity, we attempt to create a new textual similarity dataset that requires semantic computation of MWEs. Specifically, we exploit (1) a combination of back-translation and constrained decoding, and (2) mask prediction by BERT. We show that our proposed method can produce balanced sentence similarity evaluation data.
