JSAI2021

Presentation information

General Session

General Session » GS-5 Language media processing

[4J3-GS-6f] Language Media Processing: Datasets and Their Use

Fri. Jun 11, 2021 1:40 PM - 3:20 PM Room J (GS room 5)

Chair: Hirotaka Kameko (Kyoto University)

2:40 PM - 3:00 PM

[4J3-GS-6f-04] Evaluation of Sentence Similarity Computation for Building a Japanese Paraphrase Corpus for Children

Tomoki Nishiyama (presenter) and Kazuaki Ando (Kagawa University)

Keywords: Japanese Corpus, Paraphrase, Sentence Similarity

In recent years, elementary schools have been practicing Newspaper in Education (NIE), which uses newspapers as teaching materials. However, elementary school students find newspaper articles difficult to understand because the articles are written for general readers. Automatically paraphrasing the article text into simpler text could alleviate this problem, but few Japanese paraphrase corpora or datasets for children currently exist. In this study, we focus on NHK NEWS WEB EASY (NNWE), a website for children that publishes simple, easy-to-understand news articles created by manually paraphrasing selected general news articles. The purpose of this study is to construct a Japanese paraphrase corpus by mapping each sentence of related news articles on NNWE and NHK News Web (NNW) as easy and difficult sentences, respectively. Experiments on a Japanese dataset confirmed the effectiveness of a method that computes sentence similarity based on alignment between word embeddings, and showed that its accuracy improved when the parts of speech used in the computation were specified.
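The abstract does not give the exact similarity formula. As an illustration only, the sketch below assumes one common alignment-based formulation (maximum alignment): each word is matched to its most cosine-similar word in the other sentence, and the two directional averages are averaged. It also assumes a simple corpus-building step in which each easy (NNWE) sentence is paired with its most similar difficult (NNW) sentence. The helper names, the `keep_pos` filter, the `threshold` of 0.5, and the toy 3-dimensional embeddings are hypothetical, not values from the paper; tokenization and POS tagging (e.g., by a Japanese morphological analyzer) are assumed to happen upstream.

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple

# A token is (surface form, part-of-speech tag); tokenization/tagging is assumed done upstream.
Token = Tuple[str, str]
Embeddings = Dict[str, List[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def content_vectors(tokens: List[Token], emb: Embeddings,
                    keep_pos: Optional[Set[str]]) -> List[List[float]]:
    """Keep tokens whose POS is in keep_pos (all POS if None); out-of-vocabulary words are dropped."""
    return [emb[w] for w, pos in tokens
            if (keep_pos is None or pos in keep_pos) and w in emb]


def alignment_similarity(x: List[Token], y: List[Token], emb: Embeddings,
                         keep_pos: Optional[Set[str]] = None) -> float:
    """Assumed max-alignment similarity: align each word to its best match in the
    other sentence, average per direction, then average the two directions."""
    xv = content_vectors(x, emb, keep_pos)
    yv = content_vectors(y, emb, keep_pos)
    if not xv or not yv:
        return 0.0
    x_to_y = sum(max(cosine(a, b) for b in yv) for a in xv) / len(xv)
    y_to_x = sum(max(cosine(a, b) for a in xv) for b in yv) / len(yv)
    return (x_to_y + y_to_x) / 2.0


def align_sentences(easy: List[List[Token]], hard: List[List[Token]], emb: Embeddings,
                    keep_pos: Optional[Set[str]] = None,
                    threshold: float = 0.5) -> List[Tuple[int, int, float]]:
    """Hypothetical pairing step: for each easy sentence, pick the most similar
    difficult sentence and keep the pair (easy_idx, hard_idx, score) if score >= threshold."""
    pairs = []
    for i, e in enumerate(easy):
        scores = [alignment_similarity(e, h, emb, keep_pos) for h in hard]
        if not scores:
            continue
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            pairs.append((i, j, scores[j]))
    return pairs


# Toy example with hypothetical unit-length embeddings.
emb: Embeddings = {
    "neko": [1.0, 0.0, 0.0],  # "cat"
    "inu": [0.0, 1.0, 0.0],   # "dog"
    "ga": [0.0, 0.0, 1.0],    # particle
    "naku": [0.6, 0.8, 0.0],  # "to cry"
}
s1 = [("neko", "NOUN"), ("ga", "PARTICLE"), ("naku", "VERB")]
s2 = [("inu", "NOUN"), ("ga", "PARTICLE"), ("naku", "VERB")]
s3 = [("ga", "PARTICLE")]

print(round(alignment_similarity(s1, s2, emb), 2))                              # 0.9
print(round(alignment_similarity(s1, s2, emb, keep_pos={"NOUN", "VERB"}), 2))   # 0.85
```

In this toy case, restricting the computation to nouns and verbs lowers the score because the shared particle `ga` no longer counts as a perfect match; this is one plausible way a POS restriction can change which sentence pairs are selected, not a claim about the paper's exact setup.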