JSAI2018

Presentation information

Poster presentation

General Session » Interactive

[4Pin1] Interactive (2)

Fri. Jun 8, 2018 9:00 AM - 10:40 AM Room P (4F Emerald Lobby)

9:00 AM - 10:40 AM

[4Pin1-48] Construction of dataset for summarization based on the Twitter URL Paraphrase Corpus

〇Koichi Nagatsuka1 (1. Soka University)

Keywords: Summarization, Paraphrase, Dataset, Twitter, Natural Language Processing

The purpose of text summarization is to produce a condensed version of an input text that preserves the core meaning of the original. Most summarization systems are built on datasets of news articles. Although social networking services (SNS) such as Twitter are becoming an increasingly important information resource, the lack of datasets for SNS is one of the challenges in extending the range of summarization systems. In this paper, we address this problem by transforming the Twitter URL Paraphrase Corpus into a summarization dataset. To extract the paraphrases that are important for a summary and to increase the number of high-quality paraphrases, we built a paraphrase classifier and a paraphrase generator using supervised learning on the corpus. In the experiments, we evaluate the paraphrase classifier quantitatively and the paraphrase generator qualitatively.
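
The abstract does not spell out how paraphrase pairs are turned into summarization examples, so the following is only a minimal sketch of one possible reading, not the author's method. It assumes each corpus entry is a pair of sentences with a binary paraphrase label, and it treats the longer sentence of a paraphrase pair as the source and the shorter one as the summary. The names `PairRecord`, `to_summarization_pairs`, and the `max_ratio` threshold are hypothetical and chosen for illustration.

```python
from typing import List, NamedTuple, Tuple


class PairRecord(NamedTuple):
    """Hypothetical record layout: two sentences and a paraphrase label."""
    text_a: str
    text_b: str
    is_paraphrase: bool


def to_summarization_pairs(
    records: List[PairRecord], max_ratio: float = 0.8
) -> List[Tuple[str, str]]:
    """Return (source, summary) pairs built from labeled paraphrase pairs.

    Assumption: a pair is usable as a summarization example when it is
    labeled as a paraphrase and the shorter sentence has at most
    `max_ratio` times as many whitespace-separated tokens as the longer one.
    """
    pairs = []
    for rec in records:
        if not rec.is_paraphrase:
            continue  # non-paraphrases do not share a core meaning
        len_a = len(rec.text_a.split())
        len_b = len(rec.text_b.split())
        longer_len, shorter_len = max(len_a, len_b), min(len_a, len_b)
        if longer_len == 0:
            continue  # skip empty pairs
        if shorter_len / longer_len > max_ratio:
            continue  # too similar in length to count as a condensation
        if len_a > len_b:
            pairs.append((rec.text_a, rec.text_b))
        else:
            pairs.append((rec.text_b, rec.text_a))
    return pairs
```

In this sketch the paraphrase label plays the role of the classifier's output described above; a trained classifier could supply `is_paraphrase` for unlabeled pairs, and the length-ratio rule is only a stand-in for whatever importance criterion the paper actually uses.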