JSAI2019

Presentation information

General Session

[2L4-J-9] Natural language processing, information retrieval: conversion and generation

Wed. Jun 5, 2019 3:20 PM - 4:40 PM Room L (203+204 Small meeting rooms)

Chair: Ichiro Kobayashi  Reviewer: Yuzuru Okajima

3:40 PM - 4:00 PM

[2L4-J-9-02] Text simplification using newspaper articles

Naoki Koto1, 〇Hidetsugu Nanba1, Toshiyuki Takezawa1 (1. Hiroshima City University)

Keywords: text simplification, parallel corpus, alignment

Automatic text simplification transforms complex sentences into simpler variants without significantly changing the original meaning. Several studies on automatic text simplification have been conducted using large-scale monolingual parallel corpora. However, manually constructing a parallel corpus for text simplification is costly. We therefore investigate the automatic construction of a large-scale simplified corpus for Japanese from newspaper database corpora. In this paper, we examine several methods for sentence alignment of texts with different complexity levels. Using the best of them, we sentence-align the Mainichi newspaper and the Mainichi newspaper for elementary students, thus providing a large amount of training material for automatic text simplification systems.
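
The abstract does not say which alignment methods were compared, so the following is only an illustrative sketch of what sentence alignment between a complex and a simplified article can look like, not the authors' method. It assumes a simple baseline: each complex sentence is paired with the simplified sentence that has the highest character-bigram Jaccard similarity, and pairs below a threshold are dropped. The function names (`char_ngrams`, `jaccard`, `align_sentences`) and the threshold value are hypothetical choices made for this example. Character n-grams are used because they need no word segmentation, which is convenient for Japanese text.

```python
from typing import List, Set, Tuple


def char_ngrams(text: str, n: int = 2) -> Set[str]:
    """Return the set of character n-grams in text, ignoring whitespace."""
    compact = "".join(text.split())
    if len(compact) < n:
        # Very short strings are kept as a single unit (or nothing if empty).
        return {compact} if compact else set()
    return {compact[i:i + n] for i in range(len(compact) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def align_sentences(
    complex_sents: List[str],
    simple_sents: List[str],
    threshold: float = 0.3,  # assumed cut-off, not taken from the paper
    n: int = 2,
) -> List[Tuple[int, int, float]]:
    """Pair each complex sentence with its most similar simple sentence.

    Returns (complex_index, simple_index, score) triples for pairs whose
    similarity reaches the threshold. Complex sentences without a
    sufficiently similar counterpart are left unaligned.
    """
    simple_grams = [char_ngrams(s, n) for s in simple_sents]
    pairs = []
    for i, sent in enumerate(complex_sents):
        grams = char_ngrams(sent, n)
        best_j, best_score = -1, 0.0
        for j, candidate in enumerate(simple_grams):
            score = jaccard(grams, candidate)
            if score > best_score:
                best_j, best_score = j, score
        if best_j >= 0 and best_score >= threshold:
            pairs.append((i, best_j, best_score))
    return pairs
```

Calling `align_sentences(complex_sents, simple_sents)` on the sentences of two articles about the same event would yield candidate complex-simple pairs. A realistic system would likely replace the Jaccard score with a stronger similarity measure and allow one-to-many alignments, since a simplified article often splits one long sentence into several.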