3:40 PM - 4:00 PM
[2L4-J-9-02] Text simplification using newspaper articles
Keywords: text simplification, parallel corpus, alignment
Automatic text simplification aims to transform complex sentences into simpler variants without significantly changing their original meaning. Several studies on automatic text simplification have relied on large-scale monolingual parallel corpora. However, constructing such a parallel corpus manually is costly. We therefore investigate the automatic construction of a large-scale Japanese text simplification corpus from newspaper databases. In this paper, we examine several methods for aligning sentences between texts of different complexity levels. Using the best-performing method, we align sentences in the Mainichi newspaper with those in the Mainichi newspaper for elementary students, thereby providing a large amount of training data for automatic text simplification systems.
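The abstract does not say which alignment methods were compared. The following is a purely illustrative sketch of one common similarity-based approach, not necessarily one examined in the paper. It assumes that each sentence of a simplified article is paired with the most similar sentence of the corresponding original article, scored by cosine similarity over character bigrams (character n-grams avoid the need for Japanese word segmentation). The function names and the 0.3 threshold are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(sentence: str, n: int = 2) -> Counter:
    """Count character n-grams (hypothetical feature choice; needs no word segmenter)."""
    if len(sentence) < n:
        return Counter([sentence]) if sentence else Counter()
    return Counter(sentence[i:i + n] for i in range(len(sentence) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def align_sentences(complex_sents, simple_sents, threshold=0.3):
    """Hypothetical aligner: pair each simple sentence with its most similar
    complex sentence, keeping only pairs whose score reaches the threshold."""
    complex_vecs = [char_ngrams(s) for s in complex_sents]
    pairs = []
    for simple in simple_sents:
        vec = char_ngrams(simple)
        best_score, best_idx = 0.0, None
        for i, cvec in enumerate(complex_vecs):
            score = cosine(vec, cvec)
            if score > best_score:
                best_score, best_idx = score, i
        if best_idx is not None and best_score >= threshold:
            pairs.append((complex_sents[best_idx], simple, best_score))
    return pairs
```

Because each simplified sentence is scored independently, several simple sentences may align to the same complex sentence, which accommodates simplifications that split one long sentence into shorter ones; a stricter one-to-one constraint would need an additional assignment step.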