
Presentation information

General Session

[2E6-GS-6] Language media processing

Wed. Jun 7, 2023 5:30 PM - 7:10 PM Room E (A2)

Chair: Hideki Nakayama (The University of Tokyo) [On-site]

6:50 PM - 7:10 PM

[2E6-GS-6-05] Toward the construction of linguistically-valid CCG treebank

〇Asa Tomita¹, Hitomi Yanaka², Daisuke Bekki¹ (1. Ochanomizu University, 2. The University of Tokyo)

Keywords: Treebank, CCG, Syntactic parsing, Theoretical linguistics

CCG parsers are commonly trained and evaluated on CCG treebanks, so these treebanks need to be linguistically valid. However, the current Japanese CCG treebank, CCGbank, is known to analyze some Japanese syntactic structures incorrectly, including passive and causative constructions. The ABCTreebank, a treebank for ABC grammar, has made many improvements, for example in its argument structures, but it does not describe the detailed syntactic features of Japanese CCG. Meanwhile, the output of the Japanese CCG parser lightblue provides syntactic structures with detailed syntactic features but has difficulty capturing argument structures correctly. In this study, we propose a method for generating a Japanese treebank with more linguistically valid and detailed information by combining the advantages of the ABCTreebank and lightblue. We develop an algorithm that filters lightblue's lexical items using the ABCTreebank, and we construct a linguistically valid CCG treebank by transforming lightblue's output.

