JSAI2018

Presentation information

Oral presentation

[1J2] [General Session] 9. NLP / IR

Tue. Jun 5, 2018 3:20 PM - 5:00 PM Room J (2F Royal Garden B)

Chair: Yasutomo Kimura (Otaru University of Commerce)

3:20 PM - 3:40 PM

[1J2-01] A Linear-Time Kernelized Co-occurrence Norm for Sparse Linguistic Expressions

〇Sho Yokoi [1,2], Sosuke Kobayashi [3], Kenji Fukumizu [4], Kentaro Inui [1,2] (1. Tohoku Univ., 2. RIKEN AIP, 3. Preferred Networks, 4. ISM)

Keywords: NLP, Machine Learning

Estimating pointwise mutual information (PMI), a well-known co-occurrence measure between linguistic expressions, involves a trade-off between learning time and robustness to data sparsity. We propose a new kernel-based co-occurrence measure, named pointwise HSIC (PHSIC). Intuitively, PHSIC is a "smoothed PMI" obtained with kernels, so it is robust to data sparsity; furthermore, its estimator reduces to an efficient linear-time matrix calculation. In our experiments, we apply PHSIC to a dialogue response selection task using sparse language data. The results show that learning is about 100 times faster than with a recurrent neural network-based PMI estimator; moreover, when the data size is small, its predictive performance hardly deteriorates compared with that of PMI.
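
The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea of a kernel-smoothed co-occurrence score, under stated assumptions: each expression is already mapped to an explicit feature vector (that is, a linear kernel on fixed embeddings), and the score of a pair is its centered features evaluated against the empirical cross-covariance matrix of the training pairs. The function names (`fit_cross_covariance`, `cooccurrence_score`) and the toy data are hypothetical and are not taken from the presentation; the authors' actual estimator may differ.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def fit_cross_covariance(xs, ys):
    """Fit on n observed pairs (xs[i], ys[i]) of feature vectors.

    Returns (mean_x, mean_y, C) with
    C[a][b] = (1/n) * sum_i (xs[i][a] - mean_x[a]) * (ys[i][b] - mean_y[b]).
    The pairs are visited once, so the cost is linear in n.
    """
    n = len(xs)
    mx, my = mean_vector(xs), mean_vector(ys)
    C = [[0.0] * len(my) for _ in mx]
    for x, y in zip(xs, ys):
        u = [xa - ma for xa, ma in zip(x, mx)]  # centered x features
        v = [yb - mb for yb, mb in zip(y, my)]  # centered y features
        for a, ua in enumerate(u):
            for b, vb in enumerate(v):
                C[a][b] += ua * vb / n
    return mx, my, C


def cooccurrence_score(x, y, mx, my, C):
    """Score a (possibly unseen) pair: centered x^T * C * centered y."""
    u = [xa - ma for xa, ma in zip(x, mx)]
    v = [yb - mb for yb, mb in zip(y, my)]
    return sum(u[a] * C[a][b] * v[b]
               for a in range(len(u)) for b in range(len(v)))


# Toy data (assumed for illustration): 2-dimensional features in which
# x-type [1, 0] always co-occurs with y-type [1, 0], and [0, 1] with [0, 1].
xs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
ys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
mx, my, C = fit_cross_covariance(xs, ys)

print(cooccurrence_score([1.0, 0.0], [1.0, 0.0], mx, my, C))  # observed pairing
print(cooccurrence_score([1.0, 0.0], [0.0, 1.0], mx, my, C))  # never paired
print(cooccurrence_score([0.9, 0.1], [1.0, 0.0], mx, my, C))  # unseen x, near an observed one
```

In this sketch the observed pairing scores positive and the never-observed pairing scores negative. The third pair never appears in the training data, yet it still receives a positive score because its features are close to an observed pair; a count-based PMI estimate would have no information about it. This is the sense in which a kernel-based measure can act as a smoothed co-occurrence score.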