A Linear-Time Kernelized Co-occurrence Norm for Sparse Linguistic Expressions

Sho Yokoi

3:20 PM - 3:40 PM

[1J2-01] A Linear-Time Kernelized Co-occurrence Norm for Sparse Linguistic Expressions

〇Sho Yokoi^1,2, Sosuke Kobayashi^3, Kenji Fukumizu^4, Kentaro Inui^1,2 (1. Tohoku Univ., 2. RIKEN AIP, 3. Preferred Networks, 4. ISM)

Keywords: NLP, Machine Learning

Estimating pointwise mutual information (PMI), a well-known co-occurrence measure between linguistic expressions, involves a trade-off between learning time and robustness to data sparsity. We propose a new kernel-based co-occurrence measure, named pointwise HSIC (PHSIC). Intuitively, PHSIC is a "smoothed PMI" obtained with kernels, so it is robust to data sparsity; furthermore, its estimator reduces to an efficient linear-time matrix calculation. In our experiments, we apply PHSIC to a dialogue response selection task using sparse language data. Experimental results show that learning is about 100 times faster than with a recurrent neural network-based PMI estimator; moreover, when the size of the data is small, its predictive performance hardly deteriorates compared with PMI.
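
As a rough illustration of the contrast drawn in the abstract, the sketch below compares a count-based PMI estimate with a kernel-smoothed co-occurrence score. It is a minimal sketch under stated assumptions, not the authors' estimator: the function names (`pmi`, `smoothed_score`), the toy pairs, and the two-dimensional feature vectors are all hypothetical, and a linear kernel over explicit feature vectors is assumed. The score is the centered cross-covariance form (1/n) Σ_i ⟨f(x) − mean, f(x_i) − mean⟩ ⟨g(y) − mean, g(y_i) − mean⟩, which takes one pass over the n training pairs.

```python
import math
from collections import Counter


def pmi(x, y, pairs):
    """Count-based PMI estimate: log( p(x, y) / (p(x) p(y)) )."""
    n = len(pairs)
    joint = Counter(pairs)[(x, y)]
    count_x = sum(1 for a, _ in pairs if a == x)
    count_y = sum(1 for _, b in pairs if b == y)
    if joint == 0:
        # An unseen pair gets no usable estimate: the sparsity problem.
        return float("-inf")
    return math.log(joint * n / (count_x * count_y))


def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sub(u, v):
    return [a - b for a, b in zip(u, v)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def smoothed_score(fx, fy, feats_x, feats_y):
    """Kernel-smoothed co-occurrence score (assumed linear-kernel form).

    feats_x[i] and feats_y[i] are feature vectors of the i-th observed pair.
    One pass over the n pairs, so the cost is linear in n.
    """
    mx, my = mean(feats_x), mean(feats_y)
    cx, cy = sub(fx, mx), sub(fy, my)
    total = sum(
        dot(cx, sub(xi, mx)) * dot(cy, sub(yi, my))
        for xi, yi in zip(feats_x, feats_y)
    )
    return total / len(feats_x)


# Hypothetical toy data: four observed (utterance, response) pairs.
pairs = [("hi", "hello"), ("hi", "hey"),
         ("thanks", "welcome"), ("thanks", "welcome")]
# Hypothetical 2-d feature vectors for the same four pairs.
feats_x = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
feats_y = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.0, 1.0]]

# An unseen utterance whose features resemble "hi".
query_x = [0.9, 0.1]
score_match = smoothed_score(query_x, [1.0, 0.0], feats_x, feats_y)
score_mismatch = smoothed_score(query_x, [0.0, 1.0], feats_x, feats_y)
```

In this toy setting the count-based estimate is undefined for any pair that never co-occurred, whereas the smoothed score still ranks the greeting-like response above the mismatched one, because similarity in feature space stands in for exact matching.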

Presentation information

[1J2] [General Session] 9. NLP / IR
