JSAI2020

Presentation information

General Session

General Session » J-9 Natural language processing, information retrieval

[3Q5-GS-9] Natural language processing, information retrieval: Semantic similarity

Thu. Jun 11, 2020 3:40 PM - 5:00 PM Room Q (jsai2020online-17)

Chair: 秋元康佑 (NEC)

4:20 PM - 4:40 PM

[3Q5-GS-9-03] Optimal Transport Cost between Texts via Norm-Direction Decomposition

〇Sho Yokoi1,2, Ryo Takahashi1,2, Reina Akama1,2, Jun Suzuki1,2, Kentaro Inui1,2 (1. Tohoku Univ., 2. RIKEN)

Keywords: Natural Language Processing, Optimal Transport

One key principle for assessing semantic textual similarity is to measure the degree of semantic overlap between two texts by considering word-by-word alignment; however, such methods are empirically inferior to methods based on generic sentence encoders. We hypothesize that alignment-based methods are inferior because they do not distinguish word importance from word meaning. To solve this, we propose separating word importance and word meaning by decomposing word vectors into their norm and direction, and then computing word-by-word alignment-based similarity using optimal transport. We call the method word rotator's distance (WRD) because direction vectors are aligned by rotation on the unit hypersphere. In addition, to incorporate advances in cutting-edge additive sentence encoders, we propose re-decomposing such sentence vectors into word vectors and using them as inputs to WRD. Empirically, WRD outperforms current methods that consider word-by-word alignment, including word mover's distance, by a large margin; moreover, our method outperforms state-of-the-art additive sentence encoders on the most competitive dataset, STS-benchmark.
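The norm-direction decomposition described in the abstract can be illustrated with a small sketch. It rests on assumptions that the abstract does not spell out: each word's norm, normalized to sum to 1 within a text, is taken as its transport mass (importance), and the cosine distance between word directions is taken as the transport cost (meaning). The optimal transport problem is solved only approximately here, using Sinkhorn iterations with entropic regularization as a stand-in for an exact solver. The function names and the `reg` and `n_iter` parameters are hypothetical and chosen for illustration.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def cosine_distance(u, v):
    """1 - cos(u, v); depends only on the directions of u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (norm(u) * norm(v))


def sinkhorn_plan(a, b, cost, reg=0.05, n_iter=500):
    """Approximate transport plan via Sinkhorn iterations.

    Assumption: entropic regularization replaces an exact OT solver.
    """
    n, m = len(a), len(b)
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def word_rotators_distance(vecs1, vecs2, reg=0.05, n_iter=500):
    """Sketch of a WRD-style distance between two lists of word vectors."""
    # Assumed reading: norms act as word importance (transport mass).
    norms1 = [norm(w) for w in vecs1]
    norms2 = [norm(w) for w in vecs2]
    a = [x / sum(norms1) for x in norms1]
    b = [x / sum(norms2) for x in norms2]
    # Assumed reading: directions act as word meaning (transport cost).
    cost = [[cosine_distance(w1, w2) for w2 in vecs2] for w1 in vecs1]
    plan = sinkhorn_plan(a, b, cost, reg, n_iter)
    return sum(plan[i][j] * cost[i][j]
               for i in range(len(a)) for j in range(len(b)))
```

In this sketch, scaling a word vector changes how much mass the word carries but not its cost to other words, which is the separation of importance and meaning that the abstract describes. The toy vectors used to exercise it would be made up; real inputs would come from pretrained word embeddings.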
