[4Rin1-77] Analogy Task with Log Co-occurrence: Comparison to word2vec and PMI
Keywords: co-occurrence matrix, word embedding model, analogy
Word embedding models perform well on the analogy task, among other semantic tasks. The prevailing explanation for this performance is that the inner product of the word vectors these models produce approximates co-occurrence frequency weighted by pointwise mutual information (PMI), which suggests that the PMI matrix carries the information needed for the analogy task. However, this explanation is insufficient, because PMI itself is not related to analogy. To further investigate the role of the co-occurrence matrix in the analogy task, we conduct experiments using a log-weighted co-occurrence matrix, which represents the original co-occurrence counts more closely. The results show that log co-occurrence (logfreq) solves the analogy task comparably to the PMI matrix, and that logfreq with SVD applied outperforms the other methods. These results indicate that PMI is not necessary for analogy tasks and that the characteristics of the original co-occurrence matrix merit further investigation.
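As a rough illustration of the three representations compared above, the following minimal sketch builds a co-occurrence matrix from a toy corpus and derives logfreq, PMI, and SVD-reduced vectors from it. The abstract does not give the exact formulas, so several details here are assumptions: the +1 inside the logarithm, the symmetric context window, setting PMI to 0 for unseen pairs, and solving analogies with the common vector-offset (3CosAdd) rule. All function names are hypothetical.

```python
import numpy as np

def cooccurrence(sentences, window=2):
    """Count symmetric co-occurrences within a fixed window (assumed setup)."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for i, w in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[w], index[s[j]]] += 1
    return vocab, counts

def logfreq(counts):
    # log(1 + count); the +1 is an assumption that keeps zero counts finite.
    return np.log1p(counts)

def pmi(counts):
    # PMI(w, c) = log( p(w, c) / (p(w) p(c)) ); unseen pairs are set to 0 (assumption).
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        values = np.log(counts * total / (row * col))
    values[~np.isfinite(values)] = 0.0
    return values

def svd_embed(matrix, dim):
    # Keep the top `dim` singular directions, scaled by their singular values.
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :dim] * s[:dim]

def analogy(vectors, vocab, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector-offset (3CosAdd) rule."""
    index = {w: i for i, w in enumerate(vocab)}
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.maximum(norms, 1e-12)
    target = unit[index[b]] - unit[index[a]] + unit[index[c]]
    scores = unit @ target
    for w in (a, b, c):  # exclude the query words themselves
        scores[index[w]] = -np.inf
    return vocab[int(np.argmax(scores))]

# Toy corpus, for illustration only; far too small to give meaningful answers.
sentences = [["king", "rules", "man"], ["queen", "rules", "woman"]]
vocab, counts = cooccurrence(sentences)
log_vectors = logfreq(counts)
pmi_vectors = pmi(counts)
svd_vectors = svd_embed(log_vectors, dim=2)
guess = analogy(log_vectors, vocab, "man", "king", "woman")
```

Each of `log_vectors`, `pmi_vectors`, and `svd_vectors` can be passed to `analogy` in the same way, which is the kind of side-by-side comparison the abstract describes. A real evaluation would need a large corpus and a standard analogy benchmark.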