JSAI2019

Presentation information

Interactive Session

[4Rin1] Interactive Session 2

Fri. Jun 7, 2019 9:00 AM - 10:40 AM Room R (Center area of 1F Exhibition hall)


[4Rin1-13] Constructing a word embedding model from a large-scale Japanese SNS + Web corpus

〇Shogo Matsuno1, Sakae Mizuki1, Takeshi Sakaki1 (1. Hottolink, Inc.)

Keywords: word embedding, language resource, corpus, SNS

In this paper, we present a word embedding model trained on Japanese text posted to SNS, including Twitter. The model is built from a large-scale Japanese corpus that combines several media categories: SNS data, Wikipedia, and Web pages. We evaluated it on a word similarity task, using Spearman's rank correlation coefficient as the evaluation metric. It scored about 7 points higher than a model trained on Wikipedia alone. We plan to release the model through our website, and we hope it will help make natural language processing research on SNS data more active.
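The word similarity evaluation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: it assumes a gold dataset of (word1, word2, human score) triples, scores each pair by cosine similarity of the word vectors, skips pairs with out-of-vocabulary words, and reports Spearman's rank correlation between model and human scores. The helper names (`cosine`, `rank`, `spearman`, `evaluate`) and the tiny vectors and scores in `EMB` and `DATA` are hypothetical toy values, not data from this work.

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank(values):
    # 1-based ranks; tied values receive the average of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    # Spearman's rho = Pearson correlation computed on the ranks.
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def evaluate(embeddings, dataset):
    # Assumption: pairs with a word missing from the vocabulary are skipped.
    model_scores, human_scores = [], []
    for w1, w2, gold in dataset:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            human_scores.append(gold)
    return spearman(model_scores, human_scores)

# Hypothetical toy embeddings and human similarity ratings.
EMB = {
    "犬": [1.0, 0.9, 0.0],
    "猫": [0.9, 1.0, 0.1],
    "車": [0.0, 0.1, 1.0],
    "電車": [0.1, 0.0, 0.9],
}
DATA = [("犬", "猫", 9.0), ("車", "電車", 8.0), ("犬", "車", 1.5), ("猫", "電車", 1.0)]

if __name__ == "__main__":
    print(round(evaluate(EMB, DATA), 3))  # 0.8
```

Because Spearman's coefficient depends only on rank order, a model is rewarded for ordering word pairs the same way human annotators do, regardless of the absolute scale of its cosine scores. Comparing two models (for example, one trained on SNS + Web text and one trained on Wikipedia alone) then amounts to running `evaluate` with each model's vectors on the same dataset.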