JSAI2019

Presentation information

Interactive Session

[3Rin2] Interactive Session 1

Thu. Jun 6, 2019 10:30 AM - 12:10 PM Room R (Center area of 1F Exhibition hall)

10:30 AM - 12:10 PM

[3Rin2-16] Cluster analysis of Twitter Data, using Interactive Data visualization Tool

〇Shinichiro Wada1 (1. Graduate School of Sociology, Rikkyo University.)

Keywords: word2vec, Twitter Data Analysis, Embedding Projector, t-SNE Algorithm, Mecab-ipadic-NEologd

This study attempts a cluster analysis, implemented in Python, of Twitter data posted about the Tokyo Governor's Election held in 2016 (July 13 - August 1, 2016; 4.8 million tweets; 170 million words). For the cluster analysis, words were vectorized with the word2vec implementation in gensim, a Python library, and we attempted to visualize the clusters in three dimensions using t-SNE (t-distributed Stochastic Neighbor Embedding), a dimensionality reduction algorithm. In particular, we used the data visualization tool Embedding Projector for clustering. With this tool, we attempted to identify clusters visually by moving the three-dimensional space interactively while the dynamic learning process was visualized in that space. As a result, we were able to identify multiple clusters with high accuracy, which made it possible to clarify what Twitter users were interested in during this election.
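The hand-off from word vectors to the Embedding Projector can be sketched as follows. The tool loads a tab-separated file of vectors (one row per word) together with a metadata file of labels in the same row order. This is a minimal illustration, not the author's code: the function name `to_projector_tsv` and the three-dimensional toy vectors are hypothetical, and in a real run the vectors would come from a trained word2vec model.

```python
import csv
import io


def to_projector_tsv(word_vectors):
    """Build the two TSV payloads for a word -> vector mapping.

    Returns (vectors_tsv, metadata_tsv). Row i of the vectors file
    corresponds to line i of the metadata file.
    """
    vec_buf = io.StringIO()
    meta_buf = io.StringIO()
    writer = csv.writer(vec_buf, delimiter="\t", lineterminator="\n")
    for word, vector in word_vectors.items():
        # One vector per row, components separated by tabs.
        writer.writerow([f"{x:.6f}" for x in vector])
        # Single-column metadata: one label per line, no header row.
        meta_buf.write(word + "\n")
    return vec_buf.getvalue(), meta_buf.getvalue()


# Hypothetical toy vectors standing in for trained word2vec output.
toy_vectors = {
    "選挙": [0.12, -0.40, 0.33],
    "都知事": [0.10, -0.38, 0.29],
    "投票": [0.05, -0.51, 0.41],
}

vectors_tsv, metadata_tsv = to_projector_tsv(toy_vectors)
print(vectors_tsv)
print(metadata_tsv)
```

Writing the two strings to UTF-8 files and loading them into the tool would then allow t-SNE to be run and the resulting three-dimensional space to be explored interactively, as the abstract describes.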