10:30 AM - 12:10 PM
[3Rin2-16] Cluster analysis of Twitter Data, using Interactive Data visualization Tool
Keywords: word2vec, Twitter Data Analysis, Embedding Projector, t-SNE Algorithm, Mecab-ipadic-NEologd
This study attempts a cluster analysis, using Python, of Twitter data about the Tokyo gubernatorial election held in 2016 (July 13 - August 1, 2016; 4.8 million tweets, 170 million words). For the cluster analysis, words were vectorized with the word2vec implementation in gensim, a Python library, and we attempted to visualize clusters in three dimensions using t-SNE (t-distributed Stochastic Neighbor Embedding), a dimensionality reduction algorithm. In particular, we used the data visualization tool Embedding Projector for clustering. With this tool, we attempted to identify clusters visually by navigating the three-dimensional space interactively while the dynamic learning process was displayed in that space. As a result, we were able to identify multiple clusters with high accuracy. This made it possible to clarify what Twitter users were interested in during this election.
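As an illustration of the hand-off between vectorization and visualization, the following minimal sketch writes word vectors to the two tab-separated files that Embedding Projector can load (one file of vectors, one of labels). It is not the authors' code: the function name `write_projector_files`, the file names, and the toy three-dimensional vectors are all assumptions made for the example. With a trained gensim 4.x model, the inputs could plausibly be taken from `model.wv.index_to_key` and `model.wv.vectors`.

```python
from pathlib import Path


def write_projector_files(words, vectors, out_dir):
    """Write vectors.tsv and metadata.tsv for loading into Embedding Projector.

    Hypothetical helper: one tab-separated vector per line in vectors.tsv,
    and the matching word on the same line number in metadata.tsv.
    """
    if len(words) != len(vectors):
        raise ValueError("words and vectors must have the same length")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    vec_path = out_dir / "vectors.tsv"
    meta_path = out_dir / "metadata.tsv"

    with vec_path.open("w", encoding="utf-8", newline="") as vf, \
            meta_path.open("w", encoding="utf-8", newline="") as mf:
        for word, vector in zip(words, vectors):
            # One embedding per row, components separated by tabs.
            vf.write("\t".join(f"{x:.6f}" for x in vector) + "\n")
            # Single-column metadata: one label per row, no header line.
            mf.write(word + "\n")

    return vec_path, meta_path


if __name__ == "__main__":
    # Toy stand-ins for word2vec output (assumed values, not real results).
    toy_words = ["election", "governor", "vote"]
    toy_vectors = [[0.1, 0.2, 0.3], [0.0, -0.5, 0.25], [1.0, 0.0, -1.0]]
    paths = write_projector_files(toy_words, toy_vectors, "projector_out")
    print("wrote:", *paths)
```

Loading the two files into Embedding Projector and selecting t-SNE there would then give an interactive three-dimensional view of the kind described above; the real vectors would have many more dimensions than this toy data.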