Keywords: minutes of local assembly, word embeddings, topic analysis, pretrained word embeddings
In this study, we use text mining methods to analyze which topics are discussed in local assemblies. Although several studies have analyzed discussions using topic models, they have not evaluated topics over words obtained with different word segmentation dictionaries, nor have they examined the effectiveness of word embeddings trained on large-scale data. In this paper, we extract topics from the NTCIR14 Segmentation task data set using the Embedded Topic Model (ETM) with word embeddings trained on either the Regional Assembly Minutes Corpus or Wikipedia articles, and we compare the topics obtained by the two resulting topic models. Experimental results show that using the Comainu dictionary makes the meaning of each topic easier to understand and makes it easier to assign labels to topics. However, we could not confirm a clear difference between the topic model based on the Regional Assembly Minutes Corpus and the one based on Wikipedia articles.