[3Rin4-78] Author Identification of Japanese works using Doc2Vec and BERT
Keywords: Classification, Natural language understanding
Author identification from text has long been an active research topic. For Japanese texts, researchers have used a variety of features, such as the distributions of part-of-speech n-grams and of characters, together with a variety of classification models, such as random forests and neural networks. In this paper, I focus on Doc2Vec, proposed in 2014, and BERT, proposed in 2018, and perform supervised learning using these models together with neural networks. I downloaded the works used as training and test data from "Aozora Bunko", converted them into numerical vectors using Doc2Vec, and used those vectors as the input to a neural network. I trained multiclass classifiers and obtained an accuracy of 84.89% with Doc2Vec and 55.43% with BERT.
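As a rough illustration of the classification step described in the abstract, the sketch below trains a single softmax layer on fixed document vectors and reports accuracy. It is a minimal sketch under stated assumptions, not the network used in the paper: the abstract does not specify the architecture, so one linear softmax layer stands in for the neural network, and the toy 2-dimensional vectors and the names `doc_vectors` and `author_ids` are hypothetical stand-ins for Doc2Vec embeddings and author labels.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, vec):
    """Author probabilities for one document vector (linear layer + softmax)."""
    scores = [sum(w * x for w, x in zip(row, vec)) for row in weights]
    return softmax(scores)


def train(vectors, labels, n_classes, epochs=100, lr=0.5):
    """Fit the softmax layer by batch gradient descent on cross-entropy loss."""
    dim = len(vectors[0])
    weights = [[0.0] * dim for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * dim for _ in range(n_classes)]
        for vec, label in zip(vectors, labels):
            probs = predict_proba(weights, vec)
            for c in range(n_classes):
                # Gradient of cross-entropy w.r.t. the score of class c.
                err = probs[c] - (1.0 if c == label else 0.0)
                for j in range(dim):
                    grad[c][j] += err * vec[j]
        for c in range(n_classes):
            for j in range(dim):
                weights[c][j] -= lr * grad[c][j] / len(vectors)
    return weights


def accuracy(weights, vectors, labels):
    """Fraction of documents whose most probable author is the true one."""
    correct = 0
    for vec, label in zip(vectors, labels):
        probs = predict_proba(weights, vec)
        if probs.index(max(probs)) == label:
            correct += 1
    return correct / len(vectors)


# Hypothetical toy data: in practice each vector would be a Doc2Vec
# embedding of one work, and each label the index of its author.
doc_vectors = [
    [1.0, 0.0], [0.9, 0.1],      # author 0
    [0.0, 1.0], [0.1, 0.9],      # author 1
    [-1.0, -1.0], [-0.9, -0.8],  # author 2
]
author_ids = [0, 0, 1, 1, 2, 2]

weights = train(doc_vectors, author_ids, n_classes=3)
train_accuracy = accuracy(weights, doc_vectors, author_ids)
print(f"training accuracy: {train_accuracy:.2%}")
```

In a real experiment the accuracy would be measured on held-out test works rather than on the training data, as the abstract's split into training and test data implies.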