Author Identification of Japanese works using Doc2Vec and BERT

Taishi Shimizu

[3Rin4-78] Author Identification of Japanese works using Doc2Vec and BERT

〇Taishi Shimizu¹ (1.Graduate School of Arts and Sciences, The University of Tokyo)

Keywords: Classification, Natural language understanding

Author identification from text has been studied for a long time. For Japanese texts, researchers have relied on features such as the distribution of part-of-speech n-grams and the distribution of characters, and have used classification models such as random forests and neural networks. In this paper, I focus on Doc2Vec, proposed in 2014, and BERT, proposed in 2018, and perform supervised learning using these models together with neural networks. I downloaded the works used as training and test data from "Aozora Bunko", converted them into numerical vectors using Doc2Vec, and used those vectors as the input to the neural network. I trained a multinomial (multi-class) classifier and obtained an accuracy of 84.89% for Doc2Vec and 55.43% for BERT.
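
The classification step described above (fixed-length document vectors fed to a multinomial classifier) can be illustrated with a minimal sketch. The abstract does not state the network architecture, so the code below assumes the simplest possible case, a single-layer softmax classifier trained by gradient descent. The vectors are synthetic placeholders standing in for Doc2Vec output, not data from the paper, and all function names, dimensions, and hyperparameters are hypothetical.

```python
import math
import random


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, biases, vec):
    """Class probabilities for one document vector."""
    scores = [sum(w * x for w, x in zip(row, vec)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)


def train_softmax(vectors, labels, n_classes, epochs=200, lr=0.5, seed=0):
    """Fit a single-layer softmax classifier with per-sample gradient descent."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.uniform(-0.01, 0.01) for _ in range(dim)]
               for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        for vec, label in zip(vectors, labels):
            probs = predict_proba(weights, biases, vec)
            for c in range(n_classes):
                # Gradient of cross-entropy w.r.t. score c: p_c - 1[c == label]
                grad = probs[c] - (1.0 if c == label else 0.0)
                for j in range(dim):
                    weights[c][j] -= lr * grad * vec[j]
                biases[c] -= lr * grad
    return weights, biases


def predict(weights, biases, vec):
    """Index of the most probable class (author) for one document vector."""
    probs = predict_proba(weights, biases, vec)
    return max(range(len(probs)), key=probs.__getitem__)


# Synthetic placeholder vectors (NOT data from the paper): three hypothetical
# authors, each clustered around a different point in a 4-dimensional space.
# In the setting described above, these would be Doc2Vec document vectors.
rng = random.Random(42)
centers = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
train_x, train_y = [], []
for author, center in enumerate(centers):
    for _ in range(20):
        train_x.append([c + rng.gauss(0.0, 0.1) for c in center])
        train_y.append(author)

weights, biases = train_softmax(train_x, train_y, n_classes=3)
correct = sum(predict(weights, biases, x) == y for x, y in zip(train_x, train_y))
accuracy = correct / len(train_x)
print(f"training accuracy on the synthetic vectors: {accuracy:.2%}")
```

The accuracy printed here only reflects how separable the synthetic clusters are and says nothing about the reported 84.89% and 55.43% figures. A real experiment would replace the placeholder vectors with document embeddings and evaluate on held-out works.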

Presentation information

[3Rin4] Interactive 1
