Keywords: Information Retrieval, Named Entity Extraction, Named Entity Recognition
We propose the task of identifying the product name in an e-commerce (EC) page title. On EC sites, sellers design their listings to increase the visibility of their products in search results, and a common technique is to add extra information to the product page title. However, adding many keywords can make a title so complicated that buyers have difficulty distinguishing the product name within it. Extracting product names is therefore important, but it poses challenges, especially when titles are in Japanese: (1) most titles do not follow standard grammatical structure, and (2) diverse character types, such as kanji, kana, alphanumerics, and symbols, often appear in a single title. These properties make it difficult for models to identify word boundaries and lead to incorrect learning. In this work, we create a corpus and evaluate several conventional approaches as a basic analysis. The results show that the task remains challenging: an existing named entity recognition approach that performs very well on some open datasets achieves an F1 score of only 23.0 on our dataset.