JSAI2025

Presentation information

General Session


[4Q1-GS-10] AI application

Fri. May 30, 2025 9:00 AM - 10:40 AM Room Q (Room 804)

Chair: Ryoko Tokuhisa (徳久 良子; Aichi Institute of Technology / RIKEN)

9:20 AM - 9:40 AM

[4Q1-GS-10-02] A Method for Brand Name Variant Normalization by Integrating Character-level Embeddings and Edit Distance

〇Masataka Suzuki1, Yuto Okuda2, Ayako Yamagiwa1, Masayuki Goto1 (1. Waseda University, 2. The University of Tokyo)

Keywords: Inconsistent spelling, name collection, Character-level Embeddings, Levenshtein distance, edit distance

Purchase history data collected through consumer input is a valuable source for analyzing purchasing behavior across various retail stores. However, because customers enter the data themselves, text fields such as product names often contain notational variations, such as abbreviations and inconsistent use of long vowel marks, which introduce noise into the analysis.

Existing approaches to name matching rely on edit distance or on embeddings. However, conventional edit distance does not account for characteristics of the Japanese language, and traditional embeddings are difficult to apply to short brand names. Large language models are another option, but they may be impractical because of confidentiality and cost concerns.
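
As a point of reference, the conventional unit-cost Levenshtein distance mentioned above can be sketched as follows. This is the textbook algorithm, not code from the paper, and the katakana strings in the note after the block are illustrative examples chosen here, not data from the study.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein distance: every insertion, deletion,
    and substitution costs exactly 1."""
    # prev[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if ca == cb else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]
```

With unit costs, `levenshtein("コーヒー", "コーヒ")` and `levenshtein("コーヒ", "コーラ")` both evaluate to 1, so a dropped long vowel mark is penalized exactly as much as a substitution that changes the word.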

This research proposes char2vec to obtain character-level embeddings and defines a new edit distance that uses these embeddings, enabling name matching even for short text data. We demonstrate the effectiveness of the method by applying it to real data and show that the matched data open up further analytical possibilities.
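
The abstract does not give the definition of the proposed distance, so the following is only a minimal sketch of one way an edit distance could use character embeddings. It assumes that the substitution cost is the cosine distance between the two characters' vectors, clamped to [0, 1], while insertions and deletions keep a cost of 1. The names `soft_edit_distance`, `cosine_distance`, and `TOY_EMB`, as well as the hand-made two-dimensional vectors, are hypothetical and are not taken from the paper; real vectors would come from a trained character-level embedding model.

```python
import math
from typing import Dict, Sequence

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """Cosine distance clamped to [0, 1]; returns 1.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return min(1.0, max(0.0, 1.0 - dot / (norm_u * norm_v)))


def soft_edit_distance(a: str, b: str, emb: Dict[str, Vector]) -> float:
    """Edit distance whose substitution cost depends on embedding similarity.

    Assumption (not from the paper): substitution cost = clamped cosine
    distance; characters missing from `emb` fall back to a cost of 1.
    """
    def sub_cost(ca: str, cb: str) -> float:
        if ca == cb:
            return 0.0
        if ca in emb and cb in emb:
            return cosine_distance(emb[ca], emb[cb])
        return 1.0

    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [float(i)]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1.0,                  # delete ca
                curr[j - 1] + 1.0,              # insert cb
                prev[j - 1] + sub_cost(ca, cb)  # substitute ca with cb
            ))
        prev = curr
    return prev[-1]


# Hypothetical toy embeddings: full-size and small "a" are placed close
# together, while "n" points in a different direction.
TOY_EMB: Dict[str, Vector] = {
    "ア": (1.0, 0.1),
    "ァ": (0.9, 0.2),
    "ン": (0.0, 1.0),
}
```

Under these toy vectors, `soft_edit_distance("フアン", "ファン", TOY_EMB)` is close to 0 because ア and ァ are nearly parallel, whereas replacing ア with ン costs close to 1. With an empty embedding table the function reduces to the ordinary Levenshtein distance.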
