JSAI2025

Presentation information

General Session

[4G1-GS-6] Language media processing

Fri. May 30, 2025 9:00 AM - 10:40 AM Room G (Room 1002)

Chair: Sho Takase (SB Intuitions)

9:20 AM - 9:40 AM

[4G1-GS-6-02] An Analysis Using Language Embeddings for Semantic and Phonological Similar Pair Extraction in Japanese and Korean Onomatopoeia

〇Shunnosuke Motomura1, Yuki Kubo1, Yuji Nozaki1,2, Maki Sakamoto1,2 (1. Kansei AI Co.,Ltd, 2. The University of Electro-Communications)

Keywords: onomatopoeia, sound symbolism, word embedding, Korean language

Japanese and Korean both have a large number of onomatopoeia. Onomatopoeia exhibits a strong connection between meaning and sound, called sound symbolism. However, the extent to which Japanese and Korean share this sound symbolism is not yet well understood. We therefore propose a method that quantitatively and automatically calculates two types of similarity, semantic and phonological, and we evaluated it on onomatopoeia from both languages. For semantic similarity, we used fastText and Transformer-based models. For phonological similarity, we proposed a method involving both romanization and IPA (International Phonetic Alphabet) transcription, and examined how well these methods aligned with human subjective evaluations. Using these two similarity measures, we explored the feasibility of extracting pairs of onomatopoeia from the two languages that are similar in both meaning and sound, a step toward data-driven, quantitative methods in contrastive linguistic analysis.
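
The two similarity measures described in the abstract can be illustrated with a minimal sketch. The abstract does not state how either score is computed, so everything here is an assumption: cosine similarity stands in for the semantic score, a normalized Levenshtein distance over romanized (or IPA) strings stands in for the phonological score, and the function names and thresholds are hypothetical. The sketch also assumes the embeddings of both languages already live in a shared vector space, for example through a multilingual model.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (assumed semantic score)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_u * norm_v)


def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def phonological_similarity(a, b):
    """Assumed phonological score: 1 minus edit distance normalized by the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def similar_pairs(ja_items, ko_items, sem_threshold, phon_threshold):
    """Return cross-lingual pairs that pass both (hypothetical) thresholds.

    Each item is a tuple (word, transcription, embedding_vector).
    """
    pairs = []
    for ja_word, ja_trans, ja_vec in ja_items:
        for ko_word, ko_trans, ko_vec in ko_items:
            sem = cosine_similarity(ja_vec, ko_vec)
            phon = phonological_similarity(ja_trans, ko_trans)
            if sem >= sem_threshold and phon >= phon_threshold:
                pairs.append((ja_word, ko_word, sem, phon))
    return pairs
```

In this sketch a pair is kept only when both scores clear their thresholds, which mirrors the stated goal of finding pairs similar in both meaning and sound. The thresholds would need to be tuned, for instance against human subjective evaluations like those mentioned in the abstract.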
