JSAI2022

Presentation information

Interactive Session

[4Yin2] Interactive session 2

Fri. Jun 17, 2022 12:00 PM - 1:40 PM Room Y (Event Hall)

[4Yin2-55] A Study of Entity Linking Methods Considering Audio Similarity between User's Speech and Entity

〇Asahi Hentona1, Takamichi Toda1, Yuta Tomomatsu1, Masakazu Sugiyama1, Yuki Azuma1, Sho Shimoyama1 (1.AI Shift, Inc.)

Keywords: Speech Dialog System, Entity Linking, Speech Recognition

To improve the performance of Entity Linking in spoken dialogue systems, we explore similarity computation methods based on audio features. We compute the similarity between a user's speech and each entity in the knowledge base, then link the user's speech to the most similar entity. In our experiments, we compared phoneme sequence-based methods (edit distance, semi-global alignment), which use Automatic Speech Recognition (ASR) results, with methods that do not use ASR and instead use the speech data directly (Mel spectrogram, wav2vec 2.0). Using features extracted directly from speech data provides robustness against speech recognition errors. In experiments on log data from our spoken dialog service, however, the methods using phoneme sequences were less affected by fillers and silence intervals and showed higher performance than the methods using speech data.
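
The two phoneme sequence-based similarity measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the unit edit costs, the function names (`edit_distance`, `semi_global_distance`, `link`), and the romanized phoneme lists are all assumptions made for illustration. The sketch only shows why an alignment that ignores the unmatched start and end of the utterance may be less sensitive to fillers than a plain edit distance.

```python
from typing import Callable, Dict, List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Plain Levenshtein distance between two phoneme sequences (unit costs assumed)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # match or substitute
        prev = curr
    return prev[-1]


def semi_global_distance(entity: Sequence[str], utterance: Sequence[str]) -> int:
    """Cost of aligning the whole entity to the best contiguous part of the utterance.

    Utterance phonemes before and after the aligned part are free, so
    surrounding fillers do not add to the cost (an assumed cost scheme).
    """
    prev = [0] * (len(utterance) + 1)  # free leading utterance phonemes
    for i, x in enumerate(entity, start=1):
        curr = [i]
        for j, y in enumerate(utterance, start=1):
            curr.append(min(prev[j] + 1,
                            curr[j - 1] + 1,
                            prev[j - 1] + (x != y)))
        prev = curr
    return min(prev)  # free trailing utterance phonemes


def link(utterance: Sequence[str],
         kb: Dict[str, List[str]],
         distance: Callable[[Sequence[str], Sequence[str]], int]) -> str:
    """Return the knowledge-base entity whose phonemes are closest to the utterance."""
    return min(kb, key=lambda name: distance(kb[name], utterance))


# Hypothetical knowledge base and an utterance with a leading filler ("ano")
# and a trailing phrase ("desu"); the phoneme strings are illustrative only.
kb = {"tokyo": list("tokyo"), "kyoto": list("kyoto")}
utterance = list("ano") + list("kyoto") + list("desu")

print(link(utterance, kb, semi_global_distance))
print(semi_global_distance(kb["kyoto"], utterance),
      edit_distance(kb["kyoto"], utterance))
```

In this toy case the semi-global cost for the correct entity is zero, while the plain edit distance is inflated by the surrounding filler phonemes. A real system would obtain the phoneme sequences from ASR output and might use phoneme-dependent substitution costs.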
