JSAI2024

Presentation information

General Session


[1G3-GS-6] Language media processing

Tue. May 28, 2024 1:00 PM - 2:40 PM Room G (Room 22+23)

Chair: Reina Akama (Tohoku University)

2:00 PM - 2:20 PM

[1G3-GS-6-04] Study of Speech-to-Text Dialogue Model Using Continuous Expressions

〇Hyuga Nakaguro1, Seiya Kawano1,2, Angel Garcia Contreras2, Koichiro Yoshino1,2 (1. Nara Institute of Science and Technology, 2. RIKEN)

Keywords: LLM, Spoken dialog system

Large language models (LLMs) are flexible and can handle a wide range of natural language processing tasks. Many spoken dialogue systems are built by linking an LLM-based dialogue model with other modules, such as speech recognition and speech synthesis systems. However, such a cascaded, multi-module system is complex and tends to propagate errors from earlier modules to later ones. It also cannot take into account subtle non-verbal expressions in the dialogue, because the modules are connected through discrete representations such as text. This research aims to solve these problems by converting the input speech into a vector of continuous expressions and connecting it to a dialogue model. The experimental results show that the generated sentences do not yet fully reflect the dialogue context, leaving room for improvement; nevertheless, the model learns to generate natural sentences, which suggests that a dialogue model using continuous expressions is feasible.
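
The idea of connecting speech to a dialogue model through continuous vectors rather than text can be illustrated with a minimal sketch. The abstract does not specify the speech encoder, the dimensions, or how the vectors are connected to the model, so everything below is an assumption made for illustration: the names `mean_pool`, `project`, and `build_model_input`, the stride of 4, the dimensions, and the random values standing in for speech-encoder frames and text embeddings are all hypothetical, and this is not the authors' implementation.

```python
import random


def mean_pool(frames, stride):
    """Average every `stride` consecutive frames to shorten the sequence."""
    pooled = []
    for start in range(0, len(frames), stride):
        chunk = frames[start:start + stride]
        dim = len(chunk[0])
        pooled.append([sum(f[d] for f in chunk) / len(chunk) for d in range(dim)])
    return pooled


def project(vectors, weight, bias):
    """Apply a linear map (weight is out_dim x in_dim) to each vector."""
    return [
        [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]
        for vec in vectors
    ]


def build_model_input(speech_frames, text_embeddings, weight, bias, stride=4):
    """Prepend projected speech vectors to the text embeddings.

    Hypothetical design: the speech stays continuous all the way to the
    dialogue model's input, with no intermediate transcript.
    """
    speech_prefix = project(mean_pool(speech_frames, stride), weight, bias)
    return speech_prefix + text_embeddings


# Toy data: random numbers stand in for real encoder outputs (assumption).
rng = random.Random(0)
SPEECH_DIM, MODEL_DIM = 6, 4
speech_frames = [[rng.uniform(-1, 1) for _ in range(SPEECH_DIM)] for _ in range(10)]
text_embeddings = [[rng.uniform(-1, 1) for _ in range(MODEL_DIM)] for _ in range(3)]
weight = [[rng.uniform(-0.1, 0.1) for _ in range(SPEECH_DIM)] for _ in range(MODEL_DIM)]
bias = [0.0] * MODEL_DIM

model_input = build_model_input(speech_frames, text_embeddings, weight, bias)
print(len(model_input), len(model_input[0]))  # prints "6 4"
```

In this sketch, ten speech frames are pooled into three vectors, projected into the model's embedding dimension, and placed in front of three text embeddings, giving a sequence of six vectors. In a cascaded system the same speech would instead be reduced to a text transcript before reaching the dialogue model.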
