[2Win5-97] Response Design for Large Multimodal Models Leveraging Geographic Map Information
Keywords: AI, Multimodal, Geographical Information
This study proposes an approach to support the prediction of road traffic and weather in low-data environments by leveraging large multimodal models (LMMs). By building on the commonsense knowledge embedded in pre-trained large language models (LLMs) and integrating external information such as map images, surrounding geographic data, and points of interest (POIs), the approach aims to enable diverse response generation. Instruction-tuning datasets related to geographic and weather data were developed progressively, incorporating Japan-specific geographic features and tourist resource data. In addition, adopting multi-turn dialogue formats in which speakers embody diverse personas enhanced the diversity and practicality of the responses. The study suggests potential applications not only in supporting traffic and weather prediction but also in smart-city development and region-specific use cases.
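As an illustration of how such an instruction-tuning record might be organized, the following is a minimal sketch assuming a JSON Lines corpus that pairs a map image with POIs, a speaker persona, and a multi-turn dialogue. All field names, file paths, personas, and example content here are hypothetical and are not taken from the paper's actual dataset schema.

```python
# Hypothetical sketch of one instruction-tuning record for a geography-aware LMM.
# Field names, paths, personas, and dialogue content are illustrative assumptions.
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class InstructionExample:
    """One record pairing a map image with geographic context and a dialogue."""
    map_image: str                      # path to a rendered map tile or screenshot
    pois: List[Dict[str, str]]          # nearby points of interest (name, category)
    persona: str                        # speaker persona used to diversify responses
    dialogue: List[Dict[str, str]] = field(default_factory=list)  # multi-turn exchange

    def add_turn(self, role: str, text: str) -> None:
        self.dialogue.append({"role": role, "content": text})


def build_example() -> InstructionExample:
    # Assumed example: a tourist persona asking about road and weather conditions
    # around a Japanese sightseeing area.
    ex = InstructionExample(
        map_image="maps/kyoto_arashiyama_tile.png",
        pois=[
            {"name": "Togetsukyo Bridge", "category": "tourist_attraction"},
            {"name": "Arashiyama Station", "category": "transport"},
        ],
        persona="first-time tourist traveling by bus",
    )
    ex.add_turn("user", "How crowded do the roads near Togetsukyo Bridge get on weekend afternoons?")
    ex.add_turn("assistant",
                "Weekend afternoons around the bridge are typically congested; "
                "arriving before 10 a.m. or taking the train avoids most delays.")
    ex.add_turn("user", "What if it rains?")
    ex.add_turn("assistant",
                "Rain usually thins foot traffic but slows buses on the riverside road, "
                "so allow extra travel time and check the latest forecast.")
    return ex


if __name__ == "__main__":
    # Append the record in JSON Lines format, a common layout for tuning corpora.
    with open("geo_instruction_data.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(build_example()), ensure_ascii=False) + "\n")
```

Keeping the map image as a path alongside structured POI metadata and persona-tagged turns lets the same record serve both visual grounding and response-diversity objectives during tuning; the actual paper may use a different schema.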