JSAI2023

Presentation information

General Session

[2E4-GS-6] Language media processing

Wed. Jun 7, 2023 1:30 PM - 3:10 PM Room E (A2)

Chair: Reina Akama (赤間 怜奈, Tohoku University) [on-site]

2:10 PM - 2:30 PM

[2E4-GS-6-03] Multimodal Deep Model for POI Category Prediction using Linguistic and Image Information

〇Issei Sawada¹, Yusuke Okimoto², Kenta Kanamori², Itsuki Noda¹, Satoshi Oyama¹, Junji Saikawa² (1. Hokkaido University, 2. Yahoo! Japan)

Keywords: multimodal deep learning, user reviews

The accuracy of POI (Point of Interest) categories is becoming increasingly important because many users now rely on services that depend on them. Machine learning models are widely used to infer POI categories from various kinds of information, and multimodal deep models have recently been reported to perform well on many tasks. In this paper, we propose a multimodal deep model for POI category prediction that uses both linguistic and image information. To use image information effectively, the proposed model (1) introduces a loss against the prediction based only on linguistic information and (2) introduces pooling so that multiple images can be input for each POI. Using Yahoo! Japan's POI database, we confirmed that the proposed method improves POI category prediction performance compared to baselines that use only linguistic or only image information.
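
The two ideas named in the abstract, pooling over multiple images and an extra loss tied to the language-only prediction, can be illustrated with a minimal sketch. The abstract does not state the pooling type, the fusion method, or the exact form of the loss, so everything below is an assumption made for illustration: mean pooling, an auxiliary cross-entropy term on a text-only prediction, and the weight `aux_weight`. All function names are hypothetical and this is not the authors' implementation.

```python
import math


def mean_pool(image_vectors):
    """Average a variable-length list of equal-sized image feature vectors.

    Assumption: mean pooling; the abstract only says "pooling".
    """
    if not image_vectors:
        raise ValueError("need at least one image vector")
    n = len(image_vectors)
    dim = len(image_vectors[0])
    return [sum(v[i] for v in image_vectors) / n for i in range(dim)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-likelihood of the target class index."""
    return -math.log(softmax(logits)[target])


def combined_loss(fused_logits, text_only_logits, target, aux_weight=0.5):
    """Main loss on the multimodal prediction plus a weighted auxiliary
    loss on the text-only prediction.

    Assumption: this additive form and `aux_weight` are hypothetical.
    """
    main = cross_entropy(fused_logits, target)
    aux = cross_entropy(text_only_logits, target)
    return main + aux_weight * aux


# Two images for one POI are pooled into a single fixed-size vector.
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0]])  # [2.0, 3.0]
```

Pooling gives every POI a fixed-size image representation regardless of how many photos it has, which is what lets a single model accept a varying number of images per POI.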
