JSAI2024

[3O5-OS-16c] OS-16

Thu. May 30, 2024 3:30 PM - 4:50 PM Room O (Music studio hall)

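Organizers: 鈴木 雅大 (The University of Tokyo), 岩澤 有祐 (The University of Tokyo), 河野 慎 (The University of Tokyo), 熊谷 亘 (The University of Tokyo), 松嶋 達也 (The University of Tokyo), 森 友亮 (Square Enix Co., Ltd.), 松尾 豊 (The University of Tokyo)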

4:30 PM - 4:50 PM

[3O5-OS-16c-04] Large-Scale Indoor Search Engine with Multimodal Foundation Models and Relaxing Contrastive Loss

〇Yuto Imai¹, Kanta Kaneda¹, Ryosuke Korekata¹, Komei Sugiura¹ (¹Keio University)

Keywords: Learning to rank

In this paper, we focus on the task of learning to rank physical objects, in which images of objects in large-scale indoor environments are ranked according to open-vocabulary user instructions. We introduce the GREP module, which constructs visual features at four granularities: image, target object, relative position, and pixel. Additionally, we introduce the RCS module to learn efficiently from the redundant images captured in indoor environments. Our method outperformed baseline methods on the newly constructed YAGAMI dataset and on an extended LTRRIE-subset, with significant improvements on standard metrics.
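
The abstract does not give the formulations of the GREP or RCS modules, so the following is only a generic sketch of the setting, not the authors' method. It assumes that instructions and images have already been embedded into a shared vector space by some multimodal foundation model. Under that assumption it (1) ranks images by cosine similarity to the instruction embedding, (2) computes an InfoNCE-style contrastive loss in which near-duplicate views of the target image are excluded from the negatives (a hypothetical reading of "relaxing" a contrastive loss for redundant images, not the RCS module itself), and (3) scores a ranking with Recall@K and reciprocal rank, which are common learning-to-rank metrics (the abstract does not say which metrics were used). All function names, the temperature, and the duplicate threshold are illustrative assumptions.

```python
import math
from typing import List, Sequence, Set

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def rank_images(query: Vector, images: List[Vector]) -> List[int]:
    """Return image indices sorted by descending similarity to the instruction embedding."""
    scores = [cosine(query, img) for img in images]
    return sorted(range(len(images)), key=lambda i: scores[i], reverse=True)


def relaxed_contrastive_loss(query: Vector, images: List[Vector], pos: int,
                             temperature: float = 0.1,
                             dup_threshold: float = 0.95) -> float:
    """InfoNCE-style loss for one instruction whose target is images[pos].

    Hypothetical relaxation (an assumption, not the paper's RCS module):
    images whose embedding is a near-duplicate of the target
    (cosine >= dup_threshold) are dropped from the denominator instead of
    being pushed away as negatives.
    """
    pos_logit = cosine(query, images[pos]) / temperature
    kept = [pos_logit]
    for i, img in enumerate(images):
        if i == pos:
            continue
        if cosine(images[pos], img) >= dup_threshold:
            continue  # redundant view of the target object: not a true negative
        kept.append(cosine(query, img) / temperature)
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(kept)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in kept))
    return log_denominator - pos_logit


def recall_at_k(ranking: List[int], relevant: Set[int], k: int) -> float:
    """Fraction of relevant images that appear in the top-k positions."""
    hits = sum(1 for i in ranking[:k] if i in relevant)
    return hits / len(relevant) if relevant else 0.0


def reciprocal_rank(ranking: List[int], relevant: Set[int]) -> float:
    """1 / (position of the first relevant image), or 0 if none is ranked."""
    for position, i in enumerate(ranking, start=1):
        if i in relevant:
            return 1.0 / position
    return 0.0


if __name__ == "__main__":
    query = [1.0, 0.0]
    images = [[0.0, 1.0], [0.9, 0.1], [1.0, 0.05], [-1.0, 0.0]]
    ranking = rank_images(query, images)
    print(ranking)
    print(recall_at_k(ranking, {2}, 1), reciprocal_rank(ranking, {1}))
    print(relaxed_contrastive_loss(query, images, pos=2))
```

In the toy example, image 1 is nearly identical to the target image 2, so the relaxed loss ignores it as a negative and is smaller than the loss computed with every other image treated as a negative (for example, with `dup_threshold=2.0`, which no cosine similarity can reach).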
