9:00 AM - 9:20 AM
[3G1-OS-24a-01] Finding Everyday Objects Using Physical-World Search Engines: a Learning-to-Rank Approach
Keywords: Learning to Rank, Multimodal Language Processing, Learning-to-Rank Physical Objects Task
In this study, we focus on the learning-to-rank physical objects task, which involves retrieving target objects from open-vocabulary user instructions in a human-in-the-loop setting. We propose MultiRankIt, which introduces the Crossmodal Noun Phrase Encoder to model the relationship between referring expressions and the target bounding box, and the Crossmodal Region Feature Encoder to model the relationship between the target object and its surrounding contextual environment. Our model outperforms the baseline method in terms of mean reciprocal rank and recall@K.
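The abstract reports results in terms of mean reciprocal rank (MRR) and recall@K. As a quick reference, the sketch below shows how these ranking metrics are conventionally computed; the function names and item identifiers are illustrative and not taken from the paper.

```python
# Illustrative sketch (not from the paper): conventional MRR and recall@K
# for a ranked-retrieval task such as learning-to-rank physical objects.
from typing import List, Sequence, Set, Tuple


def reciprocal_rank(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """Return 1/rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item_id in enumerate(ranked_ids, start=1):
        if item_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of relevant items that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for item_id in ranked_ids[:k] if item_id in relevant_ids)
    return hits / len(relevant_ids)


def mean_reciprocal_rank(queries: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (ranked_ids, relevant_ids) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)


# Example: one query where the target object is ranked second.
print(reciprocal_rank(["obj_3", "obj_7", "obj_1"], {"obj_7"}))   # 0.5
print(recall_at_k(["obj_3", "obj_7", "obj_1"], {"obj_7"}, k=2))  # 1.0
```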