5:00 PM - 5:20 PM
[3Q5-GS-8-05] Extraction of Scene-Specific Co-Occurrence Information Using Large Language Models and Its Application to Robot Scene Understanding
Keywords: LLM, Scene Understanding, Scene Graph
To enable a robot to act appropriately in its operational space, it is crucial to understand the relationships between objects specific to a given context. This is because the arrangement and associations of objects determine their functionality and purpose within a scene. By accurately capturing these relationships, a robot can comprehend the intent of a scene and effectively plan and execute tasks.
This study proposes a method for extracting scene-specific co-occurrence information from large language models (LLMs). While LLMs provide extensive co-occurrence knowledge, their accuracy declines in specific contexts, which would otherwise necessitate additional fine-tuning for real-world applications. Our approach extracts scene-specific co-occurrence information grounded in object placement: when generating co-occurrence data for a pair of objects A and B, it also incorporates information about the surrounding objects in the scene.
This method generates contextually appropriate co-occurrence information without additional training, making it suitable for specific scenes and environments. By emphasizing the functional relationships formed by object groups, we demonstrate its high effectiveness in applications such as scene understanding.
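The sketch below illustrates the general prompting idea described above: asking an LLM for a co-occurrence estimate between two objects while conditioning on the other objects observed in the scene. It is a minimal illustration, not the authors' exact formulation; the query_llm callable, the prompt wording, and the 0-to-1 scoring scale are all assumptions introduced here.

# Minimal sketch: score the co-occurrence of two objects conditioned on
# the surrounding objects in the scene. `query_llm` is a hypothetical
# stand-in for any chat-completion call.

from typing import Callable, Sequence


def scene_cooccurrence(
    obj_a: str,
    obj_b: str,
    surrounding: Sequence[str],
    query_llm: Callable[[str], str],
) -> float:
    """Estimate how likely obj_a and obj_b co-occur in this specific scene."""
    context = ", ".join(surrounding)
    prompt = (
        f"The scene contains: {context}.\n"
        f"On a scale from 0 to 1, how likely are '{obj_a}' and '{obj_b}' "
        f"to appear together and be functionally related in this scene? "
        f"Answer with a single number."
    )
    reply = query_llm(prompt)
    try:
        return max(0.0, min(1.0, float(reply.strip())))
    except ValueError:
        return 0.0  # fall back if the model does not return a parseable number


# Example: conditioning on a kitchen scene should bias the score for
# (knife, cutting board) relative to querying the pair in isolation.
# score = scene_cooccurrence("knife", "cutting board",
#                            ["sink", "stove", "refrigerator"], query_llm)

Because the surrounding objects are passed in the prompt itself, no additional training is needed to adapt the co-occurrence estimates to a given scene, which is the property the abstract emphasizes.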