JSAI2025

Presentation information

Organized Session

[2B1-OS-41d] OS-41

Wed. May 28, 2025 9:00 AM - 10:40 AM Room B (Small hall)

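Organizers: Masahiro Suzuki (The University of Tokyo), Yusuke Iwasawa (The University of Tokyo), Makoto Kawano (The University of Tokyo), Wataru Kumagai (OMRON SINIC X), Tatsuya Matsushima (The University of Tokyo), Paavo Parmas (The University of Tokyo), Shohei Taniguchi (The University of Tokyo)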

9:40 AM - 10:00 AM

[2B1-OS-41d-03] Language Embedded 3D Gaussians at City-Scale for Geography-Aware Visual Programming

〇Shunsuke Yasuki1, Taiki Miyanishi2,3, Nakamasa Inoue4, Shuhei Kurita5, Koya Sakamoto6,3, Daichi Azuma7, Jungdae Lee4, Masato Taki1, Yutaka Matsuo2 (1. Rikkyo University, 2. Univ. of Tokyo, 3. ATR, 4. Institute of Science Tokyo, 5. National Institute of Informatics, 6. Kyoto University, 7. Sony Semiconductor Solutions)

Keywords: 3D Gaussian Splatting, Visual Programming, Multimodal features, Geographical Vision Task, In-Context Learning

We propose GeoProg3D, a visual programming framework that enables natural language interaction with city-scale 3D scenes. GeoProg3D builds on two components that we introduce: the Geography-aware City-scale 3D Language Field (GCLF) and Geographical Vision APIs (GV-APIs). GCLF extends language fields to city-scale 3D data, allowing precise queries based on geographic information. GV-APIs provide specialized geographical vision processing tools such as segmentation and object detection. GeoProg3D constructs executable programs by dynamically composing GCLF and GV-API components, resulting in accurate geographic inference. To evaluate this approach, we introduce the GeoEval3D dataset, which contains 952 query-answer pairs for five challenging geographical vision tasks: grounding, spatial reasoning, comparison, counting, and measurement. Experimental results show that GeoProg3D outperforms existing models on a variety of geographical vision tasks. The framework is expected to find applications in urban planning, disaster response, environmental monitoring, and other fields.
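
The idea of composing a language-field query with vision tools into an executable program can be illustrated with a small toy sketch. This is not the authors' implementation: the scene, the `SceneObject` type, and the functions `query_field`, `count`, `distance_m`, and `tallest` are all hypothetical stand-ins, with a plain label lookup over a hand-written list taking the place of a real 3D language field and real detection or segmentation models.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical record for one object in a toy city scene."""
    name: str
    label: str
    x_m: float       # east-west position in meters
    y_m: float       # north-south position in meters
    height_m: float  # object height in meters


# Hand-written toy scene (assumption: a real system would derive this
# from a city-scale 3D representation, not from a static list).
SCENE = [
    SceneObject("tower_a", "building", 0.0, 0.0, 120.0),
    SceneObject("tower_b", "building", 300.0, 400.0, 80.0),
    SceneObject("central_park", "park", 100.0, 0.0, 0.0),
]


def query_field(label: str) -> list[SceneObject]:
    """Stand-in for a language-field query: exact label match only."""
    return [obj for obj in SCENE if obj.label == label]


def count(objects: list[SceneObject]) -> int:
    """Stand-in for a counting tool."""
    return len(objects)


def distance_m(a: SceneObject, b: SceneObject) -> float:
    """Stand-in for a measurement tool: planar distance in meters."""
    return hypot(a.x_m - b.x_m, a.y_m - b.y_m)


def tallest(objects: list[SceneObject]) -> SceneObject:
    """Stand-in for a comparison tool: object with the greatest height."""
    return max(objects, key=lambda obj: obj.height_m)


def nearest_building_to_park() -> SceneObject:
    """A composed 'program' for the query:
    'Which building is closest to the park?'"""
    park = query_field("park")[0]
    buildings = query_field("building")
    return min(buildings, key=lambda b: distance_m(b, park))


if __name__ == "__main__":
    print(count(query_field("building")))
    print(tallest(query_field("building")).name)
    print(nearest_building_to_park().name)
```

In this sketch each query type named in the abstract (grounding, counting, comparison, measurement) maps to one small function, and a natural-language question becomes a short composition of those calls. How GeoProg3D itself generates and executes such programs is described in the paper, not here.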
