9:40 AM - 10:00 AM
[2B1-OS-41d-03] Language Embedded 3D Gaussians at City-Scale for Geography-Aware Visual Programming
Keywords: 3D Gaussian Splatting, Visual Programming, Multimodal features, Geographical Vision Task, In-Context Learning
We propose GeoProg3D, a visual programming framework that enables natural language interaction with city-scale 3D scenes. GeoProg3D builds on two key components that we introduce: the Geography-aware City-scale 3D Language Field (GCLF) and Geographical Vision APIs (GV-APIs). GCLF extends language fields to city-scale 3D data, allowing precise queries based on geographic information. GV-APIs provide specialized geographical vision processing tools such as segmentation and object detection. GeoProg3D constructs executable programs by dynamically composing GCLF and GV-API components, which enables accurate geographic inference. To evaluate this approach, we introduce the GeoEval3D dataset, which contains 952 query-answer pairs for five challenging geographical vision tasks: grounding, spatial reasoning, comparison, counting, and measurement. Experimental results show that GeoProg3D outperforms existing models on a variety of geographical vision tasks. We expect the framework to be applicable to urban planning, disaster response, environmental monitoring, and other fields.