4:40 PM - 5:00 PM
[2K5-IS-1b-04] Synthetic Remote Sensing Images for Self-Supervised Pre-Training of Vision Transformers
Keywords: Stable Diffusion, Remote Sensing, Vision Transformers
Advancements in remote sensing image analysis often rely on high-quality datasets and robust model pre-training techniques. This work-in-progress explores the potential of synthetic remote sensing images for domain-specific pre-training of Vision Transformers (ViTs). Using textual inversion, we fine-tune a Stable Diffusion model to generate a large-scale dataset of 1 million high-quality synthetic remote sensing images. These images are then used to pre-train a Vision Transformer on a self-supervised learning task, enabling the model to learn domain-specific representations effectively. The knowledge from the pre-trained model is subsequently transferred to real-world remote sensing tasks. We hypothesize that pre-training on a large-scale, domain-specific dataset will enhance the performance of Vision Transformers when fine-tuned for real-world applications, particularly in scenarios where labeled data is limited. In addition to evaluating the impact of domain-specific pre-training on downstream task performance, this study contributes to the research community by making its dataset publicly available, aiming to facilitate research on the use of synthetic data for remote sensing applications.
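The abstract outlines a two-step pipeline: synthetic image generation with a textual-inversion-adapted Stable Diffusion model, followed by self-supervised ViT pre-training. Below is a minimal sketch of both steps using the Hugging Face diffusers and transformers libraries. The model IDs, the embedding path, the placeholder token "<remote-sensing>", and the choice of masked autoencoding (MAE) as the self-supervised objective are all illustrative assumptions; the abstract does not specify them.

import torch
from diffusers import StableDiffusionPipeline
from transformers import ViTMAEConfig, ViTMAEForPreTraining

# Step 1: generate synthetic remote sensing images with a Stable Diffusion
# pipeline adapted via textual inversion (learned token embedding loaded below).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16  # assumed base model
).to("cuda")
pipe.load_textual_inversion(
    "path/to/learned_embeds.bin",  # hypothetical path to the learned embedding
    token="<remote-sensing>",      # hypothetical placeholder token
)
prompt = "a <remote-sensing> satellite image of agricultural fields"
images = pipe(prompt, num_images_per_prompt=4).images  # list of PIL images

# Step 2: self-supervised pre-training of a ViT on the synthetic images;
# MAE is one common objective -- the abstract does not name the SSL task.
model = ViTMAEForPreTraining(ViTMAEConfig(mask_ratio=0.75)).to("cuda")
optimizer = torch.optim.AdamW(model.parameters(), lr=1.5e-4)

# pixel_values would be a batch of the synthetic images, resized and
# normalized to 224x224; a random tensor stands in here.
pixel_values = torch.randn(4, 3, 224, 224, device="cuda")
optimizer.zero_grad()
loss = model(pixel_values=pixel_values).loss  # reconstruction loss on masked patches
loss.backward()
optimizer.step()

In a full run, step 1 would loop over a prompt set to produce the 1 million images, and step 2 would iterate this training step over that dataset before fine-tuning the encoder on the downstream remote sensing tasks.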