Synthetic Remote Sensing Images for Self-Supervised Pre-Training of Vision Transformers

Luiz Henrique Mormille

16:40 to 17:00

[2K5-IS-1b-04] Synthetic Remote Sensing Images for Self-Supervised Pre-Training of Vision Transformers

Luiz Henrique Mormille¹ (presenter), Iskandar Salama¹, Masayasu Atsumi¹ (¹Soka University)

Keywords: Stable Diffusion, Remote Sensing, Vision Transformers

Advances in remote sensing image analysis often depend on high-quality datasets and robust pre-training techniques. This work in progress explores whether synthetic remote sensing images can be used for domain-specific pre-training of Vision Transformers (ViTs). Using textual inversion, we adapt a Stable Diffusion model to the remote sensing domain and use it to generate a large-scale dataset of 1 million high-quality synthetic remote sensing images. These images are then used to pre-train a ViT on a self-supervised learning task, with the goal of learning domain-specific representations. The next step is to transfer the knowledge of the pre-trained model to real-world remote sensing tasks. We hypothesize that pre-training on a large-scale, domain-specific dataset will improve the performance of ViTs when they are fine-tuned for real-world applications, particularly when labeled data is limited. In addition to evaluating the effect of domain-specific pre-training on downstream task performance, this study contributes to the research community by making its dataset publicly available, with the aim of facilitating research on the use of synthetic data for remote sensing applications.
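The abstract does not name the self-supervised task used for pre-training. Purely as an illustration, and assuming a masked-image-modeling objective in the style of Masked Autoencoders (MAE), the sketch below shows how a ViT splits an image into patch tokens and how a random subset of those patches is hidden so the model must reconstruct them. The function names `num_patches` and `random_masking`, the 224-pixel input, the 16-pixel patch size, and the 0.75 mask ratio are all hypothetical choices for this example, not details taken from the work.

```python
import random
from typing import List, Tuple


def num_patches(image_size: int, patch_size: int) -> int:
    """Count the non-overlapping square patches a ViT tokenizes an image into.

    Hypothetical helper for illustration only.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


def random_masking(n: int, mask_ratio: float, seed: int = 0) -> Tuple[List[int], List[int]]:
    """MAE-style masking (assumed pretext task): hide a random subset of patches.

    Returns (visible, masked) sorted index lists. In an MAE-like setup only the
    visible patches go through the ViT encoder, and the training objective is
    to reconstruct the pixels of the masked patches.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)           # seeded for reproducibility
    order = list(range(n))
    rng.shuffle(order)                  # random permutation of patch indices
    n_keep = int(n * (1 - mask_ratio))  # number of patches left visible
    visible = sorted(order[:n_keep])
    masked = sorted(order[n_keep:])
    return visible, masked


if __name__ == "__main__":
    n = num_patches(image_size=224, patch_size=16)  # 14 x 14 = 196 patches
    visible, masked = random_masking(n, mask_ratio=0.75, seed=42)
    print(n, len(visible), len(masked))  # 196 49 147
```

With a high mask ratio the encoder sees only a quarter of each synthetic image, which keeps pre-training on a million images comparatively cheap; whether this or another self-supervised objective (for example, a contrastive one) is actually used is not stated in the abstract.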


Session: [2K5-IS-1b] Knowledge engineering
