2:20 PM - 2:40 PM
[4K3-IS-2f-02] Generalized Few-Shot Siamese Semantic Segmentation with Pyramid Vision Transformer Backbone
Keywords: Deep Learning, Semantic Segmentation, Generalized Few-shot
Few-shot semantic segmentation enables pre-trained networks to generalize to new data with minimal labelled samples per class, addressing the challenges of data scarcity and annotation cost. While few-shot learning methods have shown success, a more practical challenge lies in segmenting both base classes (pre-trained classes) and novel classes (new classes with few examples) within a single task. Generalized Few-Shot Semantic Segmentation (GFSS) was therefore introduced to evaluate models on their ability to handle both familiar and unseen classes. Existing approaches use VGG and ResNet backbones but struggle with multi-scale features, which are crucial for segmenting objects of varying sizes. Additionally, Siamese learning has proven effective for few-shot tasks but has not been widely explored in generalized few-shot learning. This paper proposes a novel solution that integrates the Pyramid Vision Transformer (PVT), which introduces multi-scale features into transformers, with a Siamese Transformer Module (STM) for enhanced adaptation of support features to query features. Our approach aims to improve the effectiveness and robustness of GFSS, addressing scale-variation challenges and the need for better adaptation to novel classes.
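The abstract does not detail the STM's internals. As a rough illustration only, the idea of adapting support features to query features through a Siamese (weight-shared) encoder can be sketched as single-head cross-attention; all function names, token counts, and feature dimensions below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def siamese_cross_attention(query_feats, support_feats):
    """Hypothetical sketch: query tokens attend to support tokens,
    so support-class information is adapted to the query image.
    Both feature sets are assumed to come from the same (Siamese)
    backbone, e.g. a PVT-style multi-scale encoder."""
    d = query_feats.shape[-1]
    attn = softmax(query_feats @ support_feats.T / np.sqrt(d))
    # Residual fusion of attended support features into the query.
    return query_feats + attn @ support_feats

# Toy shapes: 14x14 = 196 tokens per image, 64-dim features
# (purely illustrative; real PVT stages use other sizes).
rng = np.random.default_rng(0)
query = rng.standard_normal((196, 64))
support = rng.standard_normal((196, 64))
out = siamese_cross_attention(query, support)
print(out.shape)  # (196, 64)
```

In a full model, the fused features would feed a segmentation head predicting base and novel classes jointly; this fragment only shows the support-to-query adaptation step.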
Our work aims to:
Show the capabilities of PVT for dense predictions
Extend Siamese networks for GFSS