The 39th Annual Conference of the Japanese Society for Artificial Intelligence, 2025

Presentation information

International Session

[4K3-IS-2f] Machine learning

Fri. May 30, 2025, 14:00 - 15:20, Room K (Meeting Room 1006)

Chair: Nattawut Kertkeidkachorn

14:20 - 14:40

[4K3-IS-2f-02] Generalized Few-Shot Siamese Semantic Segmentation with Pyramid Vision Transformer Backbone

〇Francis Sanco1, Clifford Broni-Bediako2, Massayasu Atsumi1 (1. Soka University, 2. RIKEN Center for Advanced Intelligence, Tokyo, Japan)

Keywords: Deep Learning, Semantic Segmentation, Generalized Few-shot

Few-shot semantic segmentation enables pre-trained networks to generalize to new data with only a few labelled samples per class, addressing the challenges of data scarcity and annotation cost. While few-shot learning methods have shown success, a more practical challenge lies in segmenting both base classes (classes seen during pre-training) and novel classes (new classes with only a few examples) in a single task. Generalized Few-Shot Semantic Segmentation (GFSS) was introduced to address this setting, evaluating models on their ability to handle both familiar and unseen classes. Existing approaches rely on VGG and ResNet backbones but struggle to handle multi-scale features, which are crucial for segmenting objects of varying size. Additionally, Siamese learning has proven effective for few-shot tasks but has not been widely explored in generalized few-shot learning. This paper proposes a novel solution that integrates the Pyramid Vision Transformer (PVT), which introduces multi-scale features into transformers, with a Siamese Transformer Module (STM) for enhanced adaptation of support features to query features. Our approach aims to improve the effectiveness and robustness of GFSS by addressing scale variation and the need for better adaptation to novel classes.
Our work aims to show the capabilities of PVT for dense prediction and to extend Siamese networks to GFSS.
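
To make the GFSS evaluation setting concrete, the following is a minimal sketch of how a single prediction could be scored separately on base and novel classes. It assumes per-class intersection-over-union (IoU) averaged over each group, a metric commonly reported in GFSS work; the abstract does not state which metric the authors use. The class split, the toy label maps, and the helper names `per_class_iou` and `mean_iou` are hypothetical and exist only for illustration.

```python
def per_class_iou(pred, target, num_classes):
    """Return the IoU of each class for two equally sized 2-D label maps."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel is correct: counts toward intersection and union.
                inter[p] += 1
                union[p] += 1
            else:
                # Pixel is wrong: enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    # Classes absent from both maps get None so they are skipped when averaging.
    return [i / u if u else None for i, u in zip(inter, union)]


def mean_iou(ious, class_ids):
    """Average the IoU over the given class ids, ignoring absent classes."""
    vals = [ious[c] for c in class_ids if ious[c] is not None]
    return sum(vals) / len(vals) if vals else float("nan")


# Hypothetical split: classes 0 and 1 are base classes, class 2 is novel.
BASE_CLASSES = [0, 1]
NOVEL_CLASSES = [2]

# Toy 4x4 ground-truth and predicted label maps (assumed data).
target = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 1, 1],
    [2, 2, 2, 2],
]
pred = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 1, 1],
    [2, 2, 1, 2],
]

ious = per_class_iou(pred, target, num_classes=3)
base_miou = mean_iou(ious, BASE_CLASSES)
novel_miou = mean_iou(ious, NOVEL_CLASSES)
print(f"base mIoU = {base_miou:.3f}, novel mIoU = {novel_miou:.3f}")
```

Reporting the two averages separately, rather than one overall score, shows whether gains on novel classes come at the expense of base classes, which is the trade-off the generalized setting is meant to expose.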
