Presentation information

General Session

General Session » GS-7 Vision, speech media processing

[4I1-GS-7b] Image and Audio Media Processing: Multi-modal Processing

Fri. Jun 11, 2021 9:00 AM - 10:40 AM Room I (GS room 4)

Chair: Kenta Ishihara (NEC)

9:20 AM - 9:40 AM

[4I1-GS-7b-02] Meta-learning Method for Multi-modal Few-shot One-class Image Classification

〇Takumi Ohkuma1, Hideki Nakayama1 (1. The University of Tokyo)

Keywords: Meta-learning, Few-shot learning, Zero-shot learning, Multi-modal learning, Computer vision

One-class image classification is the task of determining whether an image belongs to a given class, and it is important for recognizing specific visual concepts.
Humans solve this task well from only a few examples, whereas the performance of previous few-shot learning methods falls far short of human performance.
To improve performance, we propose the ``Multi-modal Belongingness Network (MMBeNet)'', an extension of the ``Belongingness Network (BeNet) \cite{BeNet}'', which uses not only a few image examples but also semantic information such as attributes and word vectors; we call this task ``multi-modal few-shot one-class image classification''.
We consider semantic information to be an important factor in humans' high ability, and our experiments confirm that it improves performance on this task.
Furthermore, a single MMBeNet model can handle not only the multi-modal task but also image-only few-shot and zero-shot tasks.
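To make the task setting concrete, the following is a minimal sketch of the few-shot one-class setup in embedding space, assuming a simple prototype-plus-cosine-threshold baseline; this is an illustrative assumption, not the authors' MMBeNet, and the function name and threshold value are hypothetical.

```python
import numpy as np

def one_class_predict(support, queries, threshold=0.5):
    """Generic prototype baseline for few-shot one-class classification.

    NOTE: illustrative sketch only, not the MMBeNet method from the paper.

    support: (k, d) array of embeddings of the few positive examples.
    queries: (m, d) array of query embeddings.
    Returns a boolean array: True where a query is judged to belong to the class.
    """
    # Class prototype = mean of the few support embeddings.
    prototype = support.mean(axis=0)
    # Cosine similarity between each query and the prototype.
    sims = queries @ prototype / (
        np.linalg.norm(queries, axis=1) * np.linalg.norm(prototype) + 1e-12
    )
    # A query "belongs" to the class if it is similar enough to the prototype.
    return sims >= threshold

# Toy example: support embeddings cluster near (1, 0);
# the first query is in-class, the second is not.
support = np.array([[1.0, 0.1], [0.9, -0.1], [1.1, 0.0]])
queries = np.array([[1.0, 0.0], [-1.0, 0.0]])
print(one_class_predict(support, queries))  # → [ True False]
```

In the multi-modal variant studied in the paper, the class representation would additionally draw on semantic information (attributes or word vectors) rather than image embeddings alone.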
