Leveraging Transformer in Object-Centric Deep Generative Models

Yuya Kobayashi

4:00 PM - 4:20 PM

[4G4-GS-2m-02] Leveraging Transformer in Object-Centric Deep Generative Models

〇Yuya Kobayashi¹, Masahiro Suzuki¹, Yutaka Matsuo¹ (1. The University of Tokyo Graduate School of Engineering)

Keywords: Deep Generative Models, World Model, Scene Interpretation, Object Recognition

Recently, object recognition using deep learning has shown promising results, but it mostly relies on supervised learning, which requires large amounts of ground-truth data. Meanwhile, unsupervised object detection using deep generative models (scene interpretation) has been attracting considerable attention. This approach requires no ground-truth data and can also learn representations of the target objects. Such methods may be essential for developing and understanding agents that act and make decisions in complex environments. However, their application has so far been limited to simple toy datasets, and it has been reported that they cannot handle complex datasets such as real-world images. In this research, we propose a novel scene interpretation architecture that uses a Transformer, together with a self-supervised training method. We show that the proposed model remains viable even on images that existing methods cannot handle well.

Presentation information

[4G4-GS-2m] Machine Learning: Learning Strategies (2/2)

