JSAI2024

Presentation information

Organized Session

[2O6-OS-16a] OS-16

Wed. May 29, 2024 5:30 PM - 6:50 PM Room O (Music studio hall)

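Organizers: Masahiro Suzuki (The University of Tokyo), Yusuke Iwasawa (The University of Tokyo), Makoto Kawano (The University of Tokyo), Wataru Kumagai (The University of Tokyo), Tatsuya Matsushima (The University of Tokyo), Yusuke Mori (Square Enix Co., Ltd.), Yutaka Matsuo (The University of Tokyo)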

6:30 PM - 6:50 PM

[2O6-OS-16a-04] Learning Compositional Latents and Behaviors from Object-Centric Latent Imagination

〇Akihiro Nakano1, Masahiro Suzuki1, Yutaka Matsuo1 (1. The University of Tokyo)

Keywords: Representation learning, World models, Object-centric learning

In reinforcement learning, model-based methods are a promising approach. They learn a world model and then learn complex behaviors from imagination, solving long-horizon tasks from visual inputs only. Recent transformer-based world models have improved sample efficiency on these tasks, owing to the transformer's ability to capture long-term dependencies. However, world models still struggle with compositional tasks: predicting object interactions and accurately tracking objects remain difficult, especially for unseen configurations. Object-centric learning learns to decompose a scene or video into individual objects without supervision, leading to a more compositional understanding and better generalization to unseen objects and scenes. In this paper, we propose a world model that uses object-centric latents to predict dynamics. Our model aims to combine the compositional generalization of object-centric learning with the sample efficiency and long-horizon prediction of transformer-based world models. To validate the efficacy of our approach, we conducted experiments on the OCRL benchmark dataset.
