4:10 PM - 4:30 PM
[2B5-GS-2-03] Transformer-based World Models with Object-Centric Representations
Keywords: World Models, Reinforcement Learning, Object-Centric Representations
World models mimic observed dynamics to aid the learning of complex behaviors. However, in settings such as games, where multiple dynamics with distinct characteristics coexist on the same screen, learning an effective world model becomes challenging. This challenge has also been identified in tasks such as video prediction, and recent work has explored solutions based on object-centric representations. In this paper, we present a transformer-based world model with object-centric representations, combining a world model with an object-centric method for video prediction. The approach uses object features to model spatiotemporal relationships and to predict future states accurately conditioned on actions. The transformer receives multiple latent states from the object-centric representations, together with rewards and actions, and flexibly attends across all modalities and time steps. It is thus expected to distinguish the distinct dynamics of each object and to predict accurate future states in response to actions. We validated the effectiveness of our method on the Boxing task from the Atari 100k benchmark, demonstrating its utility.
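The abstract does not give implementation details of how object latents, actions, and rewards are combined into the transformer's input. As a rough illustration only, the following sketch shows one plausible way to assemble such a multimodal token sequence (all names, dimensions, and the embedding scheme are assumptions, not the paper's method):

```python
import numpy as np

def assemble_tokens(slots, actions, rewards, seed=0):
    """Interleave per-object latent slots with action and reward
    embeddings into a single token sequence, illustrating the kind
    of multimodal input a transformer world model might receive.

    slots:   (T, K, d) object-centric latents per timestep
    actions: (T,) discrete action ids
    rewards: (T,) scalar rewards
    returns: (T * (K + 2), d) token sequence
    """
    rng = np.random.default_rng(seed)
    T, K, d = slots.shape
    # Hypothetical fixed embedding tables; in a real model these
    # would be learned parameters.
    action_table = rng.standard_normal((16, d))
    reward_proj = rng.standard_normal(d)
    tokens = []
    for t in range(T):
        tokens.extend(slots[t])                     # K object tokens
        tokens.append(action_table[actions[t]])     # 1 action token
        tokens.append(rewards[t] * reward_proj)     # 1 reward token
    return np.stack(tokens)

# Example: 4 timesteps, 5 object slots, 32-dim latents
T, K, d = 4, 5, 32
seq = assemble_tokens(np.zeros((T, K, d)), np.zeros(T, dtype=int), np.ones(T))
print(seq.shape)  # (28, 32): 4 timesteps * (5 + 2) tokens each
```

With attention masks over such a sequence, the transformer can relate any object token to any action or reward at any timestep, which matches the flexible cross-modal adaptation the abstract describes.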