6:00 PM - 6:20 PM
[1B5-OS-41c-02] Evaluation of Offline Pretraining Methods for World Models Using Instruction Expansion with Large Language Models and Two-Stage Pretraining
Keywords: World Models, Large Language Models, Offline Pretraining, Model-Based Reinforcement Learning
Recent studies have shown that offline data, such as text, can significantly improve the efficiency of task learning when used to pretrain world models. In particular, Dynalang has proven effective at leveraging task instructions and environmental dynamics to enhance performance. However, its application has been largely limited to the Messenger task, leaving its generalizability to other tasks, and the impact of text type and quality during pretraining, insufficiently explored. In this study, we extend Dynalang's approach to the simpler HomeGrid task to evaluate its generalizability. We also explore the use of large language models (LLMs) to generate and expand domain-specific text, aiming to further improve initial task performance and sample efficiency. Additionally, we propose and assess a two-stage pretraining strategy: general text is first used to develop fundamental language understanding, followed by domain-specific text to strengthen task-specific capabilities. Our findings highlight the potential of broadening the applicability of text-based pretraining strategies.
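The two-stage schedule described above can be illustrated with a deliberately minimal sketch: a toy bigram model stands in for the world model's language component, first trained on general text and then on domain-specific instructions. All class and corpus names here are hypothetical, chosen for illustration only; the paper's actual method operates on Dynalang-style world models, not bigram statistics.

```python
from collections import Counter, defaultdict

class BigramModel:
    """Toy stand-in for a world model's language component (illustrative only)."""

    def __init__(self):
        # counts[a][b] = number of times token b followed token a
        self.counts = defaultdict(Counter)

    def train(self, corpus):
        """Accumulate bigram statistics from a list of sentences."""
        for sentence in corpus:
            tokens = sentence.lower().split()
            for a, b in zip(tokens, tokens[1:]):
                self.counts[a][b] += 1

    def next_token(self, word):
        """Return the most frequent successor of `word`, or None if unseen."""
        dist = self.counts.get(word.lower())
        return dist.most_common(1)[0][0] if dist else None

# Stage 1: general text develops fundamental language statistics.
general_corpus = [
    "the robot moves forward",
    "the agent picks up the key",
]
# Stage 2: domain-specific text (e.g. HomeGrid-style instructions,
# possibly expanded by an LLM) strengthens task-specific capabilities.
domain_corpus = [
    "put the bottle in the recycling bin",
    "find the papers in the kitchen",
]

model = BigramModel()
model.train(general_corpus)  # stage 1: general pretraining
model.train(domain_corpus)   # stage 2: domain-specific pretraining
print(model.next_token("recycling"))  # prints "bin"
```

The point of the sketch is only the schedule: the same model is updated in two passes, so domain statistics refine rather than replace the general ones.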