Minecraft bot Voyager programs itself using GPT-4



Summary

Voyager uses GPT-4 to guide a continuously learning agent through the world of Minecraft. Instead of reinforcement learning, Voyager relies on code generation.

Researchers from Nvidia, Caltech, UT Austin, Stanford, and ASU introduce Voyager, which they describe as the first LLM-powered lifelong learning agent to play Minecraft. Unlike other Minecraft agents, which use classic reinforcement learning techniques, Voyager uses GPT-4 to continuously improve itself. It does this by writing, refining, and transferring code stored in an external skill library.

This produces small programs that let the agent navigate, open doors, mine resources, craft a pickaxe, or fight a zombie. “GPT-4 unlocks a new paradigm,” says Nvidia researcher Jim Fan, who advised the project. In this paradigm, “training” is the execution of code and the “trained model” is the code base of skills that Voyager iteratively assembles.

Voyager consists of three main components:


  1. An iterative prompting mechanism that incorporates feedback from the game, execution errors, and self-checking to refine programs.
  2. A skill library with code for storing and retrieving complex behaviors.
  3. An automated curriculum to maximize exploration.

Video: Wang, Xie, Jiang, Mandlekar et al.
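The following minimal Python sketch illustrates the skill library idea: programs are stored and then retrieved by how closely their descriptions match a query. It is not Voyager's code. The class and method names are hypothetical, the stored program strings are placeholders, and a simple bag-of-words cosine similarity stands in for the learned text embeddings a real vector database would use.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class SkillLibrary:
    """Hypothetical skill store: name -> (description embedding, program source)."""
    skills: dict = field(default_factory=dict)

    def add(self, name: str, description: str, code: str) -> None:
        # Index each program by an embedding of its natural-language description.
        self.skills[name] = (embed(description), code)

    def retrieve(self, query: str, k: int = 2) -> list:
        # Return the names of the k skills whose descriptions best match the query.
        q = embed(query)
        ranked = sorted(self.skills, key=lambda n: cosine(q, self.skills[n][0]), reverse=True)
        return ranked[:k]


# Usage with placeholder program strings (Voyager's real skills are JavaScript).
library = SkillLibrary()
library.add("mineWoodLog", "mine a wood log from a tree", "/* placeholder */")
library.add("craftWoodenPickaxe", "craft a wooden pickaxe from planks and sticks", "/* placeholder */")
library.add("killZombie", "fight and kill a zombie", "/* placeholder */")
print(library.retrieve("craft a stone pickaxe", k=1))  # ['craftWoodenPickaxe']
```

Retrieving by description rather than by exact name lets a new task, such as a stone pickaxe, surface a related existing skill that can be reused or extended.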

Voyager Minecraft agent learns in context

The Minecraft agent learns iteratively. To achieve a goal, Voyager has GPT-4 write a program. It then feeds feedback from the game environment and any JavaScript execution errors back to GPT-4 to refine that program. In this way, Voyager gradually builds a library of skills and stores successful programs in a vector database. Complex skills are composed from simpler ones.

Video: Wang, Xie, Jiang, Mandlekar et al.
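The refinement loop can be sketched roughly as follows. `generate`, `execute`, and `verify` are hypothetical stand-ins for the GPT-4 call, the Minecraft environment, and the self-check. The stub versions only simulate a typo being fixed after the error message is fed back, and the JavaScript-like program strings are illustrative and never executed. The `max_rounds` cap is an assumption of this sketch, not a detail from the article.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces standing in for GPT-4, the game, and the self-check.
GenerateFn = Callable[[str, str], str]                  # (task, feedback) -> program
ExecuteFn = Callable[[str], Tuple[str, Optional[str]]]  # program -> (game log, error)
VerifyFn = Callable[[str, str], Tuple[bool, str]]       # (task, game log) -> (ok, critique)


def iterative_prompting(task: str, generate: GenerateFn, execute: ExecuteFn,
                        verify: VerifyFn, max_rounds: int = 4) -> Tuple[Optional[str], int]:
    """Refine a program until the self-check passes or max_rounds is reached."""
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        program = generate(task, feedback)
        log, error = execute(program)
        if error is None:
            ok, critique = verify(task, log)
            if ok:
                return program, round_no  # success: caller can add it to the skill library
        else:
            critique = "skipped: the program raised an error"
        # Feed game output, execution error, and critique into the next prompt.
        feedback = f"Game log: {log}\nError: {error}\nCritique: {critique}"
    return None, max_rounds  # give up; a curriculum could propose another task


# Stubs that simulate a typo being fixed once the error message is fed back.
def fake_generate(task: str, feedback: str) -> str:
    if "ReferenceError" in feedback:
        return "await mineBlock(bot, 'oak_log', 1);"
    return "await mineBlok(bot, 'oak_log', 1);"


def fake_execute(program: str) -> Tuple[str, Optional[str]]:
    if "mineBlok" in program:
        return "", "ReferenceError: mineBlok is not defined"
    return "Collected 1 oak_log", None


def fake_verify(task: str, log: str) -> Tuple[bool, str]:
    return "oak_log" in log, "checked inventory for oak_log"


program, rounds = iterative_prompting("Mine 1 oak log", fake_generate, fake_execute, fake_verify)
print(rounds, program)  # 2 await mineBlock(bot, 'oak_log', 1);
```

Keeping the three feedback sources in one prompt string is a simplification. The point is that each round's failure information shapes the next generation.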

To explore the diverse world of Minecraft, the team uses an automated curriculum that suggests appropriate exploration tasks based on the agent’s current skills and the current state of the world. For example, the agent learns to collect sand and cactus in a desert before digging for iron.

Voyager uses information about the environment to plan new tasks with GPT-4. | Image: Wang, Xie, Jiang, Mandlekar et al.
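Here is a rule-based toy version of the curriculum idea. According to the article, Voyager asks GPT-4 to plan the next task. In this sketch, a hypothetical hand-written task pool, filtered by biome, inventory, and completed tasks, stands in for that call. That is enough to reproduce the desert example of sand and cactus before iron. The task names and item identifiers are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Set


@dataclass(frozen=True)
class Task:
    name: str
    biomes: frozenset    # biomes where the task makes sense; empty means anywhere
    requires: frozenset  # inventory items needed before attempting the task


# Hypothetical hand-written pool, ordered from easy to hard.
TASKS = [
    Task("Mine 1 sand", frozenset({"desert"}), frozenset()),
    Task("Collect 1 cactus", frozenset({"desert"}), frozenset()),
    Task("Mine 1 wood log", frozenset(), frozenset()),
    Task("Craft 1 wooden pickaxe", frozenset(), frozenset({"wood_log"})),
    Task("Mine 1 iron ore", frozenset(), frozenset({"stone_pickaxe"})),
]


def propose_next_task(biome: str, inventory: Mapping[str, int],
                      completed: Set[str]) -> Optional[str]:
    """Return the first unfinished task that fits the biome and current inventory."""
    for task in TASKS:
        if task.name in completed:
            continue
        if task.biomes and biome not in task.biomes:
            continue
        if task.requires <= set(inventory):
            return task.name
    return None


print(propose_next_task("desert", {}, set()))  # Mine 1 sand
```

Because the pool is ordered, an agent in the desert works through sand and cactus long before iron becomes reachable. A language model can instead weigh which task is both feasible and new given the agent's state, without needing a fixed list.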

Together, this creates an agent that is constantly learning and can perform a variety of tasks. The team runs all experiments in the MineDojo environment.


More details are on the Voyager project page. The code is available on GitHub.
