Google DeepMind’s new AI agent plays games using only natural language



DeepMind’s SIMA can perform tasks in different video game worlds, such as Valheim or No Man’s Sky, using only text prompts.

Google DeepMind researchers introduce SIMA (Scalable Instructable Multiworld Agent), an AI agent for 3D video game environments that can translate natural language instructions into actions.

SIMA was trained and tested in collaboration with eight game studios and across nine different video games, including No Man’s Sky, Valheim, and Teardown.

Video: Google DeepMind



The DeepMind team trained SIMA on game recordings in which one player either gave instructions to another player or narrated their own gameplay. The team then aligned these instructions with the corresponding in-game actions.

The agent is trained primarily through behavioral cloning: it imitates the actions humans performed in the collected data while following the accompanying language instructions.

In this way, the agent learns to associate language descriptions, visual observations, and the corresponding actions.
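The behavioral-cloning objective described above can be sketched in a few lines: the policy is trained to maximize the likelihood of the action a human demonstrator took. The sketch below is a minimal toy version with a discrete action set and hand-rolled gradient steps; the action names and the `bc_step` helper are illustrative, not from the SIMA paper.

```python
import math

# Toy discrete action set standing in for SIMA's keyboard/mouse actions
ACTIONS = ["move_forward", "turn_left", "open_inventory"]

def softmax(logits):
    """Convert raw policy scores into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def bc_loss(logits, expert_action_idx):
    """Negative log-likelihood of the action the human demonstrator took."""
    return -math.log(softmax(logits)[expert_action_idx])

def bc_step(logits, expert_action_idx, lr=0.5):
    """One gradient step; the NLL gradient w.r.t. logits is (probs - onehot)."""
    probs = softmax(logits)
    grad = [p - (1.0 if i == expert_action_idx else 0.0)
            for i, p in enumerate(probs)]
    return [w - lr * g for w, g in zip(logits, grad)]

# Repeatedly cloning the same demonstrated action ("move_forward", index 0)
# shifts the policy's probability mass toward that action.
logits = [0.0, 0.0, 0.0]
for _ in range(50):
    logits = bc_step(logits, expert_action_idx=0)
```

In the real agent, the logits would come from a large neural network conditioned on the video frames and the instruction text, but the training signal is the same: match the human's demonstrated action.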

Google DeepMind’s SIMA uses pre-trained models and learns from humans

The core of the SIMA agent consists of several components that work together to convert visual input (what the agent “sees”) and language input (the instructions it receives) into actions (keyboard and mouse commands).

Image: Google DeepMind

Image and text encoders translate the visual and language input into a form the agent can process. This is done using pre-trained models that already have a broad understanding of images and text.
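The overall flow, encode the frame, encode the instruction, fuse the two, and decode a keyboard/mouse command, can be sketched as below. Everything here is a stand-in: `encode_image` and `encode_text` represent the frozen pre-trained encoders, and the keyword-based decision rule represents the learned policy head.

```python
from dataclasses import dataclass

@dataclass
class AgentOutput:
    key: str          # keyboard command, e.g. "W" to walk forward
    mouse_dx: float   # horizontal mouse movement

def encode_image(frame_pixels):
    # Stand-in for a pre-trained vision encoder: maps pixels
    # to a fixed-size embedding vector.
    return [sum(frame_pixels) / len(frame_pixels)] * 4

def encode_text(instruction):
    # Stand-in for a pre-trained text encoder.
    return [float(len(instruction))] * 4

def policy(frame_pixels, instruction):
    # Fuse both embeddings; a real agent would feed this into a
    # learned network rather than the toy rule below.
    fused = encode_image(frame_pixels) + encode_text(instruction)
    key = "W" if "forward" in instruction else "SPACE"
    return AgentOutput(key=key, mouse_dx=0.0)

out = policy([0.1, 0.2, 0.3], "go forward to the tree")
```

The key design point the sketch illustrates is that the agent's output is the same low-level interface a human player uses (keyboard and mouse), which is what lets one agent work across many games without game-specific APIs.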

