On January 6, Google DeepMind announced a new team dedicated to developing "massive" generative models designed to "simulate the world." The initiative is part of a broader push to improve decision-making, planning, and creativity in artificial intelligence (AI) systems.
World models, the focus of this new team, are computational frameworks that help AI systems understand and replicate real or virtual environments. They are central to applications in robotics, gaming, and autonomous systems. Autonomous vehicles, for instance, rely on world models to simulate traffic and road conditions so that they can learn to navigate effectively. World models can also help train generalist AI robots across varying environments. A notable challenge in this field is the limited availability of the rich, diverse, and safe training environments that "embodied AI" requires.
In a job posting issued the same day, Google DeepMind stressed the importance of scaling AI models, stating, “We believe scaling pretraining on video and multimodal data is on the critical path to artificial general intelligence.” The company said these world models would support domains including visual reasoning, simulation, and real-time interactive entertainment.
Leading the new initiative is Tim Brooks, who moved from OpenAI to Google DeepMind in October. At OpenAI, Brooks co-led the development of Sora, a video generation model that attracted significant attention upon its release.
The new hires are expected to collaborate with, and build on, existing Google DeepMind projects: Gemini, Google's large multimodal model, as well as the video generation and world model teams behind Veo and Genie.
The announcement comes as AI startups are also making strides in the field. World Labs, for example, emerged from stealth mode in September and announced that it had secured $230 million in funding to develop large world models. The startup is led by Stanford AI pioneer Fei-Fei Li and has attracted funding from notable figures including Nobel laureate Geoffrey Hinton, Salesforce CEO Marc Benioff, LinkedIn co-founder Reid Hoffman, and former Google Chairman Eric Schmidt.
DeepMind has previously launched several world models, including Genie and its successor Genie 2. Unlike its predecessor, which could only generate 2D environments, Genie 2 can create interactive 3D worlds that respond dynamically to user input. The model is trained on a video dataset: an autoencoder compresses video frames into compact representations, and a transformer model then analyses these to predict how the video continues, one step at a time, much as text generation models such as ChatGPT predict what comes next in a sentence.
Genie 2's advances include realistic object interactions, complex character animations, and simulated physics. The virtual environments it generates can sustain interaction for a limited time, typically 10 to 20 seconds.
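The encode-then-predict loop described for Genie 2 can be illustrated with a deliberately tiny sketch. This is not DeepMind's code: the one-dimensional "frames", the hand-written `encode` and `decode` functions (standing in for a learned autoencoder), and the lookup table of observed transitions (standing in for the transformer) are all assumptions made purely for illustration. What it does show is the autoregressive pattern, in which each predicted representation is fed back in as the input for the next step, together with a user action.

```python
from collections import Counter, defaultdict

WIDTH = 8  # pixels per toy 1-D "frame" (illustrative assumption)

def encode(frame):
    """Toy encoder: compress a frame to one latent token (index of the brightest pixel)."""
    return max(range(len(frame)), key=lambda i: frame[i])

def decode(latent):
    """Toy decoder: rebuild a frame from its latent token."""
    return [1 if i == latent else 0 for i in range(WIDTH)]

def fit_dynamics(episodes):
    """Count (latent, action) -> next-latent transitions seen in the training videos."""
    counts = defaultdict(Counter)
    for frames, actions in episodes:
        latents = [encode(f) for f in frames]
        for z, a, z_next in zip(latents, actions, latents[1:]):
            counts[(z, a)][z_next] += 1
    return counts

def predict_next(counts, latent, action):
    """Most frequent next latent; stay in place if the pair was never observed."""
    seen = counts.get((latent, action))
    return seen.most_common(1)[0][0] if seen else latent

def rollout(counts, first_frame, actions):
    """Autoregressive generation: each predicted latent is fed back in as the next input."""
    z = encode(first_frame)
    frames = [decode(z)]
    for a in actions:
        z = predict_next(counts, z, a)
        frames.append(decode(z))
    return frames

def make_episode(start, actions):
    """Generate training 'video' from a hidden ground-truth rule (move, clamped to the frame)."""
    pos, frames = start, [decode(start)]
    for a in actions:
        pos = min(WIDTH - 1, max(0, pos + a))
        frames.append(decode(pos))
    return frames, actions

# Two training videos: a dot sweeping right, then a dot sweeping left.
training = [make_episode(0, [1] * 7), make_episode(7, [-1] * 7)]
model = fit_dynamics(training)

# Generate a new sequence from a starting frame and a series of user actions.
generated = rollout(model, decode(3), [1, 1, -1])
positions = [encode(f) for f in generated]
```

Here `positions` traces the dot moving right twice and then left once, even though that exact sequence never appeared in the training data. A real system would replace the lookup table with a learned model that generalises to situations it has not seen, which is where the scale of the video dataset matters.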
As Google DeepMind intensifies its focus on world models, it is positioning itself in a competitive landscape that includes major players such as OpenAI, Meta, Microsoft, and Amazon. Google DeepMind has already received recognition for its work, including the 2024 Nobel Prize in Chemistry awarded to CEO Demis Hassabis and John M. Jumper for AlphaFold2, an AI model that predicts the structure of proteins, a longstanding challenge in biochemistry.
In October, DeepMind researchers described the Habermas Machine, a large language model used as an AI mediator to help small groups in the U.K. find consensus on divisive topics such as Brexit and immigration. The model drafted collective statements reflecting the groups' shared viewpoints, an example of AI being applied to complex societal issues.
Source: Noah Wire Services