At CES 2025 in Las Vegas, Nvidia announced a family of world models: artificial intelligence systems that mimic the way humans form mental representations of the world. The family, called Cosmos World Foundation Models (Cosmos WFMs), is designed to generate and predict “physics-aware” videos.
The models can be fine-tuned for specific uses and are available through Nvidia’s API and NGC catalog, GitHub, and the community-driven AI development platform Hugging Face. Nvidia’s official blog, as quoted by TechCrunch, stated, “Nvidia is making available the first wave of Cosmos WFMs for physics-based simulation and synthetic data generation.” The models are released under a permissive open model license that allows free use by researchers and developers, regardless of the size of their organisation.
The Cosmos WFM family is split into three tiers: Nano, for low-latency, real-time applications; Super, for a strong baseline; and Ultra, for maximum quality and fidelity. The models range from 4 billion to 14 billion parameters. Parameter count roughly tracks a model’s problem-solving capability, so larger models generally perform better than smaller ones.
Alongside the Cosmos models, Nvidia is releasing additional components, including an upsampling model optimised for augmented reality and guardrail models intended to ensure responsible use of the technology. It is also offering fine-tuned models for specialised tasks, such as generating the sensor data needed to develop autonomous vehicles. According to Nvidia, the models were trained on 9,000 trillion tokens drawn from 20 million hours of real-world data spanning human interactions, environments, industrial processes, robotics, and driving.
Nvidia has not disclosed the specific sources of that training data, and there are ongoing allegations that it included copyrighted YouTube videos. Responding to these claims, an Nvidia spokesperson told TechCrunch that “Cosmos isn’t designed to copy or infringe any protected works,” adding, “To help Cosmos learn, we gathered data from a variety of public and private sources and are confident our use of data is consistent with both the letter and spirit of the law.” Legal analysts, however, note that Nvidia’s copyright and fair use arguments could still be tested in court, since the outcome will depend largely on how courts come to interpret fair use as it applies to AI training.
Nvidia says the Cosmos WFMs can generate “controllable, high-quality” synthetic data from text or video-frame inputs, which can then be used to train models for robotics, autonomous driving, and other applications. A key highlighted feature is the ability to simulate realistic environments, such as factory floors. According to Nvidia, “developing with Cosmos WFMs allows customisation with data sets, such as video recordings of autonomous vehicle trips or robots navigating a warehouse.”
Several companies, including Waabi, Wayve, Foretellix, and Uber, plan to pilot Cosmos WFMs in applications ranging from video search to developing AI models for self-driving vehicles. Describing Uber’s collaboration with Nvidia, CEO Dara Khosrowshahi said, “Generative AI will power the future of mobility, requiring both rich data and very powerful compute...we are confident that we can help supercharge the timeline for safe and scalable autonomous driving solutions for the industry.”
Nvidia’s world models are not strictly “open source,” a label that typically requires enough information about a model’s architecture and training data to allow it to be substantially recreated. Because Nvidia has not published comprehensive details of the training datasets or all the tools needed to recreate the models, the company describes Cosmos WFMs as “open” rather than open source.
Nvidia CEO Jensen Huang set ambitious expectations for the models: “We really hope [Cosmos will] do for the world of robotics and industrial AI what Llama … has done for enterprise,” he said, casting Cosmos as a foundation for future progress in robotics and artificial intelligence.
Source: Noah Wire Services