Welcome NVIDIA Cosmos 3: The First Open Omni-model for Physical AI Reasoning and Action
NVIDIA has released Cosmos 3, an open omni-model for physical AI reasoning and action, now available on Hugging Face.

NVIDIA has unveiled Cosmos 3, a world foundation model (WFM) that combines world generation, physical reasoning, and action generation in a single architecture. The model is designed to simulate and understand the physical world, which positions it as a foundation for applications such as robotics, autonomous vehicles, and smart spaces. These capabilities previously required separate models and inference pipelines.
With Cosmos 3, developers can use a single model to reason over and generate multiple modalities in one forward pass, covering world generation, controlled generation, scene understanding, and policy generation. The model is built on a Mixture-of-Transformers (MoT) architecture, which lets it process multiple input and output modalities, including text, image, video, audio, and action.
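To make the Mixture-of-Transformers idea concrete, here is a minimal sketch in plain Python. It is not NVIDIA's implementation: it assumes the general MoT pattern in which all tokens share one global attention step while each modality has its own feed-forward weights. The names `mot_layer`, `global_attention`, and `experts`, the toy dimensions, and the random weights are all hypothetical, and the attention step omits learned projections for brevity.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng, rows, cols):
    """Small random weight matrix (illustrative stand-in for trained weights)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def global_attention(xs):
    """Shared step: every token attends to every token, whatever its modality.

    Simplification: queries, keys, and values are the raw token vectors.
    """
    d = len(xs[0])
    out = []
    for q in xs:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, xs)) for i in range(d)])
    return out


def mot_layer(xs, modalities, experts):
    """One toy layer: global attention, then a modality-specific feed-forward."""
    attended = global_attention(xs)
    out = []
    for x, a, m in zip(xs, attended, modalities):
        h = [xi + ai for xi, ai in zip(x, a)]            # residual connection
        W_in, W_out = experts[m]                         # route by modality tag
        hidden = [max(0.0, v) for v in matvec(W_in, h)]  # ReLU
        ff = matvec(W_out, hidden)
        out.append([hi + fi for hi, fi in zip(h, ff)])   # residual connection
    return out


rng = random.Random(0)
D, H = 4, 8  # toy embedding and hidden sizes (assumed, not from the source)
MODALITIES = ["text", "video", "action"]
experts = {m: (random_matrix(rng, H, D), random_matrix(rng, D, H)) for m in MODALITIES}

# A mixed sequence: two text tokens, two video tokens, one action token.
modalities = ["text", "text", "video", "video", "action"]
tokens = [[rng.uniform(-1, 1) for _ in range(D)] for _ in modalities]

outputs = mot_layer(tokens, modalities, experts)
print(len(outputs), len(outputs[0]))
```

Because the attention step is shared, a video token can draw on text and action tokens in the same sequence, while swapping the weights for one modality changes only how that modality's tokens are transformed in this layer.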
Cosmos 3 also features a shared representation space that allows for seamless switching between different tasks, such as video generation, action generation, and physical reasoning. NVIDIA has made Cosmos 3 available on Hugging Face, along with a range of tools and resources to support developers. These include a prompting guide, API documentation, and a set of Synthetic Data Generation (SDG) datasets to help train and evaluate WFMs.
NVIDIA has also released the Cosmos Framework, an end-to-end framework for training and serving WFMs such as Cosmos 3. "Cosmos 3 is the result of amazing collaboration between many teams and people across NVIDIA," the company said. The model is available in two sizes, optimized for different deployment scenarios, and can be integrated with existing pipelines using the Hugging Face Diffusers library.
Developers can try Cosmos 3 on Hugging Face using the tools and resources listed above. With Cosmos 3, NVIDIA aims to accelerate the development of physical AI systems that can understand and interact with the real world.
Source: Hugging Face