Nvidia CEO Jensen Huang said last year that we are entering the age of physical AI. While the company continues to offer LLMs for software use cases, Nvidia is increasingly positioning itself as a provider of AI models for fully AI-powered systems, including agentic AI in the physical world.
At CES 2026, Nvidia announced a slate of new models designed to push AI agents beyond chat interfaces and into physical environments.
Nvidia launched Cosmos Reason 2, the latest version of its vision-language model designed for embodied reasoning. Cosmos Reason 1, released last year, introduced a two-dimensional ontology for embodied reasoning and currently leads Hugging Face’s leaderboard for physical reasoning on video.
Cosmos Reason 2 builds on the same ontology while giving enterprises more flexibility to customize applications and enabling physical agents to plan their next actions, similar to how software-based agents reason through digital workflows.
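In practice, this looks like a standard vision-language call: the agent hands the model a camera frame and asks it to reason about the scene before committing to an action. The sketch below assumes a Hugging Face-style transformers interface and uses a hypothetical checkpoint name; Cosmos Reason 2’s actual packaging and prompt format may differ.

```python
# Minimal sketch of prompting a vision-language model for embodied planning.
# The checkpoint name below is hypothetical; Cosmos Reason 2's real
# packaging, interface, and prompt format may differ.
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image

MODEL_ID = "nvidia/cosmos-reason-2"  # placeholder identifier

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForImageTextToText.from_pretrained(MODEL_ID, device_map="auto")

# A frame from the robot's camera plus a planning prompt.
frame = Image.open("workcell_camera.png")
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": (
            "You are controlling a warehouse robot. Describe the scene, "
            "then propose the next safe action as a single step."
        )},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=frame, text=prompt, return_tensors="pt").to(model.device)

# The model reasons over the image before committing to an action,
# analogous to a software agent planning its next digital workflow step.
output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```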
Nvidia also released a new version of Cosmos Transfer, a model that lets developers generate training simulations for robots.
Other vision-language models, such as Google’s PaliGemma and Pixtral Large from Mistral, can process visual inputs, but not all commercially available VLMs support reasoning.
“Robotics is at an inflection point. We are moving from specialist robots limited to single tasks to generalist specialist systems,” said Kari Briski, Nvidia vice president for generative AI software, in a briefing with reporters, referring to robots that combine broad foundational knowledge with deep task-specific skills. “These new robots combine broad fundamental knowledge with deep proficiency in complex tasks.”
She added that Cosmos Reason 2 “enhances the reasoning capabilities that robots need to navigate the unpredictable physical world.”
Moving to physical agents
Briski noted that Nvidia’s roadmap follows “the same pattern of assets across all of our open models.”
“In building specialized AI agents, a digital workforce, or the physical embodiment of AI in robots and autonomous vehicles, more than just the model is needed,” Briski said. “First, the AI needs the compute resources to train and simulate the world around it. Data is the fuel for AI to learn and improve, and we contribute to the world’s largest collection of open and diverse datasets, going beyond just opening the weights of the models. The open libraries and training scripts give developers the tools to purpose-build AI for their applications, and we publish blueprints and examples to help deploy AI as systems of models.”
The company now has open models for physical AI in Cosmos, for robotics in its open-reasoning vision-language-action (VLA) model GR00T, and for agentic AI in its Nemotron models.
Nvidia is making the case that open models across different branches of AI form a shared enterprise ecosystem that feeds data, training, and reasoning to agents in both the digital and physical worlds.
Additions to the Nemotron family
Briski said Nvidia plans to continue expanding its open models, including its Nemotron family, beyond reasoning to include new retrieval-augmented generation (RAG) and embedding models that make information more readily available to agents. The company released Nemotron 3, the latest version of its agentic reasoning models, in December.
Nvidia announced three new additions to the Nemotron family: Nemotron Speech, Nemotron RAG and Nemotron Safety.
In a blog post, Nvidia said Nemotron Speech delivers “real-time low-latency speech recognition for live captions and speech AI applications” and is 10 times faster than other speech models.
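As a point of reference, transcription itself is a one-call affair in most modern stacks; the differentiator Nvidia is claiming is latency. The sketch below uses an open Whisper checkpoint as a generic stand-in and is not meant to show Nemotron Speech’s actual runtime or streaming interface.

```python
# Generic speech-recognition call via the transformers pipeline.
# "openai/whisper-tiny" is a stand-in checkpoint, not Nemotron Speech;
# a live-captioning system would stream audio chunks instead of a file.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-tiny")
result = asr("meeting_clip.wav")  # any local audio file
print(result["text"])
```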
Nemotron RAG is actually two models: an embedding model and a reranking model. Both can understand images, providing the multimodal insights that data agents will tap.
“Nemotron RAG is at the top of what we call MMTEB, or the Massive Multilingual Text Embedding Benchmark, with strong multilingual performance while using less computing power and memory, so they are a good fit for systems that must handle a lot of requests very quickly and with low delay,” Briski said.
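The two-stage design follows a common retrieval pattern: a fast embedding model narrows the candidate set with vector similarity, then a slower reranker rescores each query-document pair jointly. The sketch below illustrates that pattern with generic open models standing in for Nvidia’s checkpoints.

```python
# Illustrative embed-then-rerank retrieval pipeline. The model names are
# generic stand-ins, not Nvidia's published checkpoints; the two-stage
# pattern is what an embedding + reranking pair like Nemotron RAG implements.
from sentence_transformers import SentenceTransformer, CrossEncoder

embedder = SentenceTransformer("all-MiniLM-L6-v2")                # stand-in embedder
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")   # stand-in reranker

docs = [
    "Q3 revenue grew 12% on data center demand.",
    "The warranty covers parts and labor for two years.",
    "Robot arm maintenance schedule: inspect joints monthly.",
]
query = "How often should the robot arm be serviced?"

# Stage 1: cheap vector similarity narrows the candidate set.
doc_vecs = embedder.encode(docs)
query_vec = embedder.encode(query)
scores = doc_vecs @ query_vec
candidates = [docs[i] for i in scores.argsort()[::-1][:2]]

# Stage 2: the reranker scores each (query, candidate) pair jointly,
# which is slower but more accurate than embedding similarity alone.
ranked = sorted(zip(reranker.predict([(query, c) for c in candidates]), candidates),
                reverse=True)
print(ranked[0][1])
```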
Nemotron Safety detects sensitive data so AI agents do not accidentally expose personally identifiable information.
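The guardrail pattern itself is straightforward: scan an agent’s output for likely PII before it leaves the system. The toy example below makes the flow concrete with regexes; a model like Nemotron Safety would replace the pattern matching with a trained detector covering far more than these three categories.

```python
# Toy illustration of the guardrail pattern a safety model automates:
# scan agent output for likely PII before it is released. A production
# system would use a trained classifier, not hand-written regexes.
import re

PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace anything matching a PII pattern with a typed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label.upper()}]", text)
    return text

agent_reply = "Reach the customer at jane.doe@example.com or 555-867-5309."
print(redact_pii(agent_reply))
# -> "Reach the customer at [REDACTED EMAIL] or [REDACTED PHONE]."
```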
