NVIDIA Releases Nemotron 3 Ultra Model

• NVIDIA released Nemotron 3 Ultra, the most intelligent US open weights model to date.
• The model features 550 billion total parameters and achieves inference speeds over 400 tokens per second.
• Nemotron 3 Ultra scored 47.7 on the Artificial Analysis Intelligence Index, outperforming other US-led open models.

NVIDIA released its latest model, Nemotron 3 Ultra, on June 4, 2026, positioning it as the most intelligent US-based open weights model available. The model has approximately 550 billion total parameters, of which 55 billion are active, making it the largest release in the Nemotron 3 series to date. In testing by Artificial Analysis using NVFP4 weights, the model scored 47.7 on the Artificial Analysis Intelligence Index. That places it well ahead of other US open weights models, including Gemma 4 31B (39.2), its sibling Nemotron 3 Super (36.0), and gpt-oss-120b (33.3). It remains behind the Chinese-developed Kimi K2.6, which scored 53.9.
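To make the comparison concrete, the short Python sketch below computes each model's gap to Nemotron 3 Ultra and the share of parameters that are active. The scores and parameter counts are the ones quoted in this article; the function names `gaps_to` and `active_fraction` are made up for illustration.

```python
# Artificial Analysis Intelligence Index scores as quoted in the article.
scores = {
    "Kimi K2.6": 53.9,
    "Nemotron 3 Ultra": 47.7,
    "Gemma 4 31B": 39.2,
    "Nemotron 3 Super": 36.0,
    "gpt-oss-120b": 33.3,
}


def gaps_to(reference, table):
    """Return each other model's score minus the reference model's score."""
    base = table[reference]
    return {
        name: round(score - base, 1)
        for name, score in table.items()
        if name != reference
    }


def active_fraction(active_billions, total_billions):
    """Share of the total parameters that are active."""
    return active_billions / total_billions


ultra_gaps = gaps_to("Nemotron 3 Ultra", scores)
for name, gap in sorted(ultra_gaps.items(), key=lambda item: -item[1]):
    print(f"{name}: {gap:+.1f}")

# Approximate figures from the article: 55 billion active of 550 billion total.
print(f"active share: {active_fraction(55, 550):.0%}")
```

On the article's figures, Kimi K2.6 sits 6.2 points above Nemotron 3 Ultra, Gemma 4 31B sits 8.5 points below it, and about one tenth of the model's parameters are active.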

A primary technical highlight of Nemotron 3 Ultra is its inference efficiency: when deployed on BlackBox AI, the model exceeds 400 output tokens per second. This is a notable engineering achievement, as the model is more than four times larger than gpt-oss-120b yet is served faster. In agentic evaluations on Terminal-Bench v2.1, the model completed tasks at low latency, showing a favorable trade-off between performance and time. The testing swept four turn budgets (10, 20, 50, and 100 turns), and Nemotron 3 Ultra consistently sat on the Pareto frontier, meaning no competing result was both faster and more accurate.
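For readers unfamiliar with the term, a result is on the Pareto frontier when no other result is at least as fast and at least as accurate while being strictly better on one of the two. The Python sketch below shows one common way to compute such a frontier. It is not Artificial Analysis's method, and every model name, time, and score in it is a made-up placeholder rather than a benchmark result.

```python
from typing import NamedTuple


class Run(NamedTuple):
    label: str
    minutes: float  # time to finish the task (lower is better)
    score: float    # task success rate (higher is better)


def pareto_frontier(runs):
    """Return the runs that no other run beats on both time and score."""
    frontier = []
    best_score = float("-inf")
    # Visit the fastest runs first; on equal time, the higher score comes first.
    for run in sorted(runs, key=lambda r: (r.minutes, -r.score)):
        # A run survives only if it improves on every faster run's score.
        if run.score > best_score:
            frontier.append(run)
            best_score = run.score
    return frontier


# Placeholder data for illustration only; NOT real benchmark results.
runs = [
    Run("model A, 10 turns", 2.0, 0.20),
    Run("model A, 50 turns", 6.0, 0.45),
    Run("model B, 10 turns", 3.0, 0.15),
    Run("model B, 50 turns", 9.0, 0.40),
    Run("model B, 100 turns", 15.0, 0.50),
]

frontier_labels = [run.label for run in pareto_frontier(runs)]
print(frontier_labels)
```

In this toy data, both of model A's runs are on the frontier, while model B appears only at its largest budget, where it buys extra accuracy at the cost of more time.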

Further evaluations show where the model is strong and where it is not. On the AA-Omniscience Non-Hallucination benchmark, Nemotron 3 Ultra scored 71%, indicating a reduced tendency to give incorrect answers to factual questions it cannot answer. The model also recorded an Elo score of 1378 on GDPval-AA, roughly on par with DeepSeek V4 Flash. Despite these gains, its score on CritPt, a benchmark designed for graduate-level physics research, remained at 3%, identical to that of Nemotron 3 Super. And while Nemotron 3 Ultra leads US open weights models in general intelligence and agentic efficiency, Gemma 4 31B holds a slight advantage of approximately 1 point on the combined Coding Index, which includes Terminal-Bench Hard and SciCode.
