NVIDIA Nemotron 3 Ultra Launches on Amazon SageMaker JumpStart
- •AWS launched NVIDIA Nemotron 3 Ultra on Amazon SageMaker JumpStart with one-click deployment support.
- •The 550B parameter model uses hybrid Transformer-Mamba MoE architecture to deliver 5x faster inference performance.
- •Designed for agentic AI, the model offers a 1M token context window and 30% lower operating costs.
Amazon Web Services (AWS) announced on June 4, 2026, the availability of the NVIDIA Nemotron 3 Ultra model on Amazon SageMaker JumpStart. Users can now deploy this open model via a one-click interface within the SageMaker environment, allowing for the development of autonomous agents that require sustained multi-step reasoning. The model supports a context length of up to 1M tokens, enabling it to manage long-running planning and tool-calling sequences.
Nemotron 3 Ultra features 550 billion total parameters, with 55 billion active parameters per forward pass. Its architecture combines Transformer-Mamba MoE (Mixture-of-Experts) designs, optimized for the NVFP4 format. This configuration is engineered to increase inference speeds by 5x and reduce operational costs by up to 30% for agentic tasks compared to dense models. The model is specifically aimed at enterprise use cases, including coding agents, deep research synthesis, and multi-step business workflow orchestration.
Deployment requires an AWS account and sufficient service quotas for specific GPU instances, such as ml.p5en.48xlarge, ml.p5.48xlarge, or ml.g7e.48xlarge. Amazon notes that these instances incur hourly charges while the SageMaker endpoint remains active. Users can initiate deployment through the SageMaker Studio console or programmatically via the SageMaker Python SDK by referencing the model ID 'huggingface-reasoning-nvidia-nemotron-3-ultra-550b-a55b-nvfp4'. The company emphasizes that users should delete their endpoints upon completion of tasks to prevent ongoing costs.