NVIDIA has unveiled DGX Cloud Serverless Inference, an AI inference platform designed to streamline application deployment across multi-cloud and on-premises environments. Announced at GTC 2025, the platform lets businesses scale AI workloads globally without managing the underlying infrastructure.
Seamless AI Deployment Across Multiple Clouds
DGX Cloud Serverless Inference acts as a horizontal aggregator, abstracting away infrastructure differences across AWS, Azure, Google Cloud, private clouds, and on-premises data centers. Developers can deploy AI applications with minimal setup, relying on auto-scaling, global load balancing, and multi-cloud flexibility to run high-performance AI workloads.
The system allows businesses to mix computing resources from NVIDIA Cloud, NVIDIA Cloud Partners, private clouds, and cloud service providers (CSPs), giving them broad flexibility in how AI models are deployed and managed. Because the platform absorbs the complexity of managing clusters and GPUs, companies can focus on their AI applications instead of the infrastructure beneath them.
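To make the workflow concrete, here is a minimal client-side sketch of what calling a model behind a serverless endpoint can look like. The endpoint URL, function ID, request schema, and bearer-token auth below are hypothetical placeholders, not the documented DGX Cloud Serverless Inference API; the point is the pattern: a single authenticated HTTPS call, with routing, load balancing, and auto-scaling handled behind the endpoint.

```python
import os

import requests

# Hypothetical invocation URL -- a placeholder for whatever endpoint and
# function ID DGX Cloud Serverless Inference assigns to a deployed workload.
INVOKE_URL = "https://example-inference.nvidia.com/v1/functions/<function-id>/invoke"


def invoke(prompt: str) -> dict:
    """Send one inference request; the platform is responsible for routing,
    load balancing, and scaling behind this single endpoint."""
    response = requests.post(
        INVOKE_URL,
        headers={
            # Bearer-token auth is an assumption; the actual scheme may differ.
            "Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}",
            "Accept": "application/json",
        },
        json={"prompt": prompt, "max_tokens": 128},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    print(invoke("Summarize the benefits of serverless inference."))
```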
Supports a Wide Range of AI and Graphical Workloads
DGX Cloud Serverless Inference supports a diverse set of workloads:
- AI Workloads: Large Language Models (LLMs), object detection, text-to-image, and text-to-3D models.
- Graphical Workloads: Digital twins, simulations, interactive streaming, and digital human AI.
- Batch Processing & Job Workloads: Rendering, AI model fine-tuning, TensorRT engine optimization, and physical AI development.
GPU-accelerated processing delivers the low-latency inference that real-time applications demand, making the platform a strong fit for enterprises that need high-speed AI processing, automation, and large-scale data analysis.
Key Benefits for Developers and Enterprises
- No Infrastructure Management: Deploy AI applications effortlessly without handling hardware or complex setups.
- Global Scaling: Easily scale workloads to meet global demands.
- Multi-Cloud and Hybrid Support: Operate across major cloud providers and on-premises environments.
- Cost Optimization: Optimize compute resources for better efficiency and lower expenses.
- Security and Compliance: Support for SOC 2, HIPAA, and other enterprise security standards.
Now Available for ISVs and Cloud Partners
DGX Cloud Serverless Inference is now open to Independent Software Vendors (ISVs) and NVIDIA Cloud Partners, providing a cost-effective and scalable way to deploy AI-powered applications. Developers can package AI models as NVIDIA NIM microservices, custom containers, or Helm charts, with the platform handling scaling, load balancing, and deployment.
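As a concrete example, NIM large language model microservices expose an OpenAI-compatible HTTP API, so a deployed endpoint can usually be called with the standard OpenAI Python client. The base URL, API key, and model ID below are placeholders; substitute whatever your own deployment exposes.

```python
from openai import OpenAI  # pip install openai

# Placeholder base URL and credentials -- point these at the endpoint
# exposed by your own NIM deployment.
client = OpenAI(
    base_url="https://your-endpoint.example.com/v1",
    api_key="YOUR_API_KEY",
)

completion = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",  # example NIM model ID
    messages=[{"role": "user", "content": "What is serverless inference?"}],
    max_tokens=128,
)
print(completion.choices[0].message.content)
```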
Organizations can also leverage their own private cloud resources while benefiting from NVIDIA’s AI-optimized compute infrastructure, enabling greater flexibility and control over their AI workloads.
Shaping the Future of AI Deployment
With DGX Cloud Serverless Inference, NVIDIA is setting new standards for AI scalability and cloud computing. This platform removes infrastructure roadblocks, allowing businesses to focus on innovation and AI-driven advancements. By simplifying AI deployment across multi-cloud and hybrid environments, NVIDIA reinforces its position as a leader in enterprise AI solutions.
The introduction of DGX Cloud Serverless Inference represents a major leap forward in AI deployment efficiency, helping businesses deploy, manage, and scale AI workloads faster than ever before.