Launch NVIDIA Instant Clusters for AI without delays

Setting up multi-node GPU clusters has traditionally been an arduous process, taking valuable time and effort from AI engineers and researchers. Today, we are excited to announce the launch of Together Instant Clusters, which offer an API-first developer experience. Instant Clusters provide self-service automation for AI infrastructure—from single-node clusters with 8 GPUs to large multi-node clusters with hundreds of interconnected GPUs, supporting both NVIDIA Hopper and NVIDIA Blackwell architectures.
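To make the API-first workflow concrete, here is a minimal sketch of building a cluster-provisioning request. The function name, field names, and GPU-type labels are assumptions for illustration, not the actual Together API; consult the official documentation for real endpoints and payloads.

```python
# Illustrative sketch of an API-first provisioning payload.
# All field names and labels here are hypothetical, not the real Together API.

def build_cluster_request(name, gpu_type, gpu_count, orchestrator="k8s"):
    """Validate inputs and build a cluster-provisioning payload.

    Clusters scale in whole nodes of 8 GPUs, so gpu_count must be a
    positive multiple of 8.
    """
    if gpu_count <= 0 or gpu_count % 8 != 0:
        raise ValueError("gpu_count must be a positive multiple of 8")
    if orchestrator not in ("k8s", "slurm"):
        raise ValueError("orchestrator must be 'k8s' or 'slurm'")
    return {
        "name": name,
        "gpu_type": gpu_type,          # e.g. "hopper" or "blackwell" (assumed labels)
        "num_nodes": gpu_count // 8,   # 8 GPUs per node
        "orchestrator": orchestrator,  # preconfigured K8s or Slurm
    }

# Example: request a 64-GPU (8-node) Hopper cluster with Kubernetes.
request = build_cluster_request("train-run-1", "hopper", 64)
```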

AI-native companies can now swiftly manage sudden demand, whether it’s a training run or increased inference traffic, by adding capacity quickly and bringing a cluster online automatically with the right orchestration (K8s or Slurm) and networking. Instant Clusters can be provisioned in minutes, without lengthy procurement cycles or manual approvals, and are preconfigured for low-latency inference and high-throughput distributed training.

Developers expect cloud solutions to be API-first, self-service, and predictable. Historically, tightly coupled GPU clusters have not felt that way—teams assembled drivers, schedulers, and networking components by hand. Together Instant Clusters make GPU infrastructure feel like the rest of the cloud: automated from request to run, consistent across environments, and designed to scale from a single node to large multi-node clusters without changing how you work.

Clusters come pre-loaded with the components teams usually spend days wiring up themselves. This includes a GPU Operator to manage drivers and runtime software, an Ingress controller to handle traffic, NVIDIA Network Operator for high-performance networking, and Cert Manager for secure certificates. These and other essentials are already in place, so your cluster is production-ready out of the box.
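As a sanity check, a team could confirm those components are actually running before submitting work. The sketch below matches pod names against expected component prefixes; the prefixes are assumptions based on the upstream projects' default names, and the pod list would come from something like `kubectl get pods -A`.

```python
# Sketch: verify the pre-installed cluster components are present.
# The name prefixes are assumptions based on upstream defaults; adjust
# them to match your cluster.

EXPECTED_COMPONENTS = [
    "gpu-operator",      # NVIDIA GPU Operator (drivers, runtime software)
    "ingress-nginx",     # Ingress controller (traffic handling)
    "network-operator",  # NVIDIA Network Operator (high-performance networking)
    "cert-manager",      # certificate management
]

def missing_components(pod_names, expected=EXPECTED_COMPONENTS):
    """Return expected components that have no matching pod.

    pod_names: iterable of pod names, e.g. parsed from `kubectl get pods -A`.
    """
    return [component for component in expected
            if not any(component in name for name in pod_names)]
```

Running this against a freshly provisioned cluster should return an empty list.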

Training at scale demands the right interconnect and orchestration. Clusters are wired with a non-blocking NVIDIA Quantum-2 InfiniBand scale-out compute fabric, delivering ultra-low latency and high throughput for multi-node training. Use Kubernetes or Slurm, and keep environments reproducible with version-pinned drivers and CUDA. When usage surges, services need to burst, not re-architect: Together Instant Clusters let you add inference capacity quickly while maintaining latency SLAs.
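Version pinning is what keeps multi-node runs reproducible: every node should report the same driver and CUDA versions as the pinned manifest. A minimal sketch of that check, with illustrative version numbers (not a statement of what Instant Clusters actually ship):

```python
# Sketch: compare a node's reported driver/CUDA versions (e.g. parsed
# from `nvidia-smi`) against a pinned manifest. Versions are illustrative.

PINNED = {"driver": "550.54.15", "cuda": "12.4"}

def check_pins(node_versions, pinned=PINNED):
    """Return a list of mismatches between a node and the pinned manifest."""
    return [
        f"{key}: expected {want}, found {node_versions.get(key)}"
        for key, want in pinned.items()
        if node_versions.get(key) != want
    ]
```

In practice this would run as a pre-flight step on each node, failing the job early if any node drifts from the manifest.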

With the launch of Instant Clusters, we’ve implemented a reliability regimen to ensure clusters are solid before a job starts and remain stable throughout. Every node undergoes testing, and inter-node connections are validated. Clusters are continuously monitored, allowing for rapid issue identification and resolution. Together AI stands out among cloud providers because a significant portion of our team consists of AI researchers who actively use and contribute to the platform.
