Together AI Achieves 90% Faster Training with NVIDIA Blackwell

Today we are announcing immediate access to Together GPU Clusters accelerated by the NVIDIA Blackwell platform, along with an accompanying AI acceleration stack optimized for the latest GPU architecture. Together GPU Clusters featuring NVIDIA HGX B200 are turbocharged with the Together Kernel Collection, delivering 90% faster training than NVIDIA HGX H100: 15,200 tokens/second/node on a training run for a 70B-parameter model.
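
As a back-of-the-envelope sanity check on that figure, the widely used ≈6 FLOPs per parameter per token heuristic for dense-transformer training (a generic approximation, not Together AI's own accounting) puts the per-node training compute at roughly 6.4 PFLOP/s:

```python
# Rough training-compute estimate from the reported throughput, using the
# common ~6 FLOPs per parameter per token heuristic (forward + backward).
# The heuristic is a generic approximation, not Together AI's methodology.
params = 70e9            # 70B-parameter model
tokens_per_sec = 15_200  # reported tokens/second/node

flops_per_sec = 6 * params * tokens_per_sec
print(f"~{flops_per_sec / 1e15:.1f} PFLOP/s of training compute per node")  # ~6.4
```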

Our research team achieved these speed-ups by leveraging NVIDIA Blackwell's advanced features through the open-source ThunderKittens framework. We developed custom FP8 kernels that take full advantage of Blackwell's 5th-generation NVIDIA Tensor Cores and dedicated on-chip memory, producing attention kernels that run 1.8x faster than FlashAttention-3.
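
The production kernels are CUDA code built with ThunderKittens, but the scaled-FP8 arithmetic they rest on is easy to illustrate. Below is a minimal NumPy sketch of per-tensor-scaled FP8 E4M3 matrix multiplication (using the ml_dtypes package for the FP8 dtype); the scaling recipe shown is a generic one for illustration, not Together AI's actual kernel logic, and NumPy upcasts the multiply, so only the quantization error is modeled:

```python
import numpy as np
import ml_dtypes  # pip install ml_dtypes -- NumPy FP8 dtypes

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def fp8_quantize(x: np.ndarray):
    """Per-tensor scaled cast to FP8 E4M3 (illustrative, not Together AI's recipe)."""
    scale = np.abs(x).max() / E4M3_MAX
    return (x / scale).astype(ml_dtypes.float8_e4m3fn), scale

def fp8_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Simulate an FP8 GEMM: quantize inputs, multiply, undo the scales.
    On Blackwell the multiply itself runs on FP8 Tensor Cores; NumPy has
    no FP8 matmul, so we upcast and model only the quantization error."""
    a_q, sa = fp8_quantize(a)
    b_q, sb = fp8_quantize(b)
    return (a_q.astype(np.float32) @ b_q.astype(np.float32)) * (sa * sb)

rng = np.random.default_rng(0)
a, b = rng.standard_normal((64, 64)), rng.standard_normal((64, 64))
print("max abs error vs full precision:", np.abs(fp8_matmul(a, b) - a @ b).max())
```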

In an exclusive launch program, we're offering eight pioneering AI teams the opportunity to test-drive dedicated HGX B200 nodes and collaborate directly with NVIDIA engineers and Together AI researchers to accelerate their AI workloads. This collaboration combines Together AI's kernel optimization expertise with NVIDIA's latest accelerated computing platform innovations, setting new benchmarks for AI training and inference efficiency.

We are deploying tens of thousands of NVIDIA HGX B200 servers and GB200 NVL72 rack-scale solutions with NVIDIA Quantum-2 InfiniBand networking. All Together GPU Clusters feature the highest-performance NVIDIA NVLink within a node and NVIDIA Quantum-2 InfiniBand networking across nodes, providing the scale and performance needed to build and deploy the next generation of AI reasoning models and agents.

Our team is eager to work hand in hand with yours, forging the frontier of AI. Together AI optimizes every layer of the AI stack to take full advantage of advances in GPU architecture like NVIDIA Blackwell. We write custom kernels to maximize both speed and scalability, and we're particularly excited about the new microscaling (MX) data formats for speeding up model inference.
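
For a concrete sense of what microscaling means, here is a simplified NumPy sketch in the spirit of the OCP MXFP8 format, where each block of 32 values shares one power-of-two scale and the elements are stored as FP8 E4M3. The scale-selection rule below is simplified so every scaled element is guaranteed to fit in E4M3; the actual MX specification (and Blackwell's hardware support for it) differs in detail:

```python
import numpy as np
import ml_dtypes  # provides the FP8 E4M3 dtype for NumPy

BLOCK = 32        # MX block size: 32 consecutive values share one scale
E4M3_MAX = 448.0  # largest finite FP8 E4M3 value

def mx_quantize(x: np.ndarray):
    """Simplified MXFP8-style quantization: one power-of-two scale per
    32-element block, elements stored as FP8 E4M3 (~8.25 bits/value)."""
    blocks = x.reshape(-1, BLOCK)
    amax = np.maximum(np.abs(blocks).max(axis=1, keepdims=True), 2.0**-126)
    # Power-of-two scale chosen so the block maximum fits within E4M3_MAX.
    scales = 2.0 ** np.ceil(np.log2(amax / E4M3_MAX))
    return (blocks / scales).astype(ml_dtypes.float8_e4m3fn), scales

def mx_dequantize(elems, scales, shape):
    return (elems.astype(np.float32) * scales).reshape(shape)

x = np.random.default_rng(0).standard_normal(4096).astype(np.float32)
elems, scales = mx_quantize(x)
print("max abs error:", np.abs(mx_dequantize(elems, scales, x.shape) - x).max())
```

Because the scale is shared across a whole block rather than stored per value, the format keeps near-FP8 storage cost while adapting its dynamic range block by block, which is what makes it attractive for inference.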
