Enhanced Batch Inference API: New UI and Model Support
We have rolled out significant improvements to our Batch Inference API, making it simpler, faster, and more powerful for teams processing massive datasets. You can now create and track batch jobs through an intuitive interface, with no complex API calls required.
The Batch Inference API now supports all serverless models and private deployments, allowing you to run batch workloads on exactly the models you need. Rate limits have increased from 10 million to 30 billion tokens per model per user, marking a 3000× increase. Need more? We will work with you to customize it.
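If you prefer to drive batch jobs programmatically, a job is typically a file of independent requests submitted in a single call. Below is a minimal sketch assuming an OpenAI-compatible client and batch endpoint; the base URL, model name, and JSONL request layout are illustrative assumptions, not the documented interface of this API.

```python
# Minimal sketch of submitting a batch job, assuming an OpenAI-compatible
# batch endpoint. The base URL, model id, and file layout are placeholders.
import json
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

# Each JSONL line is one independent request in the batch.
requests = [
    {
        "custom_id": f"req-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "your-serverless-or-private-model",  # hypothetical id
            "messages": [{"role": "user", "content": text}],
        },
    }
    for i, text in enumerate(["First input", "Second input"])
]
with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(json.dumps(r) for r in requests))

# Upload the request file, then create the batch job against it.
batch_file = client.files.create(file=open("batch_input.jsonl", "rb"), purpose="batch")
job = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",  # mirrors the 24-hour SLA discussed below
)
print(job.id, job.status)
```

The same pattern works whether the target model is serverless or a private deployment; only the model identifier changes.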
For most serverless models, the Batch Inference API runs at half the cost of our real-time API, making it the most economical way to process high-throughput workloads. We rely on the Batch Inference API ourselves to handle very large volumes of requests. The high rate limits, up to 30 billion tokens, let us run massive experiments without bottlenecks, and jobs consistently finish well within the 24-hour SLA, often within just hours.
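For tracking a job outside the UI, a simple polling loop is usually enough. The sketch below continues the hypothetical OpenAI-compatible client from the example above; the status values and polling interval are assumptions, not a documented contract.

```python
# Minimal sketch of tracking the batch job created above until it finishes.
# Status names follow the assumed OpenAI-compatible client; the 60-second
# polling interval is arbitrary.
import time

while True:
    job = client.batches.retrieve(job.id)
    if job.status in ("completed", "failed", "expired", "cancelled"):
        break
    time.sleep(60)  # jobs often complete within hours, well inside the SLA

if job.status == "completed":
    output = client.files.content(job.output_file_id)
    with open("batch_output.jsonl", "wb") as f:
        f.write(output.read())  # one JSONL result line per request
```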
The Batch Inference API is ideal when you need high throughput without real-time constraints. It is suitable for text analysis, fraud detection, synthetic data generation, embedding generation, and customer support automation. These updates mark a significant step forward in making large-scale inference both accessible and cost-effective. With an upgraded UI, universal model support, and dramatically higher rate limits — all at typically half the cost of real-time APIs — the Batch Inference API is the most efficient way to handle massive workloads.
Try the Batch Inference API today and start scaling your experiments without limits.