Accelerate inference with torch.compile caching
We now cache torch.compile artifacts to reduce boot times for models that use PyTorch. Models like black-forest-labs/flux-kontext-dev, prunaai/flux-schnell, and prunaai/flux.1-dev-lora now start 2-3x faster. We’ve published a guide to improving model performance with torch.compile that covers more of the details.
What is torch.compile?

Many models, particularly those in the FLUX family, apply various torch.compile techniques to improve inference speed. The first call to a compiled function traces and compiles the code, which adds overhead. Subsequent calls run the optimized code and are significantly faster.
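To make that compile-then-reuse behavior concrete, here's a minimal sketch. The toy module and input shapes are placeholders, not any particular model:

```python
import time

import torch

# A toy module standing in for a real model's forward pass.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
)

compiled = torch.compile(model)  # returns a wrapper; nothing is compiled yet
x = torch.randn(8, 1024)

start = time.perf_counter()
compiled(x)  # first call: traces and compiles, so it's slow
print(f"first call:  {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
compiled(x)  # subsequent calls reuse the optimized code
print(f"second call: {time.perf_counter() - start:.4f}s")
```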
In our tests of inference speed with black-forest-labs/flux-kontext-dev, the compiled version runs over 30% faster than the uncompiled one. Caching the compiled artifacts across model container lifecycles dramatically reduces cold boot times:

- black-forest-labs/flux-kontext-dev: ~120s → ~60s (50% faster)
- prunaai/flux-schnell: ~150s → ~70s (53% faster)
- prunaai/flux.1-dev-lora: ~400s → ~150s (62% faster)
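If you want to measure the steady-state speedup on your own model, a rough benchmark looks like this. The module and shapes are placeholders; the key points are warming up (so compilation doesn't pollute the measurement) and, on GPU, synchronizing before reading the clock:

```python
import time

import torch


def benchmark(fn, x, iters=20):
    # Warm up so compilation and autotuning happen before timing starts.
    for _ in range(3):
        fn(x)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters


device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Sequential(  # placeholder for a real pipeline's forward pass
    torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
).to(device)
x = torch.randn(32, 1024, device=device)

eager_time = benchmark(model, x)
compiled_time = benchmark(torch.compile(model), x)
print(f"eager {eager_time * 1e3:.2f} ms, compiled {compiled_time * 1e3:.2f} ms, "
      f"{eager_time / compiled_time:.2f}x speedup")
```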
The cache also shortens the time from container startup to first successful prediction for every model that uses torch.compile.

How does it work?

The caching system works like many CI/CD caches: when a model container starts, it looks for cached compiled artifacts. If it finds them, Torch reuses them instead of recompiling from scratch. When a container shuts down gracefully, it updates the cache if needed. Cache files are keyed on model version and stored close to the GPU nodes.
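This isn't our exact implementation, but the same pattern can be sketched in a few lines. `TORCHINDUCTOR_CACHE_DIR` is a real Inductor environment variable that controls where compiled artifacts live on disk; `download_blob`, `upload_blob`, and the `MODEL_VERSION`-based key are hypothetical stand-ins for whatever object storage sits near the GPU nodes:

```python
import atexit
import io
import os
import tarfile

CACHE_DIR = "/tmp/inductor-cache"
# Hypothetical cache key, derived from the model version as described above.
CACHE_KEY = f"compile-cache/{os.environ.get('MODEL_VERSION', 'dev')}.tar"

# Real Inductor setting: set this before importing torch so compiled
# artifacts are read from and written to a directory we control.
os.environ["TORCHINDUCTOR_CACHE_DIR"] = CACHE_DIR


def download_blob(key: str) -> bytes | None:
    """Hypothetical object-storage read; returns None on a cache miss."""
    raise NotImplementedError


def upload_blob(key: str, data: bytes) -> None:
    """Hypothetical object-storage write."""
    raise NotImplementedError


def restore_cache() -> None:
    """On container startup, seed the local Inductor cache if an entry exists."""
    data = download_blob(CACHE_KEY)
    if data is not None:
        os.makedirs(CACHE_DIR, exist_ok=True)
        with tarfile.open(fileobj=io.BytesIO(data)) as tar:
            tar.extractall(CACHE_DIR)


def save_cache() -> None:
    """On graceful shutdown, publish the (possibly updated) cache."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.add(CACHE_DIR, arcname=".")
    upload_blob(CACHE_KEY, buf.getvalue())


restore_cache()
atexit.register(save_cache)
# From here, load the model and call torch.compile as usual; Inductor
# reuses any artifacts found under CACHE_DIR instead of recompiling.
```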
To learn more about using torch.compile, check out our documentation and the official PyTorch torch.compile tutorial.