Achieving Single-Digit Microsecond Latency for Financial Markets

In algorithmic trading, reducing response times to market events is crucial. To keep pace with high-speed electronic markets, latency-sensitive firms often use specialized hardware like FPGAs and ASICs. Yet, as markets grow more efficient, traders increasingly depend on advanced models such as deep neural networks to enhance profitability. Because implementing these complex models on low-level hardware requires significant investment, general-purpose GPUs offer a practical, cost-effective alternative.

The NVIDIA GH200 Grace Hopper Superchip in the Supermicro ARS-111GL-NHR server has achieved single-digit microsecond latencies in the STAC-ML Markets (Inference) benchmark, providing performance comparable to or better than specialized hardware systems. This article details these record-breaking results and provides insights into the custom-tailored solutions required for low-latency GPU inference.

Deep neural networks with long short-term memory (LSTM) are widely used for time series forecasting in capital markets. The STAC-ML Markets (Inference) benchmark measures LSTM inference latency: the time between receiving new input and generating the output. It includes three models of increasing complexity: LSTM_B is about six times as complex as LSTM_A, and LSTM_C is roughly 200 times as complex as LSTM_A.
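To make the latency definition concrete, the following is a minimal sketch of a single LSTM time step with a timer wrapped around it. It is not the STAC-ML harness and not the GPU implementation discussed in this article. The layer sizes, the random weights, the gate ordering, and the helper names lstm_step and measure_latency_ns are all assumptions chosen for illustration, and pure Python will be orders of magnitude slower than the results reported here.

```python
import math
import random
import time


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, W, U, b):
    """One time step of a single-layer LSTM cell (hypothetical helper).

    x: input vector; h, c: previous hidden and cell state (length n_hid).
    W: 4*n_hid rows of input weights; U: 4*n_hid rows of recurrent weights;
    b: 4*n_hid biases. Assumed gate order: input, forget, candidate, output.
    """
    n_hid = len(h)
    # Pre-activations for all four gates, stacked row by row.
    pre = [
        sum(w * xi for w, xi in zip(W[r], x))
        + sum(u * hi for u, hi in zip(U[r], h))
        + b[r]
        for r in range(4 * n_hid)
    ]
    i = [sigmoid(v) for v in pre[0:n_hid]]
    f = [sigmoid(v) for v in pre[n_hid:2 * n_hid]]
    g = [math.tanh(v) for v in pre[2 * n_hid:3 * n_hid]]
    o = [sigmoid(v) for v in pre[3 * n_hid:4 * n_hid]]
    c_new = [f[k] * c[k] + i[k] * g[k] for k in range(n_hid)]
    h_new = [o[k] * math.tanh(c_new[k]) for k in range(n_hid)]
    return h_new, c_new


def measure_latency_ns(n_in=4, n_hid=8, n_events=200, seed=0):
    """Time each step from 'input available' to 'output ready', in nanoseconds."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(4 * n_hid)]
    U = [[rng.uniform(-0.1, 0.1) for _ in range(n_hid)] for _ in range(4 * n_hid)]
    b = [0.0] * (4 * n_hid)
    h, c = [0.0] * n_hid, [0.0] * n_hid
    samples = []
    for _ in range(n_events):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n_in)]  # stand-in for a market event
        t0 = time.perf_counter_ns()
        h, c = lstm_step(x, h, c, W, U, b)
        samples.append(time.perf_counter_ns() - t0)
    samples.sort()
    stats = {
        "p50": samples[len(samples) // 2],
        "p99": samples[int(0.99 * (len(samples) - 1))],
        "max": samples[-1],
    }
    return stats, h


if __name__ == "__main__":
    stats, _ = measure_latency_ns()
    print(stats)
```

The sketch reports tail values (p99 and max) alongside the median because a latency budget has to hold for the slowest responses, not only the typical one.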

STAC-ML has emerged as a crucial benchmark for financial institutions leveraging machine learning (ML) in trading. It rigorously measures the speed and reliability of a technology stack when running models on live market data under realistic, production-like conditions. By standardizing key metrics for LSTM and other time series models, such as latency, throughput, and efficiency, STAC-ML enables banks, hedge funds, and market makers to compare competing hardware and software solutions objectively before deployment.

STAC-ML results are essential for trading desks situated in co-located data centers, where winning or losing an order can be decided in microseconds. They validate that a platform can meet strict latency budgets for demanding use cases like high-frequency market making, short-term price prediction, and automated hedging. Furthermore, because the benchmark is designed and governed by practitioners from leading financial firms, its scores carry significant weight in the technology selection process.
