Optimize Cost and Reliability with Gemini API

03.04.2026, 01:42

Google has introduced two new service tiers for the Gemini API, Flex and Priority, which let developers balance cost and reliability through a single interface. Background and interactive tasks can both be routed through the standard synchronous endpoints, which simplifies architecture management.

Flex Inference is a new cost-optimized tier for latency-tolerant tasks. It offers 50% savings compared to the standard API in exchange for downgrading the criticality of requests, which increases latency. This makes it well suited to background CRM updates, large-scale research simulations, and agentic workflows.
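As a rough illustration of the 50% figure, the sketch below estimates spend when part of a workload moves to Flex. Only the 50% discount comes from the announcement; the dollar amount, the share of latency-tolerant traffic, and the function name are made-up placeholders, not actual Gemini pricing.

```python
FLEX_DISCOUNT = 0.50  # from the announcement: Flex costs 50% less than standard


def estimated_cost(standard_cost: float, flex_share: float) -> float:
    """Cost after routing `flex_share` (0..1) of the spend through Flex."""
    if not 0.0 <= flex_share <= 1.0:
        raise ValueError("flex_share must be between 0 and 1")
    flex_part = standard_cost * flex_share * (1 - FLEX_DISCOUNT)
    standard_part = standard_cost * (1 - flex_share)
    return flex_part + standard_part


# Hypothetical workload: $1,000/month at standard rates, 60% of it latency-tolerant.
print(round(estimated_cost(1000.0, 0.6), 2))  # 700.0
```

The saving scales with the share of traffic that can tolerate extra latency, so the first step in practice is usually to identify which requests have no user waiting on them.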

Priority Inference, by contrast, provides the highest level of reliability for critical applications. This tier ensures that important requests are not preempted even during peak platform usage. If Priority limits are exceeded, overflow requests are automatically served at the standard tier, so the application stays available.
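The overflow rule can be pictured with a toy model. This is not how the platform is implemented, nor how it reports the tier that was used; the limit value and the function name are hypothetical, and the model only mirrors the rule as stated above.

```python
def served_tier(requested: str, priority_used: int, priority_limit: int) -> str:
    """Toy model: Priority requests beyond the limit are served as standard."""
    if requested == "priority" and priority_used >= priority_limit:
        return "standard"
    return requested


# Hypothetical limit of 3 Priority requests in the current window.
limit = 3
served = [served_tier("priority", used, limit) for used in range(5)]
print(served)
# ['priority', 'priority', 'priority', 'standard', 'standard']
```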

To use the new tiers, set the service_tier parameter in the request. Flex is available to all paid tiers, while Priority is available to Tier 2 and Tier 3 projects. Full pricing information and code examples can be found in the Gemini API documentation.
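A minimal sketch of how a client might pick a tier per request. Only the parameter name service_tier comes from the announcement; the values "flex", "priority", and "standard", the payload shape, and the helper names are assumptions for illustration, so the Gemini API documentation remains the reference for the actual request format.

```python
from dataclasses import dataclass


@dataclass
class Task:
    prompt: str
    interactive: bool       # a user is waiting on the answer
    critical: bool = False  # must not be preempted at peak load


def choose_service_tier(task: Task) -> str:
    """Map a task to a tier name (tier strings are assumed, not confirmed)."""
    if task.critical:
        return "priority"
    if task.interactive:
        return "standard"
    return "flex"  # background work that tolerates extra latency


def build_request(task: Task) -> dict:
    """Build a hypothetical request body; only `service_tier` is from the source."""
    return {
        "prompt": task.prompt,
        "service_tier": choose_service_tier(task),
    }


print(build_request(Task("Summarize yesterday's CRM changes", interactive=False)))
```

Keeping the tier decision in one small function means background and interactive traffic can share the same synchronous call path and differ only in this one field.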
