Optimize Cost and Reliability with Gemini API
Google has introduced two new service tiers for the Gemini API, Flex and Priority, which let developers trade off cost and reliability through a single interface. Both background and interactive tasks can be routed through the same standard synchronous endpoints, which simplifies application architecture.
Flex Inference is a new cost-optimized tier designed for latency-tolerant tasks. It costs 50% less than the standard API; in exchange, users downgrade the criticality of their requests, which increases latency. This makes it a good fit for background CRM updates, large-scale research simulations, and agentic workflows.
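To make the 50% figure concrete, here is a minimal Python sketch of the arithmetic. The function name, the dollar amounts, and the idea of a "flex share" of traffic are illustrative assumptions, not part of the Gemini API or its published pricing; only the 50% discount comes from the announcement.

```python
def estimate_flex_savings(standard_cost_usd: float,
                          flex_share: float,
                          discount: float = 0.5) -> dict:
    """Estimate spend when part of a workload moves to the Flex tier.

    standard_cost_usd: what the whole workload would cost at the standard tier.
    flex_share: fraction (0..1) of that workload that tolerates extra latency
                and can be sent as Flex (hypothetical input).
    discount: Flex discount relative to standard; 0.5 reflects the stated 50%.
    """
    if not 0.0 <= flex_share <= 1.0:
        raise ValueError("flex_share must be between 0 and 1")
    flex_eligible = standard_cost_usd * flex_share
    savings = flex_eligible * discount
    return {
        "all_standard": standard_cost_usd,
        "with_flex": standard_cost_usd - savings,
        "savings": savings,
    }


if __name__ == "__main__":
    # Hypothetical workload: $1,000 at standard pricing, half of it latency-tolerant.
    print(estimate_flex_savings(1000.0, 0.5))
```

In this sketch, only the latency-tolerant half of the workload is discounted, so the overall saving is a quarter of the bill rather than half.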
Priority Inference, by contrast, provides the highest level of reliability for critical applications. This tier ensures that important requests are not preempted, even during peak platform usage. If Priority limits are exceeded, overflow requests are automatically served at the standard tier, so the application stays available.
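The overflow behavior can be pictured with a toy model. The Python sketch below is an assumption-laden illustration, not the platform's actual scheduler: it treats the Priority limit as a simple request count, whereas the announcement does not say how limits are measured.

```python
def served_tiers(num_priority_requests: int, priority_limit: int) -> list[str]:
    """Toy model: which tier serves each request sent as Priority.

    Requests within the (hypothetical) limit are served as "priority";
    anything beyond it overflows to "standard" instead of being rejected.
    """
    if num_priority_requests < 0 or priority_limit < 0:
        raise ValueError("counts must be non-negative")
    served = []
    for index in range(num_priority_requests):
        # Requests past the limit are downgraded, not dropped.
        served.append("priority" if index < priority_limit else "standard")
    return served


if __name__ == "__main__":
    # Five Priority requests against a hypothetical limit of three.
    print(served_tiers(5, 3))
```

The point of the model is that every request still gets served; exceeding the limit changes the tier, not the outcome.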
To use the new service tiers, set the service_tier parameter in your request. Flex is available for all paid tiers, while Priority is accessible to users with Tier 2 and 3 projects. Full pricing information and code examples can be found in the Gemini API documentation.
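As a rough illustration of routing by tier, the Python sketch below picks a service_tier value per task and attaches it to a request payload. Only the parameter name service_tier comes from the announcement. The tier strings ("flex", "priority", "standard"), the task names, the helper functions, and the payload shape are all assumptions made for this example; consult the Gemini API documentation for the real request schema and accepted values.

```python
import json

# Hypothetical task labels for latency-tolerant background work.
BACKGROUND_TASKS = {"crm_update", "research_simulation", "agentic_workflow"}


def pick_service_tier(task_kind: str, critical: bool = False) -> str:
    """Choose a tier string for a task (values are assumed, not confirmed)."""
    if critical:
        return "priority"   # must not be preempted
    if task_kind in BACKGROUND_TASKS:
        return "flex"       # cheaper, tolerates higher latency
    return "standard"


def build_request(prompt: str, task_kind: str, critical: bool = False) -> dict:
    """Build an illustrative payload; this is not the real API schema."""
    return {
        "prompt": prompt,
        "service_tier": pick_service_tier(task_kind, critical),
    }


if __name__ == "__main__":
    background = build_request("Summarize yesterday's CRM notes", "crm_update")
    interactive = build_request("Answer the customer now", "chat", critical=True)
    print(json.dumps(background))
    print(json.dumps(interactive))
```

Because both payloads target the same kind of endpoint and differ only in one field, the routing decision stays in application code rather than in separate batch and real-time pipelines.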