Gemini 3.1 Flash TTS enhances AI speech quality and control

Gemini 3.1 Flash TTS is a next-generation text-to-speech model that offers improved sound quality and control. Audio tags let users adjust vocal style and pacing in more than 70 languages, making AI speech more expressive and natural.

Gemini 3.1 Flash TTS provides high levels of controllability and expressiveness, empowering developers and users to create innovative AI speech applications. The model is now available for developers through the Gemini API and Google AI Studio, as well as for enterprises on the Vertex AI platform.

The speech quality of Gemini 3.1 Flash TTS has improved significantly, as shown by its high Elo rating on the Artificial Analysis TTS leaderboard, where it sits in the “most attractive quadrant” for combining high-quality speech generation with low cost. The model also supports multi-speaker dialogue and offers granular control over expressiveness through natural language commands.
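
To make the multi-speaker and natural-language control ideas concrete, here is a minimal sketch of how a dialogue prompt might be assembled before it is sent to a TTS model. The "Speaker: line" layout, the style instruction wording, and the helper name `build_dialogue_prompt` are illustrative assumptions, not a format confirmed by the announcement, and the sketch only builds a string; it does not call any API.

```python
from typing import List, Tuple


def build_dialogue_prompt(style: str, turns: List[Tuple[str, str]]) -> str:
    """Return a style instruction followed by one 'Speaker: line' per turn.

    The layout is an assumption made for illustration only.
    """
    if not turns:
        raise ValueError("at least one turn is required")
    lines = [style.strip()]
    for speaker, line in turns:
        lines.append(f"{speaker.strip()}: {line.strip()}")
    return "\n".join(lines)


# Hypothetical two-speaker script with a natural-language style instruction.
dialogue = build_dialogue_prompt(
    "Read this as a relaxed podcast conversation between Ana and Ben.",
    [
        ("Ana", "Did you try the new model?"),
        ("Ben", "I did, and it sounds great."),
    ],
)
print(dialogue)
```

Keeping the style instruction separate from the turns makes it easy to try different deliveries of the same script.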

With new audio tags, developers can manage vocal style, pace, and delivery. By embedding commands directly into the text, they can precisely steer AI speech output, unlocking new possibilities for creating memorable characters and immersive audio experiences.
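
As a rough illustration of embedding commands directly in the text, the sketch below prefixes segments of a script with bracketed tags. The bracket syntax and the tag names (`whispering`, `excited`) are hypothetical placeholders chosen for this example; the announcement does not specify the actual tag vocabulary, so check the official documentation before relying on any of them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    """A piece of script text with an optional audio tag (tag names are hypothetical)."""
    text: str
    tag: Optional[str] = None


def render_tagged_text(segments: List[Segment]) -> str:
    """Join segments into one string, prefixing each tagged segment with [tag]."""
    parts = []
    for seg in segments:
        text = seg.text.strip()
        if not text:
            continue  # skip empty segments so no stray tags are emitted
        parts.append(f"[{seg.tag}] {text}" if seg.tag else text)
    return " ".join(parts)


script = [
    Segment("Welcome back to the show."),
    Segment("I have a secret to share.", tag="whispering"),
    Segment("We won the award!", tag="excited"),
]
prompt = render_tagged_text(script)
print(prompt)
```

Modeling the script as segments rather than hand-editing a string keeps the spoken text and the delivery hints separate, so the same script can be re-rendered with different tags.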

Gemini 3.1 Flash TTS carries the same fidelity and control across languages, so developers can create localized, expressive speech experiences for users worldwide. Early testers have noted the model's controllability and expressiveness, highlighting how audio tags provide a new level of creative precision.

All audio generated by Gemini 3.1 Flash TTS is watermarked with SynthID, enabling reliable detection of AI-generated content to help prevent misinformation.
