DeepL launches voice translation for meetings and conversations
DeepL, a company known for its text translation tools, has unveiled a new voice-to-voice translation suite designed for various use cases, including meetings, mobile and web conversations, and group discussions for frontline workers through custom applications. The company is also releasing an API that allows external developers and businesses to build on DeepL's technology for tailored solutions, such as call centers.
DeepL CEO Jarek Kutylowski stated in an interview with TechCrunch that after many years in text translation, voice translation was a natural progression for the company. He emphasized that despite advancements in text and document translation, there was a lack of quality products for real-time voice translation.
Kutylowski noted that the main challenge in building a real-time translation product is minimizing latency, the delay between someone speaking and the translated audio playing back, while preserving accuracy. DeepL is developing add-ons for platforms like Zoom and Microsoft Teams, where listeners can either hear real-time translation while others speak in their native languages or follow translated text on screen.
The meeting add-ons are currently in early access, and the company invites organizations to join a waitlist. DeepL also offers a product for mobile and web-based conversations, which can take place in person or remotely. Users can join group conversations, such as training sessions or workshops, by scanning a QR code.
DeepL's voice-to-voice technology can also learn and adapt to custom vocabulary, including industry-specific terms and company and personal names. Kutylowski said that AI is reshaping what customer service will look like in the coming years, noting that a translation layer helps companies provide support in languages where qualified staff are scarce and costly.
DeepL controls the entire voice-to-voice stack, though the current system still works in three steps: it converts speech to text, translates the text, and then converts the result back to speech. DeepL believes its extensive experience in text translation gives it an edge in translation quality. Moving forward, the company aims to develop an end-to-end voice translation model that bypasses the text step entirely.
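The three-step flow described above (speech to text, translation, text back to speech) can be illustrated with a minimal Python sketch. This is not DeepL's implementation or API: the class `CascadedVoiceTranslator`, the `toy_*` functions, and `TOY_GLOSSARY` are hypothetical names invented for illustration, and the "audio" is simply UTF-8 bytes so the example runs without any speech models.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CascadedVoiceTranslator:
    """Chains three stages: speech-to-text, text translation, text-to-speech.

    All three stages are hypothetical placeholders, not a real vendor API.
    """
    transcribe: Callable[[bytes], str]   # assumed speech-recognition stage
    translate: Callable[[str], str]      # assumed text-translation stage
    synthesize: Callable[[str], bytes]   # assumed speech-synthesis stage

    def run(self, audio: bytes) -> bytes:
        # Each stage consumes the previous stage's output, in order.
        source_text = self.transcribe(audio)
        target_text = self.translate(source_text)
        return self.synthesize(target_text)


# Toy stand-ins so the sketch runs without any real models.
TOY_GLOSSARY: Dict[str, str] = {"hello": "hallo", "world": "welt"}


def toy_transcribe(audio: bytes) -> str:
    # Pretends the "audio" bytes are already UTF-8 text.
    return audio.decode("utf-8")


def toy_translate(text: str) -> str:
    # Lowercases, then looks up word by word; unknown words pass through.
    return " ".join(TOY_GLOSSARY.get(word, word) for word in text.lower().split())


def toy_synthesize(text: str) -> bytes:
    # Pretends that encoding text to bytes produces speech audio.
    return text.encode("utf-8")


pipeline = CascadedVoiceTranslator(toy_transcribe, toy_translate, toy_synthesize)
translated_audio = pipeline.run(b"Hello world")
```

Because the stages run one after another, their individual delays add up. An end-to-end model of the kind the company says it is aiming for would replace the three callables with a single audio-to-audio step.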
DeepL faces competition from several well-funded startups operating in adjacent areas. For instance, Sanas, which raised $65 million last year from Quadrille Capital and Teleperformance, uses AI to modify a speaker's accent in real time, primarily aimed at call center agents. Dubai-based Camb.AI focuses on speech synthesis and translation for media and entertainment companies, helping them dub and localize video content at scale.