
ElevenLabs releases Scribe: speech-to-text accuracy hits a new high

Author: LoRA   Date: 27 Feb 2025

ElevenLabs, the closely watched AI voice cloning and generation startup, has launched its latest speech-to-text model, Scribe v1. The company claims the model achieves the highest accuracy across multiple languages, and users can try it on the official website.


According to ElevenLabs' own benchmarks, Scribe outperformed Google's Gemini 2.0 Flash, OpenAI's Whisper v3 and Deepgram's Nova-3, reaching what the company calls unprecedentedly low error rates when converting speech to text. The company says Scribe supports high-accuracy transcription in 99 languages, including previously underserved ones such as Serbian, Cantonese and Malayalam.

Flavio Schneider, a principal researcher at ElevenLabs, wrote on X that Scribe is the "cleverest audio understanding model" the company has released so far. He stressed that Scribe is more than a transcription tool: it can understand audio content, detect non-verbal events (such as laughter, sound effects, music and background noise), and handle long recordings in complex acoustic environments while accurately telling speakers apart. Notably, Scribe can identify and separate up to 32 different speakers in a single audio file.


ElevenLabs notes that Scribe is "best suited for situations that require high-accuracy transcription rather than real-time transcription." The company also plans to release a low-latency version to extend its use to real-time applications.

On the FLEURS and Common Voice benchmarks, Scribe handled real-world audio well, with particularly low word error rates in Italian (98.7% accuracy) and English (96.7% accuracy).
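Word error rate (WER) counts the word substitutions, deletions and insertions needed to turn a model's transcript into the reference transcript, divided by the number of words in the reference. The accuracy figures above appear to be 1 − WER (98.7% accuracy would correspond to a WER of about 1.3%), although the article does not say so explicitly. The minimal Python sketch below assumes that reading; the function names are hypothetical, and it skips the text normalization (casing, punctuation) that published benchmarks typically apply before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


def accuracy_from_wer(wer: float) -> float:
    # Assumption: "accuracy" in the article means 1 - WER
    return 1.0 - wer


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER: {wer:.3f}, accuracy: {accuracy_from_wer(wer):.3f}")
```

In the usage example, one substituted word out of six reference words gives a WER of about 16.7%. Because insertions count as errors, WER can exceed 100% on very poor transcripts, which is why it is usually reported directly rather than as accuracy.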

Scribe is now available through the ElevenLabs website and API, priced at $0.40 per hour of input audio, with a 50% discount for the next six weeks. A low-latency version for real-time applications is under development.
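At the quoted rate, the cost of a batch transcription job is roughly the hours of audio times $0.40, halved while the launch discount applies. The sketch below assumes flat pay-as-you-go billing per hour of input audio, with no minimum charges, subscription credits or per-second rounding, none of which the article specifies; the function name is hypothetical.

```python
def transcription_cost(hours: float, rate_per_hour: float = 0.40,
                       discount: float = 0.0) -> float:
    """Estimate the USD cost of transcribing `hours` of input audio.

    Assumes flat per-hour billing; `discount` is a fraction (0.5 = 50% off).
    """
    if hours < 0:
        raise ValueError("hours must be non-negative")
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return round(hours * rate_per_hour * (1.0 - discount), 2)


if __name__ == "__main__":
    # e.g. 100 hours of recorded meetings
    print(transcription_cost(100))                # list price
    print(transcription_cost(100, discount=0.5))  # with the launch discount
```

Under these assumptions, 100 hours of audio would cost $40.00 at the list price and $20.00 during the discount period.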

For enterprise decision makers, Scribe offers a scalable, high-accuracy transcription tool for industries that need automated documentation, meeting transcription and accessible content. Its accurate handling of many languages should also benefit multinational corporations, media companies and customer support operations.

Notably, Scribe was released on the same day that competitor Hume launched Octave, its text-to-speech model. Octave is a text-to-speech tool built on large language models that lets users customize AI-generated voices to convey specific emotions, and it is aimed at content creation such as audiobooks, podcasts and video game voice-overs. Although Scribe and Octave have different capabilities, the two launches reflect increasingly fierce competition among AI audio models.

Product portal: https://elevenlabs.io/blog/meet-scribe