Bilibili Releases IndexTTS: Leading Chinese Text-to-Speech Model

Author: LoRA Time: 27 Feb 2025

IndexTTS, a GPT-style text-to-speech (TTS) model based on XTTS and Tortoise, has been officially released. For Chinese text, the system can correct the pronunciation of characters using pinyin and can control pauses at any position through punctuation marks. These features make the synthesized speech sound more natural and fluent, and the release has attracted widespread attention.

The IndexTTS system was trained on tens of thousands of hours of data and is reported to outperform currently popular TTS systems, including XTTS, CosyVoice2, Fish-Speech and F5-TTS. Several modules of the system have been enhanced, notably the representation of speaker condition features and the audio quality. By using hybrid modeling of characters and pinyin, IndexTTS can quickly correct misread Chinese characters, improving the user experience.

The model adopts a new conditional encoder and a BigVGAN2-based speech decoder, which improve training stability as well as voice similarity and sound quality. The team said it has submitted a paper to arXiv and plans to release the model parameters and code in the next few weeks. In addition, IndexTTS provides several test sets, including a polysyllabic-word set and subjective and objective evaluation sets, for in-depth analysis by researchers.

IndexTTS performed well in multiple evaluations, outperforming many peer models on word error rate (WER) and speaker similarity (SS). For example, in Mandarin tests its WER was only 1.3%, well below that of the other models, indicating strong accuracy and stability. In the sound quality evaluation, IndexTTS reached a mean opinion score (MOS) of 4.01.
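For readers unfamiliar with the metric, WER is conventionally the edit distance between a reference transcript and a recognized transcript, divided by the reference length. The sketch below illustrates that general definition only; it is not the IndexTTS team's evaluation code, and the function names and sample sentences are made up for illustration. In TTS evaluation the hypothesis is typically produced by running a speech recognizer on the synthesized audio, and for Mandarin the tokens are often individual characters rather than words.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical English example: one substitution and one deletion in six words.
english = word_error_rate("the cat sat on the mat".split(),
                          "the cat sit on mat".split())

# Hypothetical Mandarin example, tokenized per character: one wrong character in six.
mandarin = word_error_rate(list("今天天气很好"), list("今天天汽很好"))

print(f"{english:.3f} {mandarin:.3f}")
```

Under this definition, a WER of 1.3% corresponds to roughly 13 token errors per 1,000 reference tokens.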

As the technology advances and application scenarios expand, the release of IndexTTS marks a step forward for text-to-speech technology. For more information about the system, users can contact the team for details and technical support.

Project: https://github.com/index-tts/index-tts
