Mobvoi releases TicVoice 7.0: Ultra-natural voice cloning and cross-lingual generation capabilities

Author: LoRA Time: 07 Mar 2025 427

On March 6, Mobvoi, working with academic institutions including the Hong Kong University of Science and Technology, Shanghai Jiao Tong University, Nanyang Technological University, and Northwestern Polytechnical University, open-sourced its new-generation speech generation model Spark-TTS and launched its commercial high-quality TTS engine, TicVoice 7.0. As Mobvoi's seventh-generation TTS engine, TicVoice 7.0 marks a major step forward in speech generation and introduces a new paradigm for generating speech.

The core advantage of TicVoice 7.0 lies in its speech encoding method and modeling structure. The engine uses BiCodec to encode speech into two complementary parts: Global Tokens with a fixed sequence length and Semantic Tokens with a low bitrate. Global Tokens model time-invariant global features such as timbre, which keeps speech generation globally controllable. Semantic Tokens take features extracted by wav2vec 2.0 as input and encode the information closely tied to the text, which keeps the generated speech strongly correlated with its semantic content. According to the article, this design addresses problems in traditional speech coding and brings speech token modeling in line with text token modeling, making speech generation more efficient and controllable.
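The two-part token layout described above can be sketched in a few lines of Python. This is an illustration only, not the BiCodec or TicVoice implementation: the token count, the token rate, the marker strings, and every name in the block are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

# Assumed values for illustration only; the article does not state them.
GLOBAL_TOKEN_COUNT = 32          # fixed number of global tokens per utterance
SEMANTIC_TOKENS_PER_SECOND = 50  # low, constant semantic token rate


@dataclass
class EncodedSpeech:
    global_tokens: List[int]    # fixed length: time-invariant traits such as timbre
    semantic_tokens: List[int]  # length grows with duration: content tied to the text


def semantic_token_count(duration_seconds: float) -> int:
    """Number of semantic tokens for a clip at the assumed token rate."""
    return round(duration_seconds * SEMANTIC_TOKENS_PER_SECOND)


def build_lm_sequence(text_tokens: List[int], speech: EncodedSpeech) -> List[str]:
    """Lay text, global, and semantic tokens out as one flat sequence.

    The marker strings are hypothetical; they only show that speech tokens
    can sit in the same sequence as text tokens.
    """
    if len(speech.global_tokens) != GLOBAL_TOKEN_COUNT:
        raise ValueError("global token sequence must have a fixed length")
    return (
        ["<text>"] + [f"txt_{t}" for t in text_tokens]
        + ["<global>"] + [f"g_{t}" for t in speech.global_tokens]
        + ["<semantic>"] + [f"s_{t}" for t in speech.semantic_tokens]
    )
```

In this sketch, only the semantic part grows with the length of the audio, while the global part stays the same size for any clip.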

Building on this design, TicVoice 7.0 shows strong voice cloning and emotional expression. It can capture voiceprint features within 3 seconds, so the AI not only speaks like a human but also imitates subtle expressive cues such as sighs and pauses. Compared with the previous generation of the engine, TicVoice 7.0 improves noticeably in timbre similarity, emotional expressiveness, and stability. Its MOS (Mean Opinion Score, a widely used international measure of perceived speech quality) rose from 3.9 to 4.2, with stronger emotional expression and a more natural, pleasant, and stable listening experience.
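For readers unfamiliar with MOS, the score is the arithmetic mean of listener ratings on a 1 to 5 scale. A minimal sketch follows; the function name is hypothetical and the sample ratings are made up for illustration, not taken from the TicVoice evaluation.

```python
from typing import Iterable


def mean_opinion_score(ratings: Iterable[int]) -> float:
    """Average listener ratings on the 1 (bad) to 5 (excellent) scale."""
    values = list(ratings)
    if not values:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in values):
        raise ValueError("ratings must be between 1 and 5")
    return round(sum(values) / len(values), 2)


# Made-up ratings from five hypothetical listeners.
sample_ratings = [5, 4, 4, 3, 4]
print(mean_opinion_score(sample_ratings))  # 4.0
```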

TicVoice 7.0 also performs well in personalized customization. Users can shape a distinctive voice style by adjusting attributes such as gender, speaking rate, and fundamental frequency (pitch). For the customized "Zhizhen Pro-Quality" voices, users only need to provide 20 to 200 corpus sentences to obtain broadcast-grade professional dubbing. For these voices, the MOS rose from 4.3 to 4.7, reaching broadcast level and providing a professional speech generation solution for film, television, games, and similar scenarios.
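As a rough illustration of attribute-based control, the adjustable properties could be held in a small validated settings object. The class, the level names, and the control-token format below are all hypothetical; the article does not describe the actual TicVoice interface.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical coarse levels; the real engine may expose different controls.
LEVELS = ("very_low", "low", "moderate", "high", "very_high")
GENDERS = ("female", "male")


@dataclass(frozen=True)
class VoiceStyle:
    gender: str = "female"
    speed: str = "moderate"   # speaking rate
    pitch: str = "moderate"   # fundamental frequency

    def __post_init__(self) -> None:
        if self.gender not in GENDERS:
            raise ValueError(f"unknown gender: {self.gender}")
        if self.speed not in LEVELS or self.pitch not in LEVELS:
            raise ValueError("speed and pitch must be one of " + ", ".join(LEVELS))

    def to_control_tokens(self) -> List[str]:
        """Render the settings as illustrative control tokens."""
        return [
            f"<gender:{self.gender}>",
            f"<speed:{self.speed}>",
            f"<pitch:{self.pitch}>",
        ]


style = VoiceStyle(gender="male", speed="high", pitch="low")
print(style.to_control_tokens())
```

Validating the values up front means an unsupported setting fails early instead of producing an unexpected voice.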

Mobvoi has already deployed TicVoice 7.0 in its AI dubbing product "Moyin Workshop", bringing users better service and a better experience. The engine performs well in application scenarios such as customer service, audiobooks, emotional live streaming, and film and television commentary, and the close collaboration between the open-source ecosystem, industry, academia, and research is expected to give new momentum to the development of the field.
