CosyVoice 2.0

Multilingual speech synthesis, AI speech generation, low-latency TTS model

CosyVoice 2.0 is a multilingual voice generation model that uses streaming modeling to reach first-packet latency as low as 150 ms while keeping the synthesized audio natural and stable.

Author: LoRA

Inclusion Time: 11 Mar 2025

Downloads: 931

Pricing Model: Free

Introduction

CosyVoice 2.0 is a large multilingual speech generation model with full-stack capabilities covering inference, training and deployment. It supports voice generation in multiple languages and produces natural, fluent speech that is close to a human voice.

The project was developed by the FunAudioLLM team and is open sourced under the Apache-2.0 license.

Main features

Multilingual support: CosyVoice supports speech synthesis in Chinese, English, Japanese and Korean, as well as a variety of Chinese dialects (such as Cantonese, Sichuanese, Shanghainese, and the Tianjin and Wuhan dialects).

Ultra-low latency: CosyVoice 2.0 integrates offline and streaming modeling and supports bidirectional streaming speech synthesis, with first-packet synthesis latency as low as 150 milliseconds while maintaining high-quality audio output.
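
First-packet latency is the time between sending a request and receiving the first audio chunk. The following is a minimal sketch of how it could be measured for any chunk-yielding stream. The `fake_streaming_tts` generator is a hypothetical stand-in that yields placeholder bytes; it is not part of CosyVoice and says nothing about its real API.

```python
import time
from typing import Iterable, Iterator, Tuple


def fake_streaming_tts(text: str, chunk_delay_s: float = 0.01) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming synthesizer (not a CosyVoice API)."""
    for word in text.split():
        time.sleep(chunk_delay_s)   # simulate the time needed to produce one chunk
        yield word.encode("utf-8")  # placeholder bytes, not real audio


def first_packet_latency(chunks: Iterable[bytes]) -> Tuple[float, int]:
    """Return (seconds until the first chunk arrived, total number of chunks)."""
    start = time.perf_counter()
    latency = None
    count = 0
    for _ in chunks:
        if latency is None:
            # Record the elapsed time only once, when the first chunk arrives.
            latency = time.perf_counter() - start
        count += 1
    if latency is None:
        raise ValueError("the stream produced no chunks")
    return latency, count


latency_s, n_chunks = first_packet_latency(fake_streaming_tts("hello streaming world"))
print(f"first packet after {latency_s * 1000:.1f} ms, {n_chunks} chunks in total")
```

With a real streaming synthesizer, the generator passed to `first_packet_latency` would be replaced by whatever iterator the synthesizer returns.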

High accuracy: CosyVoice 2.0 reduces pronunciation errors in synthesized audio by 30% to 50% compared with version 1.0, and achieves the lowest character error rate on the hard test set of the Seed-TTS evaluation set.
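
For readers unfamiliar with the metric, character error rate (CER) is the character-level edit distance between a reference text and a transcript, divided by the reference length. The sketch below shows the calculation only. The example strings are made up for illustration, and how the benchmark obtains its transcripts is not covered here.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Made-up example: one wrong character out of six.
cer = character_error_rate("今天天气很好", "今天天汽很好")
print(f"CER = {cer:.3f}")
```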

Strong stability: CosyVoice 2.0 maintains consistent timbre in zero-shot and cross-lingual speech synthesis.

Natural experience: The prosody, audio quality and emotional alignment of synthesized audio are significantly improved, with the MOS evaluation score rising from 5.4 to 5.53.

CosyVoice 2.0 local deployment tutorial

This tutorial walks Windows users through deploying CosyVoice 2.0 locally, from environment configuration to running the model.

1. Download and install Miniconda

Miniconda is a minimal installer for Conda and is easy to set up on Windows. After downloading the installer, run it and click Next through the wizard, as with any other program, until the installation is complete.

2. Download the CosyVoice source code

Get the CosyVoice source code from the official repository or specified channel and unzip it.

3. Create a virtual environment and activate it

Open Anaconda Prompt or CMD and enter the following commands to create and activate the environment:

 conda create -n cosyvoice python=3.8 -y
 conda activate cosyvoice
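
To confirm that the activated environment is the one just created, a small check like the following can be run with `python` from the same prompt. It is only a sketch. The helper name `matches_expected_python` is made up for this example, and the expected version (3.8) is taken from the command above.

```python
import sys


def matches_expected_python(version_info, expected=(3, 8)) -> bool:
    """True if the major.minor part of `version_info` equals `expected`."""
    return tuple(version_info[:2]) == tuple(expected)


if matches_expected_python(sys.version_info):
    print("OK: Python 3.8 is active")
else:
    # sys.executable shows which interpreter is actually being used.
    print(
        f"Unexpected interpreter: Python "
        f"{sys.version_info.major}.{sys.version_info.minor} at {sys.executable}"
    )
```

If the second message appears, the environment was probably not activated in the current prompt.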

4. Install the pynini module

On Windows, the pynini module can only be installed with Conda, so run the following command in the activated environment:

 conda install -y -c conda-forge pynini==2.1.5 WeTextProcessing==1.0.3
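
After the command finishes, it can be worth checking that the packages are visible to Python before moving on. The sketch below uses only the standard library. The import name `tn` for WeTextProcessing is an assumption, and `missing_modules` is a helper written for this example.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "pynini" is the import name of pynini; "tn" is ASSUMED to be the top-level
# package that WeTextProcessing installs.
missing = missing_modules(["pynini", "tn"])
print("missing modules:", ", ".join(missing) if missing else "none")
```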

5. Install other dependencies (using the Alibaba Cloud mirror)

Edit requirements.txt:

Delete the last line, WeTextProcessing==1.0.3, to avoid an installation failure (it was already installed with Conda in step 4).
Add the Matcha-TTS dependencies.
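
The deletion can also be scripted instead of done by hand. The following sketch removes the WeTextProcessing pin from the text of a requirements file. The function names are made up for this example, and it only recognises plain `name==version` lines, which is the form used above.

```python
from pathlib import Path


def drop_requirement(text: str, package: str) -> str:
    """Remove every line whose name (the part before '==') equals `package`."""
    kept = []
    for line in text.splitlines():
        name = line.strip().split("==")[0].strip().lower()
        if name != package.lower():
            kept.append(line)
    return "".join(line + "\n" for line in kept)


def patch_requirements_file(path: str = "requirements.txt") -> None:
    """Rewrite the file at `path` without the WeTextProcessing line."""
    file = Path(path)
    original = file.read_text(encoding="utf-8")
    file.write_text(drop_requirement(original, "WeTextProcessing"), encoding="utf-8")


# Illustrative input only; the real requirements.txt contains many more lines.
sample = "numpy==1.0.0\nWeTextProcessing==1.0.3\n"
print(drop_requirement(sample, "WeTextProcessing"))
```

Calling `patch_requirements_file()` from the unzipped source directory would apply the same change to the real file. Keep a backup copy first, since the file is overwritten in place.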

Install the dependencies, using the Alibaba Cloud mirror to speed up downloads:

 pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/

6. Complete deployment

At this point, CosyVoice and all its dependencies have been installed and can be started.
