CosyVoice 2.0

CosyVoice 2.0 is a multilingual speech generation model that uses streaming modeling to achieve ultra-low latency (first-packet latency as low as 150 ms) while keeping the audio natural and stable.
Author:LoRA
Inclusion Time:11 Mar 2025
Downloads:931
Pricing Model:Free
Introduction

CosyVoice 2.0 is a large-scale multilingual speech generation model with full-stack capabilities covering inference, training, and deployment. It supports speech generation in multiple languages and produces natural, fluent voices that sound close to human speech.

The project was developed by the FunAudioLLM team and is open sourced under the Apache-2.0 license.

Main features

Multilingual support: CosyVoice supports speech synthesis in Chinese, English, Japanese, Korean, and a variety of Chinese dialects (such as Cantonese, Sichuanese, Shanghainese, Tianjin dialect, and Wuhan dialect).

Ultra-low latency: CosyVoice 2.0 unifies offline and streaming modeling and supports bidirectional streaming speech synthesis, with first-packet synthesis latency as low as 150 milliseconds while maintaining high-quality audio output.

High accuracy: CosyVoice 2.0 reduces pronunciation errors in synthesized audio by 30% to 50% compared to version 1.0, achieving the lowest character error rate on the hard test set of the Seed-TTS evaluation set.

Strong stability: CosyVoice 2.0 maintains consistent timbre in zero-shot and cross-lingual speech synthesis.

Natural experience: The prosody, audio quality, and emotional alignment of the synthesized audio have improved significantly, with the MOS evaluation score rising from 5.4 to 5.53.
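
Two of the metrics quoted in the features above, first-packet latency and character error rate (CER), are easier to interpret with a small example. The sketch below is only an illustration under stated assumptions: `fake_streaming_tts` is a hypothetical stand-in for a streaming synthesizer and is not the CosyVoice API, and the CER function uses the common definition (character-level edit distance divided by reference length), which may differ in detail from the Seed-TTS evaluation pipeline.

```python
import time
from typing import Iterable, Iterator, Tuple


def first_packet_latency(chunks: Iterable[bytes]) -> Tuple[float, float]:
    """Return (seconds to first chunk, seconds to last chunk) for a chunk stream."""
    start = time.perf_counter()
    first = None
    for _ in chunks:
        if first is None:
            first = time.perf_counter() - start
    if first is None:
        raise ValueError("stream produced no chunks")
    return first, time.perf_counter() - start


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance between reference and hypothesis / reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def fake_streaming_tts(n_chunks: int = 5, delay: float = 0.01) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming synthesizer (NOT the CosyVoice API)."""
    for _ in range(n_chunks):
        time.sleep(delay)      # pretend to synthesize one chunk
        yield b"\x00" * 320    # dummy audio bytes


if __name__ == "__main__":
    first, total = first_packet_latency(fake_streaming_tts())
    print(f"first packet: {first * 1000:.1f} ms, full utterance: {total * 1000:.1f} ms")
    # One substitution over four reference characters gives a CER of 0.25.
    print(character_error_rate("abcd", "abxd"))
```

With a streaming model the first number is what the listener perceives as the delay before audio starts, which is why it can be far smaller than the time needed to synthesize the whole utterance.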

Detailed tutorial: deploying CosyVoice 2.0 locally

This tutorial walks Windows users through deploying CosyVoice 2.0 locally, from environment configuration to running the model.

1. Download and install Miniconda

Miniconda is a minimal installer for Conda, and it is easy to install on Windows. After downloading the installer, click Next through the prompts, as with any ordinary software, until the installation completes.

2. Download the CosyVoice source code

Get the CosyVoice source code from the official repository or specified channel and unzip it.

3. Create a virtual environment and activate it

Open Anaconda Prompt or CMD and enter the following command to create and activate the environment:

 conda create -n cosyvoice python=3.8 -y
 conda activate cosyvoice

4. Install the pynini module

On Windows, the pynini module can only be installed with Conda, so run the following command in the activated environment:

 conda install -y -c conda-forge pynini==2.1.5 WeTextProcessing==1.0.3
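
Before moving on, it can help to confirm that the environment is the one you expect. The following is a minimal sketch, not part of the official instructions: `check_environment` is a hypothetical helper that reports whether the interpreter matches the Python 3.8 version created in step 3 and whether `pynini` can be found by the import system. Run it with the `python` from the activated `cosyvoice` environment.

```python
import importlib.util
import sys


def check_environment(modules=("pynini",), expected_python=(3, 8)):
    """Report whether the interpreter version matches and each module is importable."""
    report = {"python_ok": sys.version_info[:2] == tuple(expected_python)}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'OK' if ok else 'MISSING or mismatched'}")
```

If `pynini` is reported as missing, the most likely cause is that the Conda command was run outside the activated environment.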

5. Install other dependencies (using Alibaba mirror)

  • Edit requirements.txt

    • Delete the last line, WeTextProcessing==1.0.3, to avoid an installation failure (it was already installed with Conda in step 4)

    • Add the Matcha-TTS dependencies

  • Install the dependencies (using the Alibaba Cloud mirror for faster downloads):

 pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
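
The requirements.txt edit described in this step can also be scripted, run before the pip command. The sketch below is one possible approach, assuming the file is UTF-8 text with one requirement per line; `drop_requirement` is a hypothetical helper name. It removes only the WeTextProcessing line and keeps a `.bak` copy of the original. The Matcha-TTS dependencies are not listed in this tutorial, so they still need to be added by hand.

```python
from pathlib import Path


def drop_requirement(path, package="WeTextProcessing"):
    """Remove lines that pin `package` from a requirements file.

    Returns the number of lines removed. A backup named <file>.bak is
    written only when something was actually removed.
    """
    req = Path(path)
    lines = req.read_text(encoding="utf-8").splitlines()
    kept = [ln for ln in lines
            if not ln.strip().lower().startswith(package.lower())]
    removed = len(lines) - len(kept)
    if removed:
        backup = req.with_name(req.name + ".bak")
        backup.write_text("\n".join(lines) + "\n", encoding="utf-8")
        req.write_text("\n".join(kept) + "\n", encoding="utf-8")
    return removed
```

For example, calling `drop_requirement("requirements.txt")` from the CosyVoice source directory would strip the pinned line and leave the rest of the file unchanged.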

6. Complete deployment

At this point, CosyVoice and all of its dependencies are installed, and the project is ready to run.
