Yuan2-M32-hf-int4

Yuan2.0-M32, mixture-of-experts model, attention router

Yuan2.0-M32 is a mixture-of-experts (MoE) language model with 32 experts, 2 of which are active per token. It combines an attention-based router with training on 2,000 billion tokens to deliver competitive results in coding, mathematics, and professional domains at low computational cost.

Author: LoRA

Inclusion Time: 14 Mar 2025

Pricing Model: Free

Introduction

Yuan2.0-M32 is a mixture-of-experts (MoE) language model with 32 experts, 2 of which are active for each token. It introduces a new routing network, the attention router, to select experts more efficiently, which makes the model more accurate than one using a traditional router network. Yuan2.0-M32 was trained from scratch on 2,000 billion tokens, and its training computation was only 9.25% of that required for a dense model of the same parameter scale. It is competitive in coding, mathematics, and various professional fields while using only 3.7 billion active parameters out of 40 billion in total. The forward computation per token is 7.4 GFLOPs, only 1/19 of what Llama3-70B requires. Yuan2.0-M32 surpasses Llama3-70B on the MATH and ARC-Challenge benchmarks, with accuracies of 55.9% and 95.8% respectively.
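
To make "32 experts, 2 of which are active" concrete, here is a minimal sketch of generic top-2 expert routing. It is not the attention router itself, whose internals are not described on this page: the router scores here are random numbers, and the toy experts and the names route_top_k and moe_forward are hypothetical, chosen only for illustration.

```python
import math
import random

NUM_EXPERTS = 32  # from the description above
TOP_K = 2         # experts active per token, from the description above


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(logits, k=TOP_K):
    """Return [(expert_index, weight)] for the k highest-scoring experts.

    The weights of the selected experts are renormalized to sum to 1.
    """
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return [(i, probs[i] / mass) for i in top]


def moe_forward(x, logits, experts):
    """Run only the selected experts and mix their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(logits))


# Toy setup: random router scores stand in for a learned router, and
# expert i simply multiplies its input by (i + 1).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
experts = [lambda x, scale=i + 1: x * scale for i in range(NUM_EXPERTS)]

selected = route_top_k(logits)
output = moe_forward(1.0, logits, experts)
print("selected experts and weights:", selected)
print("mixed output:", output)
```

Only 2 of the 32 expert functions are evaluated per call, which is where the saving in per-token computation comes from.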

Target audience:

The Yuan2.0-M32 model is suited to developers and researchers who need to handle large amounts of data and complex computing tasks, especially in programming, mathematical computing, and specialized professional domains. Its high efficiency and low computing requirements make it a good fit for large-scale language model applications.

Example usage scenarios:

In programming, Yuan2.0-M32 can be used for code generation and code quality evaluation.

In mathematics, the model can solve complex mathematical problems and perform logical reasoning.

In professional fields such as medicine or law, Yuan2.0-M32 can assist professionals with knowledge retrieval and document analysis.

Product Features:

Mixture-of-experts (MoE) model with 32 experts, 2 of which are active per token.

Uses an attention router for more efficient expert selection.

Trained from scratch on 2,000 billion tokens.

Training computation is only 9.25% of that of a dense model at the same parameter scale.

Competitive in coding, mathematics, and professional fields.

Low forward computation: only 7.4 GFLOPs per token.

Strong results on the MATH and ARC-Challenge benchmarks.
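
The per-token compute figure can be sanity-checked with a few lines of arithmetic. The sketch below assumes the common rule of thumb of about 2 FLOPs per active parameter per token in a forward pass (one multiply and one add). That rule is an assumption added here for illustration, not something stated on this page.

```python
# Rule-of-thumb assumption: ~2 FLOPs (multiply + add) per active parameter per token.
FLOPS_PER_PARAM = 2

yuan_gflops_per_token = 7.4  # from the feature list above
llama3_params_billion = 70   # Llama3-70B is a dense model, so all parameters are active

# Active parameter count implied by the 7.4 GFLOPs figure.
implied_active_params_billion = yuan_gflops_per_token / FLOPS_PER_PARAM

# Estimated forward cost of the dense 70B model under the same rule.
llama3_gflops_per_token = llama3_params_billion * FLOPS_PER_PARAM

ratio = llama3_gflops_per_token / yuan_gflops_per_token

print(f"implied active parameters: {implied_active_params_billion:.1f}B")
print(f"Llama3-70B forward cost: {llama3_gflops_per_token} GFLOPs per token")
print(f"Llama3-70B / Yuan2.0-M32 compute ratio: {ratio:.1f}")
```

Under this assumption, 7.4 GFLOPs per token corresponds to roughly 3.7 billion active parameters, and the ratio to Llama3-70B comes out close to the stated factor of 19.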

How to use:

1. Set up the environment: start a Yuan2.0 container from the recommended Docker image.

2. Preprocess the data as described in the documentation.

3. Pre-train the model with the provided scripts.

4. Deploy the inference service by following the detailed vLLM deployment guide.

5. Visit the GitHub repository for more information and documentation.

6. Comply with the Apache 2.0 open source license, and read and follow the 'Yuan2.0 Model License Agreement'.
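
Once the inference service from step 4 is running, a client can query it over HTTP. The sketch below assumes the service was started with vLLM's OpenAI-compatible server, which serves /v1/completions and listens on port 8000 by default. The host, port, and model name are assumptions to be replaced with the values from your own deployment, and the Yuan deployment guide may expose a different interface.

```python
import json
import urllib.request

# Assumptions (not from this page): adjust both to match your deployment.
BASE_URL = "http://localhost:8000/v1/completions"
MODEL_NAME = "Yuan2-M32-hf-int4"


def build_request(prompt, max_tokens=256, temperature=0.0):
    """Build a POST request for an OpenAI-style completions endpoint."""
    payload = {
        "model": MODEL_NAME,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the prompt to the server and return the generated text."""
    request = build_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a server running, complete("Write a Python function that reverses a string.") returns the generated text. Nothing is sent over the network until complete is called.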

Alternatives to Yuan2-M32-hf-int4

Memary

Memary enhances AI agents with human-like memory for better learning and reasoning, using Neo4j and advanced models for knowledge management.

Memary open source memory layer autonomous agent memory
ChatPuma

ChatPuma offers intuitive AI chatbot solutions for businesses to enhance customer interactions and boost sales effortlessly.

AI customer service
gpt-engineer

gpt-engineer offers AI-driven assistance for seamless website creation and development, providing powerful tools for an efficient workflow.

GPT AI
App Mint

App Mint offers intuitive AI-powered tools for designing and building exceptional mobile apps.

AI text generation
