MiniMax announced the open-sourcing of its new MiniMax-01 model series on January 15, 2025. The series includes the foundation large language model MiniMax-Text-01 and the visual multi-modal large model MiniMax-VL-01. The MiniMax-01 series makes bold architectural innovations, implementing a linear attention mechanism at scale for the first time and breaking through the limitations of the traditional Transformer architecture. The model has 456 billion parameters, with 45.9 billion activated per inference. Its overall performance is comparable to that of top overseas models, and it can efficiently handle contexts of up to 4 million tokens, which is 32 times the context length of GPT-4o and 20 times that of Claude-3.5-Sonnet.
MiniMax believes that 2025 will be a critical year for the rapid development of Agents. Whether for single-Agent or multi-Agent systems, longer contexts are needed to support persistent memory and large volumes of inter-agent communication. The launch of the MiniMax-01 series is aimed precisely at this demand, taking the first step toward establishing the foundational capabilities of complex Agents.
Thanks to architectural innovation, efficiency optimization, and an integrated training-and-inference cluster design, MiniMax can provide text and multi-modal understanding API services in the lowest price tier in the industry. Standard pricing is 1 yuan per million input tokens and 8 yuan per million output tokens. The services are live on the MiniMax open platform and its overseas version for developers to try.
The MiniMax-01 series models have been open-sourced on GitHub and will be continuously updated. On the industry's mainstream text and multi-modal understanding benchmarks, the MiniMax-01 series matches the internationally recognized advanced models GPT-4o-1120 and Claude-3.5-Sonnet-1022 on most tasks. On long-text tasks in particular, MiniMax-Text-01 shows the slowest performance degradation as input length increases, significantly outperforming Google's Gemini models.
MiniMax's model is extremely efficient when processing long inputs, approaching linear complexity. In its structural design, 7 out of every 8 layers use linear attention based on Lightning Attention, and 1 layer uses traditional softmax attention. This is the first time in the industry that the linear attention mechanism has been scaled to the level of commercial models. MiniMax comprehensively considered scaling laws, the combination with MoE, structural design, training optimization, and inference optimization, and rebuilt its training and inference systems, including more efficient MoE all-to-all communication optimization, longer-sequence optimization, and efficient kernel implementations of linear attention at the inference level.
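The near-linear complexity claim comes from the associativity of the linear attention computation: by dropping the softmax and applying a kernel feature map, the (keys x values) product can be computed once in O(n·d²) rather than forming the O(n²) attention matrix. The following is a minimal NumPy sketch of this idea under assumed choices (an ELU+1 feature map; the actual Lightning Attention kernel and its blockwise implementation are not specified here), not a reproduction of MiniMax's implementation:

```python
import numpy as np

def linear_attention(Q, K, V):
    """Non-causal linear attention sketch.

    Q, K, V: arrays of shape (n, d). Returns an (n, d) output.
    The feature map phi(x) = ELU(x) + 1 is a common positivity-preserving
    choice (an assumption here, not necessarily Lightning Attention's).
    """
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    Qp, Kp = phi(Q), phi(K)

    # Associativity: compute phi(K)^T V first, a (d, d) matrix.
    # Total cost is O(n * d^2), linear in sequence length n,
    # versus O(n^2 * d) for softmax attention.
    KV = Kp.T @ V                 # (d, d)
    Z = Qp @ Kp.sum(axis=0)       # per-row normalizer, shape (n,)
    return (Qp @ KV) / Z[:, None]

# Sequence length only enters linearly, so doubling n roughly
# doubles the work instead of quadrupling it.
n, d = 1024, 64
Q, K, V = (np.random.rand(n, d) for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape)
```

In the hybrid layout described above, a stack of such linear-attention blocks would be interleaved with one standard softmax-attention block every 8 layers, which retains some global exact-retrieval capacity while keeping the overall cost close to linear.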
On most academic benchmarks, the MiniMax-01 series achieves results comparable to first-tier overseas models, and it leads significantly on long-context evaluation sets, such as its strong performance on the 4-million-token Needle-In-A-Haystack retrieval task. Beyond academic datasets, MiniMax also built an assistant-scenario test set from real data, where MiniMax-Text-01 performs outstandingly. On multi-modal understanding test sets, MiniMax-VL-01 is likewise ahead.
Open source address: https://github.com/MiniMax-AI