DeepGEMM open-source release: FP8 GEMM library for AI training and inference

Author: LoRA | Date: 26 Feb 2025

On the third day of its "Open Source Week", Chinese artificial intelligence company DeepSeek launched DeepGEMM, an open-source library for FP8 general matrix multiplication (GEMM). Designed for both dense and Mixture-of-Experts (MoE) matrix operations, it powers training and inference for the DeepSeek V3 and R1 models. DeepSeek made the announcement on X, where it quickly drew an enthusiastic response from the technology community.
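To make "FP8 GEMM" concrete: a GEMM computes C = A × B, and an FP8 GEMM stores the operands in an 8-bit floating-point format together with scaling factors, accumulates the products in higher precision, and applies the scales to the result. The pure-Python sketch below simulates that idea on the CPU. It assumes the E4M3 FP8 variant (3 mantissa bits, largest finite value 448) and uses one scale per row of A and one per column of B. That scaling scheme and the names `quantize_e4m3` and `fp8_gemm` are hypothetical, illustrative choices, not DeepGEMM's API or its actual kernel design.

```python
import math

E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 format


def quantize_e4m3(v: float) -> float:
    """Round v to the nearest E4M3 value (saturating at +/-448), returned as a Python float."""
    if v == 0.0:
        return 0.0
    sign = -1.0 if v < 0 else 1.0
    mag = min(abs(v), E4M3_MAX)      # saturate instead of overflowing
    _, exp = math.frexp(mag)         # mag = m * 2**exp with 0.5 <= m < 1
    unbiased = max(exp - 1, -6)      # -6 is E4M3's smallest normal exponent (subnormals below)
    step = 2.0 ** (unbiased - 3)     # 3 mantissa bits -> 8 representable steps per binade
    return sign * min(round(mag / step) * step, E4M3_MAX)


def _scale(values) -> float:
    """Scale that maps the largest magnitude in `values` onto E4M3_MAX."""
    amax = max(abs(x) for x in values)
    return amax / E4M3_MAX if amax > 0 else 1.0


def fp8_gemm(a, b):
    """Simulated FP8 GEMM: quantize A and B, accumulate in float64, dequantize each output once."""
    m, k, n = len(a), len(a[0]), len(b[0])
    # Hypothetical scaling scheme: one scale per row of A, one per column of B.
    a_scale = [_scale(row) for row in a]
    b_scale = [_scale([b[p][j] for p in range(k)]) for j in range(n)]
    a_q = [[quantize_e4m3(a[i][p] / a_scale[i]) for p in range(k)] for i in range(m)]
    b_q = [[quantize_e4m3(b[p][j] / b_scale[j]) for j in range(n)] for p in range(k)]
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0  # higher-precision accumulator (Python float), not FP8
            for p in range(k):
                acc += a_q[i][p] * b_q[p][j]
            c[i][j] = acc * a_scale[i] * b_scale[j]  # undo both scales once per output
    return c
```

With only 3 mantissa bits, rounding changes each normal-range operand by at most 1/16 of its magnitude, so for all-positive inputs every output stays within about 13% of the exact product. This is why FP8 GEMMs depend on well-chosen scales and higher-precision accumulation.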

According to a post from DeepSeek's official account, DeepGEMM delivers up to 1350+ TFLOPS of FP8 compute on NVIDIA Hopper GPUs. Its core logic is only about 300 lines of code, yet it outperforms expert-tuned kernels across most matrix sizes, combining high efficiency with simplicity. The library has no heavy dependencies, compiles its kernels Just-In-Time (JIT), and supports a dense layout as well as two MoE layouts. DeepSeek describes the code as "clean like a tutorial", which makes it easy for developers to study and use.
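The TFLOPS figure follows the usual GEMM accounting: multiplying an M×K matrix by a K×N matrix takes 2·M·N·K floating-point operations (one multiply and one add per term), and throughput is that count divided by runtime. The hypothetical helpers below illustrate the arithmetic. The 4096×4096×4096 shape is an arbitrary example; the article does not say which matrix shapes produced the 1350+ TFLOPS figure.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """Operation count of an (m x k) @ (k x n) GEMM: k multiplies + k adds per output."""
    return 2 * m * n * k


def achieved_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput in TFLOPS for a GEMM that took `seconds` to run."""
    return gemm_flops(m, n, k) / seconds / 1e12


def min_time_seconds(m: int, n: int, k: int, tflops: float) -> float:
    """Shortest possible runtime if the GEMM ran at a sustained `tflops`."""
    return gemm_flops(m, n, k) / (tflops * 1e12)


if __name__ == "__main__":
    t = min_time_seconds(4096, 4096, 4096, 1350)
    print(f"{t * 1e6:.1f} us")  # prints "101.8 us"
```

A 4096-cubed GEMM is about 1.37 × 10^11 operations, so even at 1350 TFLOPS it takes roughly a tenth of a millisecond.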

X user @TechBitDaily commented: "The launch of DeepGEMM is a highlight of DeepSeek's open source week, with impressive FP8 performance and a simple design." Another user, @AIObserverCN, pointed out that the library's support for efficient MoE training is a significant advantage and may spur further innovation on the Hopper architecture across the AI community.

As part of Open Source Week, DeepGEMM continues DeepSeek's push for transparency and community collaboration in AI. On the first two days, the company released FlashMLA, an efficient Multi-head Latent Attention (MLA) decoding kernel, and DeepEP, a communication library for expert parallelism in MoE models. DeepGEMM further demonstrates the company's strength in AI infrastructure. Industry observers expect the library not only to improve the performance of DeepSeek's own models but also to give developers worldwide an efficient, easy-to-use matrix computation tool with broad application prospects. DeepGEMM is available now on GitHub for developers who want to explore it for AI training and inference.

Project address: https://github.com/deepseek-ai/DeepGEMM
