Today, DeepSeek, a leading Chinese artificial intelligence company, officially announced the fourth-day release of its open-source plan, Optimized Parallelism Strategies, centered on the bidirectional pipeline parallelism algorithm DualPipe, the expert-parallel load balancer EPLB, and deep optimization of the computation-communication overlap mechanism. The upgrade targets core pain points in large-scale language model training and offers a new approach to running ten-thousand-GPU clusters efficiently.
1. DualPipe: Bidirectional Pipeline Parallelism Algorithm
As one of the core technologies in this release, DualPipe is designed specifically for the V3/R1 architecture and uses an innovative bidirectional data-flow pipeline to achieve a high degree of overlap between computation and communication. Compared with a traditional one-directional pipeline, the technique significantly improves computational throughput, especially when training models at the hundred-billion-parameter scale and above. According to the GitHub repository, DualPipe's scheduling mechanism runs forward computation concurrently with the backpropagation stage, raising hardware utilization by roughly 30%.
(Project link: https://github.com/deepseek-ai/DualPipe).
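To make the scheduling idea concrete, the following is a minimal, illustrative toy in Python: micro-batches are fed into the pipeline from both ends, so even the last stage has work from the reverse-direction stream while it waits for the forward-direction stream, shrinking the idle "bubble" of a one-directional schedule. The function name `dual_pipe_schedule`, the stage/micro-batch counts, and the assumption that a stage can interleave one chunk from each direction per time step are all hypothetical simplifications for illustration; this is not DeepSeek's actual DualPipe implementation.

```python
# Toy model of bidirectional pipeline scheduling (illustration only).
# Half of the micro-batches travel stage 0 -> N-1 ("F{i}->"), the other half
# travel stage N-1 -> 0 ("<-F{i}"). Assumption: a stage can interleave one
# chunk from each direction in a single time step.

NUM_STAGES = 4        # pipeline stages (one per device)
MICRO_BATCHES = 8     # micro-batches per step, split across both directions


def dual_pipe_schedule(num_stages: int, micro_batches: int) -> list[list[str]]:
    """Return a per-stage timeline of work items such as 'F0->' or '<-F3'."""
    half = micro_batches // 2
    timeline: list[list[str]] = [[] for _ in range(num_stages)]

    for t in range(half + num_stages - 1):
        for s in range(num_stages):
            slot = []
            # left-to-right micro-batch reaching stage s at time t
            mb_fwd = t - s
            if 0 <= mb_fwd < half:
                slot.append(f"F{mb_fwd}->")
            # right-to-left micro-batch reaching stage s at time t
            mb_rev = t - (num_stages - 1 - s)
            if 0 <= mb_rev < half:
                slot.append(f"<-F{mb_rev}")
            timeline[s].append("+".join(slot) if slot else ".")
    return timeline


if __name__ == "__main__":
    for stage, row in enumerate(dual_pipe_schedule(NUM_STAGES, MICRO_BATCHES)):
        print(f"stage {stage}: " + " ".join(f"{w:>9}" for w in row))
```

Printing the timeline shows every stage occupied from the first time step onward, whereas a one-directional fill would leave the later stages idle until the first micro-batch reaches them; the real algorithm additionally overlaps the forward and backward chunks' computation with their communication.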
2. EPLB: Expert-Parallel Dynamic Load Balancer
To address the "hot expert" problem in Mixture-of-Experts (MoE) model training, EPLB realizes, for the first time, dynamic load balancing for expert parallelism. Traditional approaches often overload some GPUs because expert tasks are distributed unevenly; through real-time monitoring and adaptive assignment, EPLB raises the overall utilization of a ten-thousand-GPU cluster to more than 92%, effectively avoiding idle resources (project link: https://github.com/deepseek-ai/EPLB).
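A minimal sketch of the balancing idea follows: given measured per-expert token counts, experts are greedily re-packed onto GPUs so that no single device becomes a hot spot. The function name `rebalance_experts`, the greedy heuristic, and the sample token counts are hypothetical; the open-source EPLB goes further, for example by replicating heavily loaded experts, so treat this only as an illustration of load-aware placement.

```python
# Sketch: load-aware placement of MoE experts onto GPUs (illustration only).
import heapq
from collections import defaultdict


def rebalance_experts(expert_load: dict[int, int], num_gpus: int) -> dict[int, list[int]]:
    """Greedily assign experts to GPUs so the heaviest device stays as light as possible.

    expert_load: expert_id -> number of tokens recently routed to that expert.
    Returns gpu_id -> list of expert_ids placed on that GPU.
    """
    placement: dict[int, list[int]] = defaultdict(list)
    # Min-heap of (accumulated load, gpu_id): always place the next-heaviest
    # expert onto the currently lightest GPU.
    heap = [(0, gpu) for gpu in range(num_gpus)]
    heapq.heapify(heap)

    for expert, load in sorted(expert_load.items(), key=lambda kv: kv[1], reverse=True):
        gpu_load, gpu = heapq.heappop(heap)
        placement[gpu].append(expert)
        heapq.heappush(heap, (gpu_load + load, gpu))
    return dict(placement)


if __name__ == "__main__":
    # Hypothetical routing statistics: expert 0 is noticeably hotter than the rest.
    load = {0: 3000, 1: 1200, 2: 1100, 3: 950, 4: 900, 5: 850, 6: 800, 7: 700}
    for gpu, experts in sorted(rebalance_experts(load, num_gpus=4).items()):
        total = sum(load[e] for e in experts)
        print(f"GPU {gpu}: experts {experts}, ~{total} tokens")
```

Run against the sample statistics, the greedy packing keeps the per-GPU totals far closer together than a naive contiguous assignment would, which is the behavior the article describes as avoiding overloaded cards.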
3. Computation-Communication Overlap Optimization
Building on a communication-overlap profiling tool for the V3/R1 architecture, DeepSeek has constructed, for the first time, a spatio-temporal efficiency model of 3D parallelism (data, pipeline, and tensor parallelism). With the open-sourced profiling data (link: https://github.com/deepseek-ai/profile-data), developers can precisely locate the points where computation and communication conflict, giving hyperscale model training a tuning baseline; according to tests, this reduces end-to-end training time by roughly 15%.
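The overlap pattern itself can be shown with a short, hedged PyTorch sketch: a collective is launched asynchronously, independent computation proceeds while it is in flight, and only then does the program wait on the handle. The single-process "gloo" group and the tensor shapes are assumptions made so the example runs standalone on a CPU; in real training this pattern runs across many ranks, and the published profile-data traces are what reveal where such overlap breaks down.

```python
# Sketch: overlapping an asynchronous collective with independent computation.
import os
import torch
import torch.distributed as dist

# Bootstrap a trivial 1-rank process group so the example runs standalone.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

grads = torch.randn(1024, 1024)          # gradients to synchronize
activations = torch.randn(1024, 1024)    # work that does not depend on them

# 1) Launch communication without blocking.
handle = dist.all_reduce(grads, op=dist.ReduceOp.SUM, async_op=True)

# 2) Overlap: compute something that does not need the reduced gradients.
hidden = torch.relu(activations @ activations.T)

# 3) Only now block until the collective has finished.
handle.wait()

print("overlapped compute result:", hidden.mean().item())
print("reduced gradient norm:", grads.norm().item())
dist.destroy_process_group()
```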
Industry Impact: Breaking the Bottleneck of Large-Model Training
The release has drawn strong attention across the industry. Experts point out that the combined innovation of DualPipe and EPLB directly addresses the two major challenges of large-scale training today: first, as model sizes grow exponentially, the scalability limits of traditional parallel strategies become increasingly apparent; second, the popularity of Mixture-of-Experts models has made dynamic load balancing an urgent need. The technical director of a cloud computing vendor commented: "These tools will significantly lower the hardware barrier to training hundred-billion-parameter models and are expected to cut training costs by 20%-30%."
DeepSeek's CTO emphasized in the technical documentation that the open-sourced components have been validated in the company's internal training of several hundred-billion-parameter models and will continue to be iterated and optimized. All three technologies are now open source on GitHub, and developers can customize and adapt them to different hardware environments.
As the global AI competition enters the "scale wins" stage, DeepSeek's four consecutive days of open-sourcing key technologies not only demonstrate the technical strength of Chinese AI companies but also give the industry reusable infrastructure. This "open collaboration" model of innovation may reshape the ecosystem of large-model training.