DeepSeek releases 3FS file system, setting a record of 6.6TiB/s throughput

Author: LoRA  Time: 28 Feb 2025

DeepSeek, a leading Chinese AI company, closed its Open Source Week with a technical blockbuster. It officially released 3FS (Fire-Flyer File System), a high-performance parallel file system, and Smallpond, a companion data processing framework, both designed for modern AI workloads. Together they target the data-processing bottlenecks of AI training and inference. 3FS also sets a new industry record with an aggregate cluster throughput of 6.6 TiB/s, marking a new era for distributed storage technology.

Performance disruption: Architectural innovation defines new standards

Through a disaggregated architecture and strong consistency semantics, 3FS achieves an aggregate read throughput of 6.6 TiB/s on a 180-node cluster. Peak KVCache lookup throughput on a single node reaches about 40 GiB/s. On the GraySort benchmark it reaches 3.66 TiB/min on a 25-node cluster, a substantial improvement over traditional solutions. The system is deeply optimized for SSDs and RDMA networks, pushing hardware bandwidth utilization close to its limits and providing a stable data supply for AI training clusters with thousands of GPUs.
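A little arithmetic can sanity-check the figures above. The sketch below averages the aggregate read rate over the 180 nodes and converts the GraySort rate to GiB/s. The per-node NIC configuration (two 200 Gbit/s links) is a hypothetical assumption. It is used only to illustrate what "bandwidth utilization" means here, because the article does not state the hardware. The results are estimates, not measured figures.

```python
# Back-of-the-envelope check of the 3FS throughput figures quoted above.
# ASSUMPTION: each storage node has 2 x 200 Gbit/s NICs. This is a
# hypothetical value for illustration; the article does not state it.

GIB = 2**30  # bytes in one GiB
TIB = 2**40  # bytes in one TiB


def per_node_gib_s(aggregate_tib_s: float, nodes: int) -> float:
    """Average throughput per node, in GiB/s, for an aggregate rate in TiB/s."""
    return aggregate_tib_s * TIB / nodes / GIB


def nic_capacity_gib_s(nics: int, gbit_per_nic: float) -> float:
    """Raw line rate of a node's NICs, converted from Gbit/s to GiB/s."""
    bytes_per_s = nics * gbit_per_nic * 1e9 / 8
    return bytes_per_s / GIB


def graysort_gib_s(tib_per_min: float) -> float:
    """Convert a GraySort rate in TiB/min to an aggregate rate in GiB/s."""
    return tib_per_min * TIB / 60 / GIB


read_per_node = per_node_gib_s(6.6, 180)   # ~37.5 GiB/s per node
nic_limit = nic_capacity_gib_s(2, 200)     # ~46.6 GiB/s (assumed NICs)
utilization = read_per_node / nic_limit    # ~0.81 of assumed line rate
sort_total = graysort_gib_s(3.66)          # ~62.5 GiB/s across 25 nodes
sort_per_node = sort_total / 25            # ~2.5 GiB/s per node

print(f"read per node: {read_per_node:.1f} GiB/s")
print(f"assumed NIC line rate: {nic_limit:.1f} GiB/s ({utilization:.0%} used)")
print(f"GraySort: {sort_total:.1f} GiB/s total, {sort_per_node:.2f} GiB/s per node")
```

Under that assumption, each node would run at roughly four-fifths of its network line rate. That is the sense in which bandwidth utilization is pushed close to its limits. GraySort also writes and shuffles data, so its per-node rate is not directly comparable to the pure read figure.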

Use cases: Powering the entire AI workflow

As core infrastructure behind DeepSeek V3 and R1, 3FS is used throughout key stages such as data preprocessing, checkpoint storage, vector search and inference caching (KVCache). Its shared storage layer greatly reduces the complexity of developing distributed applications, and its strong consistency guarantees keep large-scale concurrent operations safe. The open-sourced Smallpond framework is built on DuckDB. It provides lightweight, PB-scale data processing without long-running services ("serverless" data engineering), closing the loop from storage to compute.
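The idea of running SQL independently over partitions of a dataset can be illustrated in miniature. The sketch below hash-partitions rows by a key and runs the same aggregation query on each partition. It is a stand-in built on assumptions, not Smallpond's API. Python's built-in sqlite3 replaces DuckDB, and in-memory lists replace files on 3FS. The function names, table schema and sample data are all hypothetical.

```python
# A minimal sketch of partition-parallel SQL processing.
# ASSUMPTIONS: sqlite3 (stdlib) stands in for DuckDB, in-memory lists stand
# in for files on 3FS, and every name here is hypothetical, not Smallpond's API.
import sqlite3
import zlib
from concurrent.futures import ThreadPoolExecutor


def hash_partition(rows, key_index, num_partitions):
    """Route each row to a partition using a stable hash (CRC32) of its key."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        key = str(row[key_index]).encode()
        parts[zlib.crc32(key) % num_partitions].append(row)
    return parts


def run_partition_sql(rows, query):
    """Load one partition into a private in-memory database and run SQL on it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (ticker TEXT, price REAL)")
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


def partitioned_query(rows, query, num_partitions=3):
    """Hash-partition rows by ticker, then run the same query on every partition.

    All rows for a given ticker land in the same partition, so a GROUP BY
    ticker computed per partition is already globally correct. The partial
    results only need to be concatenated.
    """
    parts = hash_partition(rows, key_index=0, num_partitions=num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        # Each worker opens its own sqlite3 connection, so no connection
        # object is shared across threads.
        results = list(pool.map(lambda p: run_partition_sql(p, query), parts))
    return sorted(r for partial in results for r in partial)


data = [("AAPL", 190.0), ("MSFT", 410.0), ("AAPL", 185.5),
        ("NVDA", 790.0), ("MSFT", 402.3), ("NVDA", 801.2)]
sql = "SELECT ticker, MIN(price), MAX(price) FROM t GROUP BY ticker"
for row in partitioned_query(data, sql):
    print(row)
```

The partitioning key is the design choice that matters here. The query groups by the same column used for hashing, so each partition's result is final and no cross-partition merge step is needed. A query that grouped by a different column would need a second aggregation pass.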

Open Source Strategy: Accelerating the Democratization of AI Infrastructure

The joint release of 3FS and Smallpond caps the five-day release cadence of DeepSeek's Open Source Week. By opening up systems already proven in its own AI business, DeepSeek is helping the industry break through the storage bottlenecks of data-intensive applications. Analysts believe the solution could decisively outclass traditional distributed file systems such as Ceph and Lustre, especially in large-scale model training.

Open-source repositories:

3FS → https://github.com/deepseek-ai/3FS

Smallpond (data processing framework built on 3FS) → https://github.com/deepseek-ai/smallpond