
DeepSeek launches NSA mechanism to boost the efficiency of long-text AI processing

Author: LoRA  Time: 19 Feb 2025


Shortly after Musk released Grok 3, the DeepSeek team published a notable study on the X platform that attracted widespread attention. The highlight of the research is a new attention mechanism, Native Sparse Attention (NSA), aimed at improving the efficiency of long-text processing. NSA's major innovations include a dynamic hierarchical sparse strategy, combined coarse-grained and fine-grained token processing, and hardware-aligned optimization.

NSA targets the computational bottleneck that traditional attention mechanisms face when processing long sequences: it prunes unnecessary computation through sparse attention, and it applies to both the training and inference stages. The architecture consists of three branches, compression, selection, and sliding-window attention, balancing global context against local information. Experimental results show that NSA performs well across multiple benchmarks, especially on long-text tasks, significantly improving the model's retrieval and reasoning capabilities while substantially improving computational efficiency.
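To make the three-branch design concrete, below is a minimal, non-causal PyTorch sketch of an NSA-style attention layer. It is an illustration only: the block size, top-block count, window length, mean-pooling compression, and fixed equal branch weights are simplifying assumptions, not DeepSeek's actual implementation, which uses learned compression, learned gating, causal masking, and hardware-optimized kernels.

```python
import torch
import torch.nn.functional as F

def compressed_attention(q, k, v, block):
    # Coarse branch: mean-pool keys/values into per-block summaries,
    # then attend over the much shorter summary sequence.
    B, T, D = k.shape
    kc = k.view(B, T // block, block, D).mean(dim=2)
    vc = v.view(B, T // block, block, D).mean(dim=2)
    scores = q @ kc.transpose(-2, -1) / D ** 0.5          # (B, T, T // block)
    return F.softmax(scores, dim=-1) @ vc, scores

def selected_attention(q, k, v, block, top_blocks, blk_scores):
    # Fine branch: each query attends at full token resolution, but only
    # inside the top-scoring blocks identified by the coarse branch.
    B, T, D = q.shape
    n_blocks = T // block
    top = blk_scores.topk(min(top_blocks, n_blocks), dim=-1).indices
    keep = torch.zeros(B, T, n_blocks, dtype=torch.bool, device=q.device)
    keep.scatter_(-1, top, True)
    keep = keep.repeat_interleave(block, dim=-1)          # expand to token level
    scores = q @ k.transpose(-2, -1) / D ** 0.5
    scores = scores.masked_fill(~keep, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

def sliding_window_attention(q, k, v, window):
    # Local branch: each query attends only to its `window` nearest keys.
    B, T, D = q.shape
    i = torch.arange(T, device=q.device)
    near = (i[None, :] - i[:, None]).abs() < window       # (T, T) locality mask
    scores = q @ k.transpose(-2, -1) / D ** 0.5
    scores = scores.masked_fill(~near, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

def nsa_attention(q, k, v, block=8, top_blocks=2, window=16):
    # Combine the three branches. NSA learns per-branch gate weights;
    # fixed equal weights stand in for them in this sketch.
    out_cmp, blk_scores = compressed_attention(q, k, v, block)
    out_sel = selected_attention(q, k, v, block, top_blocks, blk_scores)
    out_win = sliding_window_attention(q, k, v, window)
    return (out_cmp + out_sel + out_win) / 3.0

# Toy usage: batch of 1, sequence of 64 tokens, head dimension 32.
q = torch.randn(1, 64, 32)
k = torch.randn(1, 64, 32)
v = torch.randn(1, 64, 32)
print(nsa_attention(q, k, v).shape)  # torch.Size([1, 64, 32])
```

Note how the coarse branch's block scores drive which blocks the fine branch attends to: only a small fraction of key-value pairs is ever touched at full resolution, which is what keeps the overall computation sparse as the sequence grows.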