The reasoning model DeepSeek-R1, launched by Chinese AI company DeepSeek, has attracted worldwide attention, and its stability on third-party hosting platforms has recently become a hot topic in tech circles. According to recent discussion and review data on the X platform, DeepSeek-R1's performance varies significantly across hosting platforms: completeness, accuracy, and reasoning time all depend on the platform chosen. This not only reveals the complexity of model deployment but also gives users an important reference for choosing a suitable hosting service.
Test background and method
According to feedback from X users and professional evaluation agencies, a recent cross-platform stability test of DeepSeek-R1 has attracted widespread attention. The test, led by the Artificial Intelligence Department of the China Software Evaluation Center, covered more than a dozen domestic and foreign third-party platforms, including Nano AI Search, Alibaba Bailian, and SiliconFlow, using a unified set of 20 basic mathematical reasoning problems (developed by the SuperCLUE team) as the benchmark. The evaluation focused on three dimensions: response rate, accuracy, and reasoning time, and also analyzed the differences between free and paid services.
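The three dimensions are straightforward to compute once per-question results are collected. A minimal sketch, assuming a simple per-question record format (the field names and sample data below are illustrative, not taken from the actual test harness):

```python
from statistics import mean

# Hypothetical per-question results from one platform run.
# "answered": model returned a complete response (not truncated or timed out)
# "correct":  answer matched the reference solution
# "seconds":  wall-clock reasoning time
results = [
    {"answered": True,  "correct": True,  "seconds": 41.2},
    {"answered": True,  "correct": False, "seconds": 63.5},
    {"answered": False, "correct": False, "seconds": 120.0},  # truncated output
    {"answered": True,  "correct": True,  "seconds": 38.7},
]

def summarize(results):
    """Return (response rate, accuracy among answered, mean reasoning time)."""
    answered = [r for r in results if r["answered"]]
    response_rate = len(answered) / len(results)
    accuracy = sum(r["correct"] for r in answered) / len(answered)
    avg_time = mean(r["seconds"] for r in answered)
    return response_rate, accuracy, avg_time

rate, acc, t = summarize(results)
print(f"response rate {rate:.0%}, accuracy {acc:.0%}, mean time {t:.1f}s")
# prints: response rate 75%, accuracy 67%, mean time 47.8s
```

Note that accuracy here is measured only over questions that received a complete answer, so a platform that truncates often can still score high on accuracy while scoring low on response rate; reporting the dimensions separately avoids conflating the two failure modes.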
Test results: significant stability differences
Test results show that DeepSeek-R1's stability depends heavily on the hosting platform. Nano AI Search stood out because it connected to the "full-blooded" (full-capacity) DeepSeek-R1 and offers it for free. X user @op7418 posted on February 27: "Nano AI Search was the first to connect to the full-blooded version of DeepSeek-R1, and it performed well in the evaluation." The platform won praise for its high response rate and stable output, and is seen as putting Zhou Hongyi's concept of "AI popularization" into practice.
The performance of other platforms, however, was less satisfactory. X user @simonkuang938 pointed out on February 24 that when Alibaba Bailian's DeepSeek-R1 handles complex logical tasks (such as drawing charts or flowcharts), its output is often truncated due to excessive memory consumption, causing the client to stutter even though the connection is not dropped. He jokingly called the experience "bad," reflecting some users' dissatisfaction with its stability.
By contrast, SiliconFlow earned @simonkuang938's approval for restricting bonus-credit usage and providing a stable paid version. He wrote on February 22: "Few platforms are as conscientious as SiliconFlow. Its R1 is the full-blooded version and has not been modified." This suggests that paid services may hold an advantage in stability.
User experience and technical details
Judging from user feedback on X, DeepSeek-R1's performance also varies by scenario. @changli71829684 noted on February 25 that R1 tends to fall into an infinite loop when a single response exceeds 3,000 words; although its information density is high and well suited to knowledge mining, its accuracy and output quality fall slightly short. He believes the model is better suited to "opening minds" than to precision tasks. In addition, @oran_ge tested DeepSeek-R1-Zero on January 29 and found that this version, trained without supervised fine-tuning (SFT), behaves oddly on simple questions, for example outputting mathematical formulas in reply to "Hello," showing the model's instability in specific scenarios.
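One pragmatic client-side mitigation for the runaway-output problem is to watch for the tail of the generation repeating itself and cut the stream off early. A minimal heuristic sketch; the window size and repeat threshold are arbitrary choices for illustration, not anything DeepSeek or the hosting platforms document:

```python
def tail_is_looping(text: str, window: int = 40, repeats: int = 3) -> bool:
    """Heuristic loop detector: True if the last `window` characters
    already occur at least `repeats` times in the text, which suggests
    the model is stuck regenerating the same passage."""
    if len(text) < window * repeats:
        return False  # too short to judge
    tail = text[-window:]
    return text.count(tail) >= repeats

# A streaming client could run this check periodically and stop early.
print(tail_is_looping("The answer is 42. " * 10))                 # → True
print(tail_is_looping("A varied, non-repetitive paragraph of output."))  # → False
```

Character-level matching like this is crude (it can miss loops with slight wording variation), but it is cheap enough to run on every streamed chunk.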
It is worth mentioning that some users have tried to optimize the R1 experience themselves. On February 12, @oran_ge shared a setup that accesses R1 through an API with Internet connectivity, calling it "in actual testing, the most stable and fastest R1 experience," one that completely solved the lag and connectivity problems. This exploration shows that technical configuration beyond the platform itself can also affect stability.
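The post does not specify which provider was used, but most hosting platforms expose DeepSeek-R1 through an OpenAI-compatible chat-completions route. A minimal stdlib-only sketch of such an API call, with reasoning time measured client-side; the endpoint URL and model name below are placeholders to be replaced with whatever the chosen platform actually exposes:

```python
import json
import time
import urllib.request

# Placeholder endpoint and model name: substitute the values published
# by the hosting platform you pick (most offer an OpenAI-compatible route).
API_URL = "https://api.example.com/v1/chat/completions"
MODEL = "deepseek-r1"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Assemble an OpenAI-compatible chat-completion request for R1."""
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

def timed_ask(prompt: str, api_key: str) -> tuple[str, float]:
    """Send the request and return (answer, reasoning time in seconds)."""
    start = time.monotonic()
    with urllib.request.urlopen(build_request(prompt, api_key), timeout=300) as resp:
        data = json.load(resp)
    answer = data["choices"][0]["message"]["content"]
    return answer, time.monotonic() - start
```

Because reasoning models can think for minutes on hard problems, the generous timeout matters: several of the truncation complaints above are consistent with platforms (or clients) giving up before R1 finishes.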
Industry significance and user suggestions
This cross-platform test not only exposed DeepSeek-R1's deployment challenges but also triggered discussion about the commercialization and stability of open-source models. X users generally believe that although DeepSeek-R1 performs well on mathematics and programming benchmarks (for example, a 97.3% score on MATH-500), its stability in real applications still needs optimization. Traffic pressure and high load on free services can degrade performance, while paid platforms can deliver a more reliable experience through dedicated resource allocation.
In this regard, industry insiders recommend choosing a hosting platform according to one's needs. For developers who want high response rates and complete output, stable services such as Nano AI Search or SiliconFlow are good choices; for users handling complex inference tasks, paid platforms may better meet the need. Meanwhile, users have called on DeepSeek to add hardware capacity or paid tiers to relieve congestion on its free service, as @GrayPsyche hoped in a February 8 post.
DeepSeek-R1's third-party platform stability assessment reveals a key fact: although the model has great potential, its real-world performance varies with the hosting environment. From Nano AI Search's efficient free service, to Alibaba Bailian's truncation problems, to SiliconFlow's stable paid experience, users must weigh cost against performance. As AI technology spreads, DeepSeek-R1's future development and its competitiveness in the global market may depend on whether these stability challenges can be solved. The heated discussion on the X platform continues, and the topic will no doubt keep drawing the industry's attention.