
DeepSeek R2 Leak Reveals 512 PetaFLOPS Push on Domestic AI Accelerator Infrastructure

DeepSeek, the company that took the AI world by storm with its R1 model, is preparing a new and reportedly much improved DeepSeek R2 release, according to well-known AI insider @iruletheworldmo on X. Powered by clusters of Huawei's Ascend 910B chips, possibly in the form of Huawei Atlas 900 systems, together with DeepSeek's in-house distributed training framework, R2 reportedly pushes these accelerators to an impressive 82% utilization, translating to 512 PetaFLOPS of FP16 performance, or roughly half an exaFLOP of computing power. According to Huawei lab data, that is roughly 91% of what NVIDIA's older A100 clusters deliver, yet DeepSeek claims it cuts per-unit training costs by a remarkable 97.3%.

Behind DeepSeek R2 is a carefully cultivated ecosystem of partners. Tuowei Information, a leading OEM in the Ascend ecosystem, handles more than half of DeepSeek's supercomputing hardware orders, while Sugon supplies liquid-cooled server racks rated for up to 40 kW per unit. To keep power consumption in check, Innolight's silicon-photonics transceivers cut interconnect power draw by roughly 35% compared to conventional solutions.
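Taking the headline compute figures above at face value, a quick back-of-envelope sketch shows what they could imply about cluster size. The per-chip FP16 number below is an assumption for illustration only (roughly 320 TFLOPS peak, a commonly cited ballpark for the Ascend 910 family), not a confirmed spec, and the leak does not state whether 512 PetaFLOPS refers to peak or sustained throughput, so both readings are worked out.

```python
# Back-of-envelope sketch of the cluster math behind the leaked figures.
# All inputs are assumptions for illustration, not confirmed specs.

PEAK_FP16_PER_CHIP_TFLOPS = 320   # assumed Ascend 910B peak FP16 per chip
UTILIZATION = 0.82                # utilization figure from the leak
CLUSTER_FP16_PFLOPS = 512         # aggregate figure from the leak

# Reading 1: 512 PetaFLOPS is the aggregate *peak* of the cluster.
chips_if_peak = CLUSTER_FP16_PFLOPS * 1000 / PEAK_FP16_PER_CHIP_TFLOPS
sustained_pflops = CLUSTER_FP16_PFLOPS * UTILIZATION

# Reading 2: 512 PetaFLOPS is the *sustained* throughput at 82% utilization.
chips_if_sustained = CLUSTER_FP16_PFLOPS * 1000 / (PEAK_FP16_PER_CHIP_TFLOPS * UTILIZATION)

print(f"Peak reading:      ~{chips_if_peak:,.0f} chips, ~{sustained_pflops:.0f} PFLOPS sustained")
print(f"Sustained reading: ~{chips_if_sustained:,.0f} chips required")
```

Under these assumptions, the two readings land at roughly 1,600 and 1,950 accelerators respectively; the real deployment would differ with the actual per-chip throughput and precision used.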

Geographically, operations are split across major hubs: Runjian Shares runs the South China supercomputing center under contracts exceeding ¥5 billion annually, while Zhongbei Communications maintains a 1,500-PetaFLOPS reserve in the Northwest for peak demand. On the software side, DeepSeek R2 already supports private deployment and fine-tuning, powering smart-city initiatives in 15 provinces through the Yun Sai Zhilian platform. The North China node, overseen by Hongbo Shares' Yingbo Digital, adds another 3,000 PetaFLOPS to the mix. Should computing power run short, Huawei is ready to deploy its CloudMatrix 384 system, positioned as a domestic alternative to NVIDIA's GB200 NVL72. It combines 384 Ascend 910C accelerators to achieve 1.7× the overall PetaFLOPS and 3.6× the total HBM capacity of the NVL72 cluster, although it lags significantly in per-chip performance and consumes nearly four times more power. Nonetheless, the R2 launch is expected to come online smoothly, and we await the official release and benchmarks to see how the model performs.
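The quoted ratios also make the per-chip gap easy to estimate. The sketch below merely rearranges the figures reported above (384 versus 72 accelerators, roughly 1.7× aggregate FLOPS and 3.6× HBM); the 3.9× power factor is an assumption standing in for "nearly four times," and no absolute chip specs are assumed.

```python
# Rough per-chip comparison implied by the CloudMatrix 384 vs. GB200 NVL72
# ratios quoted above. Ratios only; no absolute specs are assumed.

CM_CHIPS, NVL_CHIPS = 384, 72
FLOPS_RATIO, HBM_RATIO = 1.7, 3.6
POWER_RATIO = 3.9                 # assumed stand-in for "nearly four times"

chip_ratio = CM_CHIPS / NVL_CHIPS            # ~5.3x more accelerators
per_chip_flops = FLOPS_RATIO / chip_ratio    # each 910C vs. each NVL72 GPU
per_chip_hbm = HBM_RATIO / chip_ratio
perf_per_watt = FLOPS_RATIO / POWER_RATIO    # cluster-level efficiency ratio

print(f"Per-chip FP16 throughput: ~{per_chip_flops:.0%} of an NVL72 GPU")
print(f"Per-chip HBM capacity:    ~{per_chip_hbm:.0%}")
print(f"Perf-per-watt vs. NVL72:  ~{perf_per_watt:.0%}")
```

On those numbers, each Ascend 910C would deliver roughly a third of the per-GPU throughput and well under half the performance per watt, consistent with the per-chip and power caveats above.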
