The start-up’s bid to train on Ascend processors faltered, forcing a return to Nvidia for training and underscoring the hurdles in Beijing’s self‑reliance campaign.

DeepSeek, China’s headline-grabbing artificial intelligence start‑up, has delayed the release of its next model after repeated attempts to train it on Huawei’s Ascend chips failed to cross the finish line. The company instead reverted to using Nvidia hardware for the heavy lifting of training while reserving Huawei’s processors for inference, according to people familiar with the effort. The setback highlights the profound technical gap that still separates China’s domestic accelerators from the U.S. leader in the most compute‑intensive phase of AI development.
WHAT HAPPENED
After launching its widely discussed R1 model in January, DeepSeek set out to train a successor—internally referred to as R2—on clusters built around Huawei’s Ascend processors. The push was encouraged by Chinese authorities seeking to reduce reliance on U.S. technology. Despite weeks of engineering work and the involvement of Huawei specialists, training runs on Ascend reportedly suffered repeated failures and stability issues. Facing time pressure, DeepSeek pivoted: Nvidia GPUs were brought back for training, while Ascend hardware would serve lower‑risk inference workloads. The company also contended with longer‑than‑expected data‑labeling cycles, which pushed the timeline back further.
A LAUNCH WINDOW THAT SLIPPED
DeepSeek had targeted a May launch window for the new model. The difficulties with Ascend clusters—along with the data pipeline delays—pushed the timetable into the summer. Internally, founder and chief executive Liang Wenfeng has pressed teams to raise quality while speeding up delivery, even as rivals such as Alibaba’s Qwen series moved quickly to productize similar techniques.
WHY TRAINING IS THE BREAKING POINT
In modern AI, training is the crucible: models ingest trillions of tokens across thousands of accelerators for weeks or months. Any instability, driver mismatch, or interconnect bottleneck can crash a long‑running job and waste vast compute budgets. Industry engineers point to three pain points where Chinese chips presently lag Nvidia’s ecosystem: cluster stability under sustained load, the speed and maturity of inter‑GPU connectivity, and the breadth and polish of software stacks—from kernels and graph compilers to libraries and developer tooling. These are precisely the areas that make or break multi‑week training runs.
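To see why stability matters so much, consider a simplified, hypothetical training loop with periodic checkpointing, sketched in PyTorch-style Python. The model, data, and checkpoint path here are placeholders rather than anything DeepSeek uses; the point is simply that every crash rolls a run back to its last saved state, so the less stable the cluster, the more compute is re-spent covering ground the job had already covered.

```python
# Illustrative only: a periodic-checkpoint loop of the kind long training runs
# rely on. Names and sizes are placeholders, not DeepSeek's actual setup.
import os
import torch
import torch.nn as nn

CKPT = "checkpoint.pt"           # hypothetical checkpoint path
model = nn.Linear(512, 512)      # stand-in for a real model
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

start_step = 0
if os.path.exists(CKPT):         # resume after a crash instead of starting over
    state = torch.load(CKPT)
    model.load_state_dict(state["model"])
    opt.load_state_dict(state["opt"])
    start_step = state["step"] + 1

for step in range(start_step, 10_000):
    x = torch.randn(32, 512)     # placeholder batch; real runs stream tokens
    loss = model(x).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 500 == 0:          # checkpoint cadence trades I/O cost against lost work
        torch.save({"model": model.state_dict(),
                    "opt": opt.state_dict(),
                    "step": step}, CKPT)
```

Multiply this pattern across thousands of accelerators for weeks, and the cost of each unplanned rollback, plus the debugging time to find out why the job died, quickly dominates the schedule.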
POLICY HEAT AROUND NVIDIA’S H20
The delay also lands amid growing regulatory scrutiny of U.S. chips in China. In recent days, Chinese regulators have summoned large internet companies to explain their purchases of Nvidia’s H20 processors and asked why domestic alternatives are not used. Beijing’s message is clear: for government‑related and sensitive deployments, domestic silicon should be the default. That guidance complicates the choices of AI developers who—like DeepSeek—must weigh policy signals against the operational realities of shipping models on schedule.
WHAT DEEPSEEK’S PIVOT REVEALS
By training on Nvidia and reserving Huawei’s Ascend for inference, DeepSeek is effectively acknowledging two truths. First, for the most demanding workloads—frontier‑scale training with complex optimizers and reinforcement‑learning loops—Nvidia’s platform still offers the best odds of completing runs on time. Second, inference is more forgiving: once a model is trained, serving it efficiently can be engineered around a wider range of hardware, including domestic chips, with careful optimization. This division of labor keeps DeepSeek moving while aligning, at least in part, with industrial policy.
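The asymmetry is easier to see in code. The sketch below, again illustrative only, shows why a frozen model is simpler to retarget than a training run: inference is a forward pass on whatever device is available, with no optimizer state or gradient synchronization to keep alive. The device-selection logic is an assumption for illustration; in practice, Ascend support typically reaches frameworks such as PyTorch through vendor plugins rather than the stock CUDA backend.

```python
# Illustrative only: serving a trained model is largely a matter of picking a
# backend and running forward passes; there is no multi-week job to keep alive.
import torch
import torch.nn as nn

def pick_device() -> torch.device:
    if torch.cuda.is_available():       # Nvidia path
        return torch.device("cuda")
    return torch.device("cpu")          # fallback; a vendor backend would slot in here

device = pick_device()
model = nn.Linear(512, 8).to(device).eval()   # stand-in for a trained model

with torch.inference_mode():            # no gradients, no optimizer state
    logits = model(torch.randn(1, 512, device=device))
    print(logits.argmax(dim=-1).item())
```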
THE SOFTWARE GAP MATTERS AS MUCH AS SILICON
Semiconductors tend to get the headlines, but software plumbing often determines whether clusters behave. Nvidia’s moat is not only CUDA and cuDNN; it is an ecosystem of drivers, libraries, profilers, schedulers, and a global developer base seasoned by a decade of large‑scale deployments. While Huawei’s Ascend stack has made rapid progress, insiders still cite reliability wobbles and a thinner layer of libraries and tooling that forces teams to write more custom glue code. Every extra shim adds surface area for bugs when thousands of accelerators must march in lockstep.
THE COST CALCULUS
DeepSeek’s own rise was powered by aggressive efficiency tactics—from mixture‑of‑experts routing to clever curriculum designs—that squeezed more capability from fewer chips. But efficiency cannot eliminate the need for stable, high‑bandwidth training fleets. Repeated restarts on a flaky cluster can erase savings and burn precious time. Against that backdrop, the decision to revert to Nvidia for training looks less like heresy and more like risk management.
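As an aside on the efficiency tactics mentioned above, the toy router below sketches the core idea of mixture‑of‑experts: a small gate sends each token to only k of E expert networks, so most of the model’s parameters sit idle on any given token. It is a deliberately simplified illustration, not DeepSeek’s architecture, and all names and sizes are made up for the example.

```python
# Illustrative only: a toy top-k mixture-of-experts layer. Each token activates
# just k of n_experts expert networks, which is where the compute savings come from.
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    def __init__(self, dim=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, n_experts)   # router scores each expert per token
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_experts)])

    def forward(self, x):                        # x: (tokens, dim)
        scores = self.gate(x)                    # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)   # keep the k best experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e         # tokens routed to expert e in this slot
                if mask.any():                   # only chosen experts actually run
                    w = weights[mask, slot].unsqueeze(-1)
                    out[mask] += w * expert(x[mask])
        return out

y = ToyMoE()(torch.randn(16, 256))   # 16 tokens, each handled by 2 of 8 experts
```

Tricks like this reduce how much hardware a given model quality requires, but they do nothing to protect a run from a cluster that keeps falling over mid‑job.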
COMPETITIVE PRESSURE IS RISING
China’s AI field has grown more crowded in 2025. Alibaba’s Qwen3 and other rivals have iterated quickly, closing the window for prolonged delays. For DeepSeek, the reputational cost of missing a cycle is real: the start‑up’s January breakthrough raised expectations that it could ship at Silicon‑Valley cadence. A stumble now hands momentum to competitors and reinforces a narrative—fair or not—that domestic chips still impose trade‑offs on schedule‑driven projects.
A STRESS TEST FOR SELF‑RELIANCE
Beijing’s goal of technological self‑sufficiency is not in doubt; the open question is sequencing. If developers are pressed to abandon foreign hardware before domestic stacks fully mature, they may face more blown training runs, longer debugging cycles, and delayed launches. Conversely, a transitional strategy—prioritizing domestic chips for inference and government workloads while allowing Nvidia for training in the near term—could keep model makers on track while the homegrown ecosystem hardens.
WHAT TO WATCH NEXT
Two timelines now matter. The first is DeepSeek’s revised window for releasing the R2 successor and the extent to which it narrows quality gaps with U.S. leaders. The second is Huawei’s roadmap: progress on cluster stability, compiler maturity, and high‑speed interconnects will determine whether future training runs can stay on Ascend end‑to‑end. If the software hardens and connectivity improves, the calculus could change quickly—especially if policy continues to steer demand toward domestic silicon.
THE BOTTOM LINE
DeepSeek’s delay is not merely a bump for one start‑up; it is a case study in where China’s AI stack excels today and where it still falls short. For all the ingenuity shown in squeezing performance from constrained hardware, the brutal physics of distributed training still reward the platform with the deepest and most mature ecosystem. Until domestic chips can match that, pragmatism—not ideology—will decide where the world’s most ambitious models are trained.



