TechForge

April 22, 2025

  • Huawei prepares to ship Ascend 910C AI chip in volume.
  • Part of Huawei’s strategy to scale performance through system design.

Huawei is preparing to increase shipments of its Ascend 910C artificial intelligence processor, with mass deliveries to Chinese customers expected to begin as early as next month.

The timing coincides with tightening US export controls on Nvidia’s advanced AI chips, prompting Chinese companies to seek domestic alternatives.

The 910C, developed by Huawei’s semiconductor unit HiSilicon, represents a scaled-up version of the earlier 910B model. The chip uses a dual-chiplet design to effectively double processing power and memory capacity. According to sources familiar with the design, the 910C was created using advanced EDA tools and supports a wide range of AI workloads. Some shipments have already begun.

Huawei’s strategy to remain competitive despite manufacturing constraints focuses on brute-force scaling. The company compensates for its inability to use the latest process technologies by installing more processors per system. This approach underpins the design of CloudMatrix 384, Huawei’s rack-scale AI system built from 384 Ascend 910C processors – it spans 16 racks, 12 for compute and four for networking, and uses 6,912 linear pluggable optics (LPO) transceivers to create a high-bandwidth, all-optical mesh network.
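As a back-of-envelope check on those figures (a sketch derived only from the totals above; the per-rack and per-chip splits are inferences, not Huawei specifications):

    # Sanity-check the CloudMatrix 384 topology figures quoted above.
    # Per-rack and per-chip splits are derived, not official numbers.
    TOTAL_CHIPS = 384           # Ascend 910C processors
    COMPUTE_RACKS = 12          # of 16 racks; the other 4 carry networking
    LPO_TRANSCEIVERS = 6912     # optical transceivers in the mesh

    chips_per_rack = TOTAL_CHIPS / COMPUTE_RACKS         # 32.0
    optics_per_chip = LPO_TRANSCEIVERS / TOTAL_CHIPS     # 18.0

    print(f"{chips_per_rack:.0f} accelerators per compute rack")
    print(f"{optics_per_chip:.0f} LPO transceivers per accelerator")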

CloudMatrix 384 offers dense compute output of around 300 PFLOPs in BF16 precision, significantly surpassing Nvidia’s GB200 NVL72 system, which delivers roughly 180 PFLOPs.

It also features 2.1 times more total memory bandwidth and over 3.6 times greater HBM capacity than Nvidia’s system. Each 910C processor integrates eight HBM2E memory stacks, and the system supports both horizontal and vertical scaling through its optical interconnect architecture.
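Those ratios can be roughly reconstructed from per-part figures. A minimal sketch, assuming typical HBM2E stacks of ~16 GB and ~0.4 TB/s each (assumed values, not confirmed by Huawei) and Nvidia’s published GB200 NVL72 configuration of 72 GPUs with 192 GB of HBM3e at ~8 TB/s apiece:

    # Reconstruct the ~2.1x bandwidth and ~3.6x capacity ratios.
    # Per-stack HBM2E figures below are assumptions based on the
    # HBM2E spec, not Huawei-confirmed numbers.
    CM_CHIPS, STACKS_PER_CHIP = 384, 8
    HBM2E_GB, HBM2E_TBPS = 16, 0.4

    cm_capacity_tb = CM_CHIPS * STACKS_PER_CHIP * HBM2E_GB / 1024    # ~48 TB
    cm_bw_tbps = CM_CHIPS * STACKS_PER_CHIP * HBM2E_TBPS             # ~1229 TB/s

    NV_GPUS, NV_GB, NV_TBPS = 72, 192, 8                             # GB200 NVL72
    nv_capacity_tb = NV_GPUS * NV_GB / 1024                          # ~13.5 TB
    nv_bw_tbps = NV_GPUS * NV_TBPS                                   # 576 TB/s

    print(f"capacity ratio:  {cm_capacity_tb / nv_capacity_tb:.1f}x")  # ~3.6x
    print(f"bandwidth ratio: {cm_bw_tbps / nv_bw_tbps:.1f}x")          # ~2.1x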

Its performance comes with trade-offs. The entire CloudMatrix 384 system consumes about 559 kilowatts of power, compared to 145 kilowatts for Nvidia’s competing setup. On a per-watt basis, Huawei’s system is roughly 2.3 times less efficient. However, energy costs in parts of mainland China have declined in recent years, with prices as low as $56 per megawatt-hour in 2025.
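The efficiency gap and the running-cost argument both follow directly from those figures; a quick sketch using only numbers quoted in this article:

    # Derive the ~2.3x efficiency gap and hourly energy cost from
    # the figures quoted above.
    CM_PFLOPS, CM_KW = 300, 559      # CloudMatrix 384, dense BF16
    NV_PFLOPS, NV_KW = 180, 145      # Nvidia GB200 NVL72

    cm_eff = CM_PFLOPS / CM_KW       # ~0.54 PFLOPs per kW
    nv_eff = NV_PFLOPS / NV_KW       # ~1.24 PFLOPs per kW
    print(f"efficiency gap: {nv_eff / cm_eff:.1f}x")                 # ~2.3x

    PRICE_USD_PER_MWH = 56           # low-end 2025 mainland China price
    hourly_cost = CM_KW / 1000 * PRICE_USD_PER_MWH
    print(f"CloudMatrix 384 energy cost: ${hourly_cost:.2f}/hour")   # ~$31/hour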

In a domestic context where electricity is relatively cheap but access to advanced silicon is limited, the trade-off is manageable.

The Ascend 910C chip is designed for training and inference on large AI models. It provides 780 BF16 TFLOPs per unit, outperforming AMD’s MI250X (383 TFLOPs) but trailing Nvidia’s B200, which provides more than 2.2 PFLOPs.

Huawei is betting that system-level design and scale will close the performance gap: by packing more chips into each system and linking them with a fully optical interconnect, it aims to match or exceed competitors’ total throughput. The processor’s design omits a central I/O die, instead relying on two compute chiplets packaged together.
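The arithmetic behind that bet is simple; a sketch using only the per-chip and system figures cited in this article:

    # Per-chip deficit vs system-level surplus, from figures cited here.
    ASCEND_910C_TFLOPS = 780         # dense BF16, per chip
    B200_TFLOPS = 2200               # dense BF16, per chip (approximate)
    CM_CHIPS = 384

    per_chip_gap = B200_TFLOPS / ASCEND_910C_TFLOPS
    print(f"per chip, the B200 leads by {per_chip_gap:.1f}x")        # ~2.8x

    cm_system_pflops = CM_CHIPS * ASCEND_910C_TFLOPS / 1000          # ~300 PFLOPs
    print(f"CloudMatrix 384: {cm_system_pflops:.0f} PFLOPs vs ~180 for GB200 NVL72")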

The architecture mirrors trends seen in other high-end chips like AMD’s Instinct MI250X and Nvidia’s B200. The 910C was built using 7nm-class process technologies. Some chiplets are produced domestically by SMIC, though yield rates remain low. Others were reportedly manufactured by TSMC via indirect arrangements involving third-party entities like Sophgo, despite US sanctions.

Huawei’s sourcing strategy applies to memory as well. The HBM2E stacks used in the Ascend 910C are mostly sourced from Samsung and routed through intermediaries such as CoAsia and Faraday Technology to stay within export-control thresholds. The components are built abroad and shipped to China, where the memory stacks are incorporated into Huawei’s final system-in-package units.

Despite the supply chain complexities, Huawei is said to have acquired enough wafers to produce more than a million Ascend 910C chips between 2023 and 2025. SMIC’s capabilities are expected to expand, bringing more of this production in-house.

Huawei’s CloudMatrix 384 embodies the company’s broader ambition to sustain China’s AI development momentum. The system’s optical interconnects deliver more than 5.5 petabits per second of bandwidth across racks while minimising signal loss and latency. The layout places 32 accelerators in each compute rack and is designed for both scale-up and scale-out deployment. Fault tolerance and enterprise-grade reliability are also built into the system.
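Dividing that aggregate evenly gives a feel for each chip’s share of the fabric (an even split is an illustrative assumption; the actual allocation depends on the mesh topology):

    # Per-accelerator share of the optical fabric, assuming the
    # quoted >5.5 Pb/s aggregate is split evenly across 384 chips.
    AGGREGATE_PBPS = 5.5
    CHIPS = 384

    tbps_per_chip = AGGREGATE_PBPS * 1000 / CHIPS    # ~14.3 Tb/s
    print(f"~{tbps_per_chip:.1f} Tb/s (~{tbps_per_chip / 8:.1f} TB/s) per accelerator")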

While Huawei’s systems trail Nvidia’s hardware on power efficiency and per-chip performance, their availability and scalability make them an appealing option for Chinese firms facing hardware shortages. With Nvidia’s GB200 NVL72 out of reach under export controls, systems like CloudMatrix 384 may become the default choice for AI model training in China.

With geopolitical tensions reshaping the global semiconductor landscape, Huawei’s system design reflects both adaptation and resilience. The company may not match Nvidia on chip-level specifications, but its investment in scaled-out hardware, integrated optics, and supply-chain workarounds positions it to play a key role in China’s AI infrastructure.

About the Author

Muhammad Zulhusni

As a tech journalist, Zul focuses on topics including cloud computing, cybersecurity, and disruptive technology in the enterprise industry. In addition to a background in networking technology, he has experience moderating webinars and presenting video content.
