Huawei and Peng Cheng Laboratory Plan to Build 1000 PFLOPS Cloud Brain II AI Research Platform

Huawei and Peng Cheng Laboratory (PCL) have jointly released Peng Cheng Cloud Brain II Phase 1, officially beginning the build-out toward an AI cluster at 1000-petaFLOPS (PFLOPS) scale.

This marks a new milestone in scientific research for the Kunpeng computing industry. At the core of Cloud Brain II is the Huawei Atlas 900 AI cluster, powered by Huawei Kunpeng and Ascend processors. Atlas 900 provides the computing power for Cloud Brain II, supporting basic research and exploration in AI fields such as computer vision, natural language processing, autonomous driving, smart transportation, and smart healthcare. Peng Cheng Cloud Brain currently delivers 100 PFLOPS of computing power and is planned to scale to 1000 PFLOPS or more next year.

"This September, Huawei embarked on the Kunpeng + Ascend dual-engine computing strategy. Inspired by this strategy, we are committed to providing the ultimate computing power to the world. We also released Atlas 900, the world's fastest AI training cluster," said Hou Jinlong, Senior VP of Huawei, and President of Huawei Cloud & AI Products and Services.

Hou also said, "Right now we are building Cloud Brain II Phase 1. I believe that, with our joint effort, this will pave the way to a Cloud Brain II at 1000 PFLOPS scale in the near future. We are confident that it will become a world-leading AI research platform."

"In the future, we can provide an alternative computing system to the currently dominant x86-based processors," Hou added.

The Atlas 900 AI cluster consists of thousands of Ascend 910 AI processors. It completes training of a ResNet image classification model in 59.8 seconds, 10 seconds faster than the previous world record at the same precision. Atlas 900 highlights include:

Computing power: Combining thousands of Ascend 910 AI processors, Atlas 900 delivers 256 to 1024 PFLOPS at half precision (FP16), equivalent to the computing power of 500,000 PCs. The Ascend 910's system-on-chip (SoC) design integrates AI computing, general-purpose computing, and I/O functions to improve training efficiency.
High-speed cluster network: Atlas 900 supports three types of high-speed network interfaces: Huawei Cache Coherence System (HCCS), PCIe 4.0, and 100G RoCE. Together they cut gradient synchronization latency by 10% to 70%, significantly improving model training efficiency. The iLossless intelligent switching algorithm learns network-wide traffic patterns in real time, achieving zero packet loss and microsecond-level end-to-end latency.
Heat dissipation: Atlas 900 uses a cabinet-level contained adiabatic system, achieving a liquid cooling ratio of more than 95% and a system power usage effectiveness (PUE) below 1.1 (an ideal PUE is 1.0).
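The headline throughput figures can be related with simple arithmetic. The sketch below is illustrative only: it assumes a peak of 256 TFLOPS FP16 per Ascend 910 (a figure not stated in this article) and perfect linear scaling, which real clusters do not achieve. The function names are hypothetical.

```python
import math

# Assumption: peak FP16 throughput of one Ascend 910, in TFLOPS (not stated in the article).
ASCEND_910_FP16_TFLOPS = 256


def cluster_pflops(num_chips: int, per_chip_tflops: float = ASCEND_910_FP16_TFLOPS) -> float:
    """Ideal aggregate peak in PFLOPS, assuming linear scaling (1 PFLOPS = 1000 TFLOPS)."""
    return num_chips * per_chip_tflops / 1000


def chips_needed(target_pflops: float, per_chip_tflops: float = ASCEND_910_FP16_TFLOPS) -> int:
    """Smallest chip count whose ideal aggregate peak reaches target_pflops."""
    return math.ceil(target_pflops * 1000 / per_chip_tflops)


def implied_pc_gflops(total_pflops: float, num_pcs: int = 500_000) -> float:
    """Per-PC throughput implied by the '500,000 PCs' comparison (1 PFLOPS = 1e6 GFLOPS)."""
    return total_pflops * 1e6 / num_pcs


if __name__ == "__main__":
    for target in (256, 1024):
        print(f"{target} PFLOPS needs about {chips_needed(target)} chips; "
              f"implied per-PC throughput: {implied_pc_gflops(target):.0f} GFLOPS")
```

Under these assumptions, the 256 to 1024 PFLOPS range corresponds to roughly 1,000 to 4,000 processors, consistent with "thousands of Ascend 910 AI processors".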
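PUE is conventionally defined as total facility power divided by the power drawn by the IT equipment alone. A PUE below 1.1 therefore bounds the overhead spent on cooling and power delivery to under 10% of the IT load. The 1 MW load in the worked example is an illustrative assumption, not an Atlas 900 figure:

```latex
\mathrm{PUE} = \frac{P_{\text{facility}}}{P_{\text{IT}}} \ge 1,
\qquad
P_{\text{overhead}} = P_{\text{facility}} - P_{\text{IT}} = (\mathrm{PUE} - 1)\,P_{\text{IT}}

\mathrm{PUE} < 1.1,\; P_{\text{IT}} = 1\,\text{MW}
\;\Rightarrow\;
P_{\text{overhead}} < 0.1 \times 1\,\text{MW} = 100\,\text{kW}
```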

To date, based on the Ascend 910 and 310 AI processors, Huawei has launched the Atlas 900 AI cluster, Atlas 800 AI server, Atlas 500 AI edge station, Atlas 300 AI accelerator card, and Atlas 200 AI accelerator module.