Next-Generation AMD EPYC CPUs and Radeon Instinct GPUs Enable El Capitan Supercomputer to Break 2 Exaflops Barrier
Hewlett Packard Enterprise (HPE) and AMD will deliver the world’s fastest exascale-class supercomputer for the U.S. Department of Energy’s (DOE) National Nuclear Security Administration (NNSA) at a record-breaking speed of 2 exaflops - 10X faster than today’s most powerful supercomputer.
AMD, Lawrence Livermore National Laboratory (LLNL) and HPE announced that El Capitan, the upcoming exascale-class supercomputer at LLNL, will be powered by next-generation AMD EPYC CPUs, AMD Radeon Instinct GPUs and open-source AMD ROCm heterogeneous computing software.
With delivery expected in early 2023, the El Capitan system is expected to be the world’s fastest supercomputer, with more than 2 exaflops of double-precision performance. This record-setting performance will support National Nuclear Security Administration requirements for its primary mission of ensuring the safety, security and reliability of the nation’s nuclear stockpile.
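For a sense of scale, one exaflop is 10^18 floating-point operations per second, so the "10X faster" claim implies a reference system of roughly 200 petaflops. The short Python sketch below only restates that unit arithmetic; the variable names are illustrative, and the implied figure is derived from the numbers quoted in this article, not taken from any benchmark listing.

```python
# Unit arithmetic for the figures quoted in the article.
# All names here are illustrative; nothing is measured data.
EXA = 10**18   # operations per second in one exaflop
PETA = 10**15  # operations per second in one petaflop

# "more than 2 exaflops": treated here as a lower bound
el_capitan_flops = 2 * EXA

# "10X faster than today's most powerful supercomputer"
speedup_claim = 10

# Performance of the reference system implied by the two figures above
implied_reference_flops = el_capitan_flops // speedup_claim
implied_reference_petaflops = implied_reference_flops // PETA

print(f"El Capitan (lower bound): {el_capitan_flops // PETA} petaflops")
print(f"Implied reference system: {implied_reference_petaflops} petaflops")
```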
The AMD-based nodes will be optimized to accelerate artificial intelligence (AI) and machine learning (ML) workloads, potentially enabling the expanded use of AI and ML in the research, computational techniques and analysis that benefit NNSA missions.
AMD technology within El Capitan includes:
Next-generation AMD EPYC processors, codenamed “Genoa,” featuring the “Zen 4” processor core. These processors will support next-generation memory and I/O subsystems for AI and HPC workloads.
Next-generation Radeon Instinct GPUs based on a new compute-optimized architecture for workloads including HPC and AI. These GPUs will use next-generation high-bandwidth memory and are designed for deep learning performance.
The 3rd Gen AMD Infinity Architecture, which will provide a high-bandwidth, low-latency connection between the four Radeon Instinct GPUs and one AMD EPYC CPU included in each node of El Capitan. The 3rd Gen AMD Infinity Architecture also includes unified memory across the CPU and GPU, easing programmer access to accelerated computing.
An enhanced version of the open-source ROCm heterogeneous programming environment, being developed to tap into the combined performance of AMD CPUs and GPUs.
HPE’s Cray Shasta technologies, which were built from the ground up to support a diverse set of processor and accelerator technologies and to meet new levels of performance and scalability, will enable the DOE’s El Capitan to meet NNSA requirements. These include the NNSA’s Life Extension Program (LEP), a critical part of stockpile stewardship that aims to modernize aging weapons in the U.S. nuclear stockpile so that they remain safe, secure, and effective.
HPE’s Cray Shasta system and Slingshot interconnect, a specialized HPC networking solution, will also be used in the new supercomputer.
HPE and AMD designed new technologies to achieve streamlined communication between HPE’s Cray Slingshot interconnect and the next-generation AMD Radeon Instinct GPUs.
The companies also followed a new approach using accelerator-centric compute blades (in a 4:1 GPU-to-CPU ratio, connected by the 3rd Gen AMD Infinity Architecture for high-bandwidth, low-latency connections) to increase performance for data-intensive AI, machine learning and analytics needs by offloading processing from the CPU to the GPU.
Other performance enhancements include unique storage and software capabilities that are integrated with HPE’s Cray Shasta architecture. Additional use of flash-based local storage systems, designed specifically for the new system’s performance needs, will provide a buffer to balance existing on-board memory and data-tiering, which is monitored by Cray Shasta’s intelligent software solutions to automate data movement for optimal storing and timely access.
HPE is also expanding its partnership with LLNL to actively explore HPE optics technologies, a computing solution that uses light to transmit data, to feature in the DOE’s El Capitan. HPE’s optics technologies stem from R&D efforts related to PathForward, a program backed by U.S. DOE’s Exascale Computing Project. HPE developed and demonstrated breakthrough optics prototypes that integrate electrical-to-optical interfaces to enable broad use in future classes of system interconnects.
In addition to the DOE’s El Capitan, HPE will deliver the other two U.S. DOE exascale systems announced in 2019, Aurora and Frontier.
Over time, HPE will integrate its exascale technologies into its broader HPC product portfolio to deliver supercomputers of any size for every data center.
All but one of the world’s current top 10 fastest supercomputers use central processor chips from either Intel or IBM, according to supercomputing research group TOP500. The exception is the third-fastest, which is located in China and uses a domestically developed chip.