IBM Says New POWER9-based AC922 Power Systems Offer 4x Deep-learning Framework Performance Over x86
IBM's next-generation Power Systems Servers incorporating the new POWER9 processor are capable of improving the training times of deep learning frameworks by nearly 4x, the company claims.
The new POWER9-based AC922 Power Systems embed PCI-Express 4.0, next-generation NVIDIA NVLink and OpenCAPI, which combined can accelerate data movement. IBM calculates that the CPU-to-GPU link in the new system is 9.5x faster than in PCIe 3.0-based x86 systems:
- x86 PCI Express 3.0 (x16) peak transfer rate is 15.75 GB/sec = 16 lanes x 1 GB/sec/lane x 128-bit/130-bit encoding.
- POWER9 and next-generation NVIDIA NVLink peak transfer rate is 150 GB/sec = 48 lanes x 3.2265625 GB/sec/lane x 64-bit/66-bit encoding.
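The 9.5x figure follows directly from the two peak rates above; a quick back-of-the-envelope check in Python:

```python
# Peak transfer rates computed from IBM's stated lane counts and line encodings.

# x86 PCIe 3.0 x16: 16 lanes at 1 GB/s each, with 128b/130b encoding overhead
pcie3_gbs = 16 * 1.0 * (128 / 130)

# POWER9 + next-generation NVLink: 48 lanes at 3.2265625 GB/s each, 64b/66b encoding
nvlink_gbs = 48 * 3.2265625 * (64 / 66)

print(f"PCIe 3.0 x16: {pcie3_gbs:.2f} GB/s")           # ~15.75 GB/s
print(f"NVLink:       {nvlink_gbs:.1f} GB/s")          # ~150 GB/s
print(f"Speedup:      {nvlink_gbs / pcie3_gbs:.1f}x")  # ~9.5x
```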
The system was designed to drive performance improvements across popular AI frameworks such as Chainer, TensorFlow and Caffe, as well as accelerated databases such as Kinetica. This means that data scientists could build applications faster, ranging from deep-learning insights in scientific research to real-time fraud detection and credit risk analysis.
POWER9 is at the heart of the soon-to-be most powerful data-intensive supercomputers in the world, the U.S. Department of Energy's "Summit" and "Sierra" supercomputers, and has been tapped by Google.
The AC922 pairs POWER9 CPUs with NVIDIA Tesla V100 with NVLink GPUs. CPU-to-GPU coherence in the AC922 addresses the limited memory capacity of the GPUs by allowing accelerated applications to use system memory as GPU memory.
The POWER9 CPU is available in configurations with 16, 18, 20 and 22 cores, for a total of up to 44 cores in the two-socket AC922 server.
IBM expects partners to provide FPGAs and NAND flash drives for its PCIe Gen 4 slots. It has demonstrated systems with the Mellanox Innova-2 Ethernet/FPGA card, for which a general-availability date has not yet been announced.
Xilinx has a prototype FPGA working on the OpenCAPI link. The interface is based on standard serdes to ease porting for logic chips as well as future storage-class memories; however, IBM gave no other examples of chips planned for the interconnect.
Deep learning is a fast-growing machine learning method that extracts information by crunching through millions of data points to detect and rank the most important features of the data.
To meet these growing industry demands, IBM set out four years ago to design the POWER9 chip from a blank sheet, building a new architecture to manage free-flowing data, streaming sensors and algorithms for data-intensive AI and deep learning workloads on Linux.
IBM's PowerAI cloud platform allows for simplified deployment of deep learning frameworks and libraries on the Power architecture with acceleration, allowing data scientists to be up and running in minutes. IBM researchers have already cut deep learning times from days to hours with the PowerAI Distributed Deep Learning toolkit.