Europe's ExaNoDe Project Builds 3DIC Exascale Compute Node

After three years of development, the European Exascale Processor Memory Node Design (ExaNoDe) project has built what it claims is a compute node prototype that combines a 3DIC with multi-chip-module integration technologies, heterogeneous compute elements with Arm cores and FPGA acceleration, and the UNIMEM memory system.

The prototype multi-chip module (MCM) integrates Arm cores, FPGAs, and 3D active interposer/chiplet technology.

The goal of the €8.6 million project was to develop the architectural foundations of a compute node suitable for exascale number-crunching. ExaNoDe is one of four projects funded out of the European Union’s Horizon 2020 program that are aimed at building exascale supercomputers.

The node uses 7nm Arm-core based chiplets, a 3D active interposer, and HBM2 to achieve the affordability and energy efficiency required for an exascale-class compute node. The prototype also integrates a high-performance and productive programming environment.

The innovative interposer, developed by CEA, enables multiple system-on-chip (SoC) chiplets to be combined, forming a three-dimensional integrated circuit (3DIC). This delivers higher chip fabrication yields thanks to the smaller chip size and reduced inter-chip communication distances, resulting in improved energy efficiency. In addition, the technique reduces the costs of customization and adds the flexibility to slot in compute elements such as CPUs and accelerators in a single chip for different applications.
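The yield argument can be illustrated with a rough back-of-the-envelope sketch. It assumes a simple Poisson defect model, in which the fraction of defect-free dies is exp(-D x A) for defect density D and die area A. The defect density and die areas in the usage note are hypothetical values chosen for illustration, not ExaNoDe figures, and the sketch ignores the cost and yield of the interposer and of assembly.

```python
import math


def poisson_yield(area_mm2, defects_per_cm2):
    """Expected fraction of defect-free dies under a simple Poisson model."""
    area_cm2 = area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-defects_per_cm2 * area_cm2)


def silicon_per_good_system(total_area_mm2, n_dies, defects_per_cm2):
    """Silicon (mm^2) fabricated per working system.

    Assumes the logic is split evenly across n_dies dies and that every
    die is tested before assembly, so only good dies are packaged.
    """
    die_area = total_area_mm2 / n_dies
    return total_area_mm2 / poisson_yield(die_area, defects_per_cm2)


if __name__ == "__main__":
    # Hypothetical numbers: 600 mm^2 of logic, 0.1 defects per cm^2.
    for n in (1, 4):
        cost = silicon_per_good_system(600.0, n, 0.1)
        print(f"{n} die(s): {cost:.0f} mm^2 of silicon per good system")
```

Under these assumed numbers, splitting the same logic into four smaller dies noticeably reduces the silicon that has to be fabricated per working system, which is the effect the chiplet approach relies on.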

The project also takes advantage of the UNIMEM memory system, which was created in the EUROSERVER project and is being brought to scale in the EuroEXA project. The result is the ability to share memory among multiple compute nodes. UNIMEM is a PGAS-style global addressing scheme that allows compute nodes to share non-coherent memory across a cluster. Essentially, it’s a model that separates local coherent memory from remote memory, employing an API to negotiate between them. It uses RDMA to provide access to data where it resides, largely avoiding costly copies between nodes.
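To make the PGAS idea concrete, here is a toy sketch of a partitioned global address space. It is not the UNIMEM API: the 16-bit node / 48-bit offset split, the Node class, and its read and write methods are all hypothetical, and the "remote" path simply touches the owning node's buffer in place to stand in for an RDMA transfer.

```python
NODE_BITS = 16    # hypothetical split of a 64-bit global address
OFFSET_BITS = 48
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def make_global_addr(node_id, offset):
    """Pack (owning node, local offset) into one global address."""
    if not 0 <= node_id < (1 << NODE_BITS):
        raise ValueError("node_id out of range")
    if not 0 <= offset <= OFFSET_MASK:
        raise ValueError("offset out of range")
    return (node_id << OFFSET_BITS) | offset


def split_global_addr(addr):
    """Recover (owning node, local offset) from a global address."""
    return addr >> OFFSET_BITS, addr & OFFSET_MASK


class Node:
    """Toy compute node: local memory plus a view of the cluster."""

    def __init__(self, node_id, size, cluster):
        self.node_id = node_id
        self.memory = bytearray(size)
        self.cluster = cluster      # dict: node_id -> Node
        self.remote_accesses = 0    # counts simulated RDMA operations
        cluster[node_id] = self

    def _resolve(self, addr, length):
        owner_id, offset = split_global_addr(addr)
        owner = self.cluster[owner_id]
        if offset + length > len(owner.memory):
            raise IndexError("access beyond the owner's memory")
        if owner_id != self.node_id:
            self.remote_accesses += 1  # stand-in for an RDMA transfer
        return owner, offset

    def read(self, addr, length):
        owner, offset = self._resolve(addr, length)
        return bytes(owner.memory[offset:offset + length])

    def write(self, addr, data):
        # Data is written where it lives; no copy is kept on this node.
        owner, offset = self._resolve(addr, len(data))
        owner.memory[offset:offset + len(data)] = data


if __name__ == "__main__":
    cluster = {}
    n0, n1 = Node(0, 64, cluster), Node(1, 64, cluster)
    addr = make_global_addr(1, 8)   # byte 8 of node 1's memory
    n0.write(addr, b"hi")           # remote access from node 0
    print(n1.read(addr, 2))         # local access on node 1
```

The point of the sketch is that each byte has exactly one owner, and other nodes reach it by address rather than by holding their own copy.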

The software stack also includes virtualization support, including checkpointing and virtualization of the UNIMEM capabilities.

In addition, some ‘mini applications’ — self-contained and based on real-life applications — have been developed and ported to the architecture. Initial work has been performed to accelerate the key kernels on the compute node’s FPGA logic. ETHZ developed the open source ExaConv convolutional neural network accelerator to accelerate neural network training as a demonstration of heterogeneous integration.

The node prototype board comprises a pair of MCMs, each with two FPGAs. The MCM on the left has an interposer topped with what appears to be three chiplets. The FPGAs are the same ones recently delivered in the ExaNeSt project prototype, namely the Xilinx Zynq UltraScale+ SoCs. Embedded in these SoCs are Arm Cortex-A53 cores, which serve as the Arm component here.

The chiplets delivered with the prototype, which were manufactured on STMicroelectronics’ 28nm FD-SOI process, contain only communication logic for the interposer-chiplet complex plus a specialized hardware block for accelerating convolutional neural network (CNN) computations. In this case, the CNN accelerator, which is based on an open-source implementation (ExaConv) developed at ETH Zurich, is only for demonstration purposes, although it would certainly be a prime candidate to include in a production chiplet, given the interest among HPC users in applying deep learning to their application workflows.

Going forward, the ExaNoDe software building blocks and application ports will be taken up by EuroEXA, a project tasked with designing and building a petascale prototype using FPGAs, Arm, and other technologies.