CEA-Leti Presents High-Performance, 96-Core Processor Made of Chiplets
This week at the IEEE Solid-State Circuits Conference (ISSCC) in San Francisco, French research organization CEA-Leti showed a 96-core processor made of six chiplets.
The work's results were presented Feb. 17 at ISSCC 2020 in the paper, "A 220GOPS 96-Core Processor with 6 Chiplets 3D-Stacked on an Active Interposer Offering 0.6ns/mm Latency, 3Tb/s/mm2 Inter-Chiplet Interconnects and 156mW/mm2 @ 82%-Peak-Efficiency DC-DC Converters."
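The headline figures in the paper title support some quick back-of-the-envelope arithmetic. The Python sketch is illustrative only. It assumes the 220 GOPS peak is spread evenly over all 96 cores and that inter-chiplet latency scales linearly with route length. The 10 mm route is a hypothetical value, not a figure from the paper.

```python
# Back-of-the-envelope figures derived from the headline numbers in the paper title.
# Assumptions (not from the paper): throughput is shared evenly by all cores, and
# interconnect latency grows linearly with distance. The 10 mm route is hypothetical.

TOTAL_GOPS = 220          # peak throughput quoted in the paper title
NUM_CORES = 96            # total cores across the six chiplets
LATENCY_NS_PER_MM = 0.6   # inter-chiplet interconnect latency quoted in the title


def gops_per_core(total_gops: float, cores: int) -> float:
    """Average throughput per core, assuming an even split across cores."""
    return total_gops / cores


def wire_latency_ns(distance_mm: float, ns_per_mm: float = LATENCY_NS_PER_MM) -> float:
    """Latency over a route of the given length, assuming linear scaling."""
    return distance_mm * ns_per_mm


if __name__ == "__main__":
    print(f"{gops_per_core(TOTAL_GOPS, NUM_CORES):.2f} GOPS per core")
    print(f"{wire_latency_ns(10.0):.1f} ns over a hypothetical 10 mm route")
```

Under these assumptions, each core averages a little under 2.3 GOPS, and a 10 mm hop across the interposer costs about 6 ns.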
Some of the most advanced processors, such as AMD’s Zen 2 processor family, are actually a collection of chiplets bound together by high-bandwidth connections within a single package.
Large-scale interposers for chiplet integration have been built with various technologies, such as 2.5D passive interposers, organic substrates, and silicon bridges. But these technologies lack the flexible, long-distance chiplet-to-chiplet communication needed to connect a larger number of chiplets. They also make it hard to integrate heterogeneous chiplets, or to integrate less-scalable functions such as tightly coupled power-management circuits, analog functions and IO IPs.
Within the framework of IRT Nanoelec, CEA-Leti and CEA-List overcame these limitations by introducing an active-interposer technology that integrates some active CMOS circuitry into a large-scale interposer. They implemented it in an STMicroelectronics process using a 3D CAD design flow from Mentor Graphics, a Siemens business.
The active interposer integrates:
- voltage regulators fully integrated without passives for efficient power management of the 3D-stacked chiplets
- flexible system interconnect topologies between all chiplets for scalable cache-coherency support
- energy-efficient 3D-plugs for dense, high-throughput inter-layer communication, and
- a memory-IO controller and the physical layer (PHY) for socket communication.
"This is a breakthrough in terms of system-and-architecture integration, achieved all the way from the architecture concept down to a silicon prototype," said Pascal Vivet, lead author of the paper. "In addition, 3D technology and associated design techniques now are available to implement large-scale computing systems, offering for the first time a chiplet-based 96-core computing architecture."
Vivet explained that the active interposer combines flexible, distributed interconnects, 3D-plug communication IP, and power-management IP to deliver a fully integrated, energy-efficient many-core computing architecture. As a result, users get more GOPS for the same power budget, or a smaller energy footprint for the same task, and benefit from a higher memory-to-computing ratio along the memory hierarchy. These are key drivers for big data applications.
"Active interposer technology will also be an enabler to integrate heterogeneous functions," Vivet said. "Chiplet-based ecosystems will deploy rapidly in high-performance computing and various other market segments, such as embedded HPC for the automotive and other sectors. The active interposers also create opportunities to revisit system partitioning and implement extra functions at the interposer level. The ecosystem collaborated to build future 3D technology platforms that will benefit from CEA technologies to create major differentiators in future work."
Future work will address die-to-wafer hybrid bonding technology, which offers denser 3D interconnects with better electrical, mechanical and thermal parameters, and allows ultra-dense, low-energy parallel interfaces. For the longer term, CEA-Leti is also investigating innovative photonic-interposer technology as a 3D-based photonic chiplet approach, offering low-latency, high-bandwidth, energy-efficient photonic communication.
The prototype's 96 computing cores are organized in six chiplets built in a 28nm FDSOI CMOS node. The chiplets are 3D-stacked face-to-face, using 20µm-pitch micro-bumps, onto an active interposer built in a 65nm technology node and embedding through-silicon vias (TSVs). The overall system offers a fully scalable, distributed cache-coherent architecture across all the chiplet computing tiles, which are interconnected through the active interposer. The cache-coherent architecture eases software deployment through a hierarchy of caches and allows the system to scale up to 512 cores.
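The layout numbers in the paragraph above can be checked with simple arithmetic. The Python sketch below rests on assumptions that are not stated in the paper: cores are split evenly across chiplets, micro-bumps sit on a full square grid at the 20µm pitch, and a 512-core system would reuse the same 16-core chiplet. The function names are illustrative only.

```python
# Rough arithmetic on the prototype's layout, based on the figures in the text.
# Assumptions (not from the paper): cores are split evenly across chiplets,
# micro-bumps form a full square grid at the stated pitch, and a larger system
# would reuse the same chiplet design.

NUM_CORES = 96
NUM_CHIPLETS = 6
BUMP_PITCH_UM = 20.0   # micro-bump pitch stated in the text
MAX_CORES = 512        # stated scalability limit of the cache-coherent architecture


def cores_per_chiplet(cores: int, chiplets: int) -> int:
    """Cores per chiplet, assuming an even split."""
    return cores // chiplets


def bumps_per_mm2(pitch_um: float) -> float:
    """Upper bound on bump density for a full square grid at the given pitch."""
    per_mm = 1000.0 / pitch_um  # bumps along one millimetre
    return per_mm * per_mm


def chiplets_needed(target_cores: int, per_chiplet: int) -> int:
    """Chiplets required to reach target_cores, using ceiling division."""
    return -(-target_cores // per_chiplet)


if __name__ == "__main__":
    per_chiplet = cores_per_chiplet(NUM_CORES, NUM_CHIPLETS)
    print(f"{per_chiplet} cores per chiplet")
    print(f"up to {bumps_per_mm2(BUMP_PITCH_UM):.0f} micro-bumps per mm^2")
    print(f"{chiplets_needed(MAX_CORES, per_chiplet)} chiplets for {MAX_CORES} cores")
```

With these assumptions, each chiplet carries 16 cores, a 20µm pitch allows at most 2,500 bumps per square millimetre, and reaching 512 cores would take 32 such chiplets. Real bump arrays are typically not fully populated, so the density figure is a ceiling rather than a measured value.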
Unlike CEA-Leti’s prototype, commercial systems that rely on chiplets use silicon interposers that have no active circuitry embedded in them. And many systems don’t even use silicon, instead relying on organic circuit-board materials or small pieces of silicon embedded in the organic board.
These are not simple mix-and-match systems: they require extensive codesign between the chiplets and the package that integrates them. Nevertheless, the results can be worthwhile.
At ISSCC, AMD detailed how it codesigned the chiplets and package that make up its Zen 2 high-performance processors. The resulting system broke a record that night at ISSCC. When overclocked and cooled with a decanter of liquid nitrogen, the single AMD processor scored 39,744 on the Cinebench 3D rendering benchmark. A few weeks earlier, the record had been about 32,000, set by a 128-core server system costing roughly US $20,000, according to AMD’s Jerry Ahrens.