
Appeared on: Thursday, May 17, 2018
Intel Announces First Xeon Scalable Processor with Integrated Intel Arria 10 FPGA

Intel has started sampling the Xeon Scalable Processor 6138P, which has an integrated Intel Arria 10 field programmable gate array (FPGA).

This marks the first production release of an Intel Xeon processor with a coherently interfaced FPGA - an important result of Intel's acquisition of Altera.

The Intel Xeon Scalable Processor 6138P includes the Intel Arria 10 GX 1150, which provides up to 160 Gbps of I/O bandwidth per socket and a cache-coherent interface for tightly coupled acceleration. The Intel Arria 10 GX 1150 has its own cache and shares memory with the processor via low-latency, cache-coherent access over the Intel Ultra Path Interconnect (Intel UPI) bus. Intel UPI allows access to data regardless of where the data resides (core cache, FPGA cache, or memory), without the need for redundant data storage or direct memory access (DMA) transfers. Data coherency also reduces application programming complexity and saves CPU cycles that would otherwise be spent determining which copy of the data is the most up to date.
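
To see why removing the staging copies matters, consider the following minimal C sketch. It is purely illustrative and does not use Intel's actual acceleration APIs: the "accelerator kernel" is a local stand-in function, and the copy-based path models device-local memory with an ordinary heap buffer.

    /* Conceptual sketch only (not Intel's API): contrasts a copy-based DMA
     * flow with a cache-coherent shared-memory flow.  The "accelerator" here
     * is just a local function standing in for work done on the FPGA. */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Stand-in for the computation the FPGA would perform. */
    static void accel_kernel(uint8_t *buf, size_t len) {
        for (size_t i = 0; i < len; i++) buf[i] ^= 0x5A;
    }

    /* Discrete, non-coherent accelerator: data is staged through device
     * memory, so every offload pays for two copies (host->device and back). */
    static void offload_with_copies(uint8_t *host_buf, size_t len) {
        uint8_t *dev_buf = malloc(len);     /* models device-local memory */
        memcpy(dev_buf, host_buf, len);     /* "DMA" to device            */
        accel_kernel(dev_buf, len);
        memcpy(host_buf, dev_buf, len);     /* "DMA" back to host         */
        free(dev_buf);
    }

    /* Coherent accelerator (the UPI-attached FPGA case): CPU and FPGA share
     * one copy of the data; the coherence protocol keeps caches consistent,
     * so the staging copies disappear. */
    static void offload_coherent(uint8_t *shared_buf, size_t len) {
        accel_kernel(shared_buf, len);
    }

    int main(void) {
        uint8_t data[4] = {1, 2, 3, 4};
        offload_with_copies(data, sizeof data);  /* two copies + kernel */
        offload_coherent(data, sizeof data);     /* kernel only         */
        printf("%u %u %u %u\n", data[0], data[1], data[2], data[3]);
        return 0;
    }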

An example of this system capability is Intel's new virtual switching reference design for the Intel Xeon Scalable processor with integrated FPGA. This reference design uses the FPGA for infrastructure dataplane switching, while the processor handles application processing and runs virtual machines. This helps reduce network complexity and improves the productivity of the processor.

This solution is also compatible with the Open Virtual Switch (OVS) framework and, according to Intel, delivers a 3.2X throughput improvement at half the latency while supporting 2X more VMs compared to OVS running on an equivalent processor without FPGA acceleration.

Fujitsu plans to deliver systems based on the Intel Xeon processor with integrated FPGA and Intel's OVS reference design.

Intel is pushing the use of Intel FPGAs and other accelerators in the datacenter. The company's roadmap includes a discrete FPGA solution with faster coherent and higher-bandwidth interconnects, enabled by the Acceleration Stack for Intel Xeon CPU with FPGAs. It will support code migration from the Intel Xeon Scalable processor with integrated FPGA and the Intel Programmable Acceleration Card (Intel PAC) solutions, and will continue to be optimized for enhanced bandwidth and low latency.

Why FPGAs?

FPGA stands for field-programmable gate array. FPGAs are, in essence, programmable hardware: you create a hardware design and flash it onto the FPGA chip. They are called field-programmable because you can change them on the fly, in the field.

FPGA chips have long been used widely in telecom. They are very good at quickly processing streams of data flowing through them, and at prototyping chips before they are built.

CPUs and GPUs are also capable of processing many types of workloads. What a CPU does well is take a small amount of data, called the "working set", sitting in its data cache, and stream instructions past it, operating on that data. But if the data those instructions operate on is too big to fit in cache, the CPU does not work very well: it has to issue a bunch of instructions to process each byte, and when those bytes are arriving at 12.5 billion bytes per second (a 100 Gbps link), that is a lot of instructions.
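
A rough back-of-the-envelope calculation makes the point; the core clock and issue width below are illustrative assumptions, not measurements of any particular processor.

    /* Back-of-the-envelope instruction budget for processing a 100 Gbps
     * stream on one core.  The clock rate and issue width are illustrative
     * assumptions, not measurements. */
    #include <stdio.h>

    int main(void) {
        double line_rate_bps   = 100e9;              /* 100 Gbps link        */
        double bytes_per_sec   = line_rate_bps / 8;  /* 12.5 billion bytes/s */
        double core_ghz        = 3.0;                /* assumed core clock   */
        double insns_per_cycle = 4.0;                /* assumed issue width  */
        double insn_budget     = core_ghz * 1e9 * insns_per_cycle;

        /* Instructions one core can afford to spend on each incoming byte. */
        printf("bytes per second : %.1f billion\n", bytes_per_sec / 1e9);
        printf("instructions/sec : %.1f billion\n", insn_budget / 1e9);
        printf("instructions/byte: %.2f\n", insn_budget / bytes_per_sec);
        return 0;
    }

Under those assumptions the core has roughly one instruction available per incoming byte, which is why per-byte software processing at line rate quickly stops scaling.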

On the other hand, what GPUs do well is something called SIMD parallelism, which stands for Single Instruction, Multiple Data. The idea is that you have a bunch of tasks that are all the same, operating on similar, but not identical, data. You can issue one instruction, and that instruction performs the same operation on, say, eight data items in parallel.
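
The same SIMD idea exists on CPUs as vector instructions. The snippet below, a small example assuming an x86 CPU with AVX support (compile with -mavx), uses one add instruction to operate on eight single-precision floats at once:

    /* One SIMD instruction operating on eight floats at once (requires AVX). */
    #include <immintrin.h>
    #include <stdio.h>

    int main(void) {
        float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        float b[8] = {10, 20, 30, 40, 50, 60, 70, 80};
        float c[8];

        __m256 va = _mm256_loadu_ps(a);      /* load 8 floats               */
        __m256 vb = _mm256_loadu_ps(b);
        __m256 vc = _mm256_add_ps(va, vb);   /* single instruction, 8 adds  */
        _mm256_storeu_ps(c, vc);

        for (int i = 0; i < 8; i++)
            printf("%.0f ", c[i]);           /* 11 22 33 44 55 66 77 88     */
        printf("\n");
        return 0;
    }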

FPGAs are, in effect, a transpose of the CPU model. Rather than pinning a small amount of data and running instructions through it, on an FPGA you pin the instructions and then run the data through them.

The idea is that you take a computational structure, a graph of operations, pin it in place, and flow data through it continuously. As a result, FPGAs are really good for those sorts of streaming workloads, as the sketch below illustrates.
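
A rough software analogy for that pinned computation graph is a fixed pipeline of stages that data streams through. The stages below are purely illustrative; on an FPGA each stage would become dedicated logic and all stages would process different items in the same clock cycle, rather than executing sequentially as in this C sketch.

    /* Software analogy for an FPGA dataflow pipeline: a fixed graph of
     * operations that data streams through.  On an FPGA each stage would be
     * dedicated hardware working concurrently; here the stages run
     * sequentially, item by item. */
    #include <stdint.h>
    #include <stdio.h>

    static uint32_t stage_parse (uint32_t x) { return x & 0x00FFFFFF; }  /* keep payload bits */
    static uint32_t stage_hash  (uint32_t x) { return x * 2654435761u; } /* mix the key       */
    static uint32_t stage_filter(uint32_t x) { return x & 1u ? x : 0; }  /* drop even hashes  */

    int main(void) {
        uint32_t stream[] = {0xAA000001, 0xBB000002, 0xCC000003, 0xDD000004};
        size_t n = sizeof stream / sizeof stream[0];

        /* The "instructions" (stages) are pinned; only the data moves. */
        for (size_t i = 0; i < n; i++) {
            uint32_t v = stage_filter(stage_hash(stage_parse(stream[i])));
            printf("item %zu -> %08x\n", i, (unsigned)v);
        }
        return 0;
    }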


