New 7nm Achronix Speedcore Gen4 eFPGA IP Brings Improvements
Achronix unveiled this week their fourth-generation Speedcore eFPGA technology, targeting 7nm CMOS.
While this new IP continues the mission of allowing FPGA fabric to be part of any SoC/ASIC design, the latest version has numerous features aimed specifically at accelerating machine learning inferencing. Designed for integration into users’ SoCs, this is an alternative to multi-chip solutions with discrete FPGAs or other custom AI accelerators.
According to Achronix, Speedcore Gen4 increases performance by 60%, reduces power by 50%, and shrinks die area by 65% while retaining the original Speedcore eFPGA IP’s abilities. It has been designed to bring programmable hardware-acceleration capabilities to a broad range of compute, networking and storage systems for interface protocol bridging/switching, algorithmic acceleration and packet processing applications.
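To put those three figures on a common footing, the short Python sketch below combines them into relative performance-per-watt and performance-per-area ratios. It assumes, and the announcement does not say this, that all three claims hold at once for the same design and operating point; the function and constant names are illustrative only.

```python
# Vendor-claimed Gen4 changes relative to the previous generation,
# taken from the figures quoted above.
PERF_GAIN = 0.60        # +60% performance
POWER_REDUCTION = 0.50  # -50% power
AREA_REDUCTION = 0.65   # -65% die area


def relative_efficiency(perf_gain, power_reduction, area_reduction):
    """Return Gen4-vs-previous ratios, ASSUMING the three claims apply together."""
    perf = 1.0 + perf_gain          # 1.6x the old performance
    power = 1.0 - power_reduction   # 0.5x the old power
    area = 1.0 - area_reduction     # 0.35x the old area
    return {
        "perf_per_watt": perf / power,
        "perf_per_area": perf / area,
    }


if __name__ == "__main__":
    for name, ratio in relative_efficiency(
        PERF_GAIN, POWER_REDUCTION, AREA_REDUCTION
    ).items():
        print(f"{name}: {ratio:.2f}x")
```

Under that assumption the claims work out to roughly 3.2x the performance per watt and about 4.6x the performance per unit area. Real designs may see these gains at different operating points, so the ratios are best read as an upper-bound illustration.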
Speedcore lets you build an FPGA to your exact specifications. Since modern FPGAs contain LUT fabric, multipliers/DSP blocks, and embedded memories, at the least, and often also include processor cores and other hard blocks, stand-alone FPGAs are always built as a “guess” by the FPGA company about the relative amounts of each of these resources needed for given broad classes of applications. eFPGAs like Speedcore allow you to tailor the mix of resources exactly to your anticipated application needs. And, since you’re designing your own SoC anyway, you have the flexibility to merge the FPGA core with any number of other hard resources and IO.
With the Speedcore Gen4 architecture, Achronix adds what it calls Machine Learning Processor (MLP) blocks to the library of available blocks and delivers 300% higher system performance for artificial intelligence and machine learning (AI/ML) applications. These MLP blocks are aimed at the kind of matrix-multiply operations common in CNN inferencing. Each MLP includes a local cyclical register file that leverages temporal locality for optimal reuse of stored weights or data. The MLPs are tightly coupled with neighboring MLP blocks and larger embedded memory blocks, and they support multiple-precision fixed-point and floating-point formats, including Bfloat16, 16-bit, half-precision floating point, 24-bit floating point, and block floating point (BFP). In many ML applications, reducing the precision of these calculations can yield massive gains in performance and power consumption with very little loss in accuracy. By supporting a wide range of precisions, the MLP allows you to find the optimal compromise between performance and accuracy for your application.
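To make the precision trade-off concrete, here is a minimal Python sketch of two of the formats named above: rounding a value to Bfloat16, and a block-floating-point dot product in which each block of values shares one exponent and the multiply-accumulate runs on small integers. This is a software model for illustration only. The function names, the 8-bit mantissa width, and the rounding choices are assumptions, not a description of how the Achronix MLP is implemented.

```python
import math
import struct


def to_bfloat16(x):
    """Round a finite float to bfloat16 precision (1 sign, 8 exponent, 7 mantissa bits)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]   # float32 bit pattern
    bias = 0x7FFF + ((bits >> 16) & 1)                    # round to nearest, ties to even
    bits = ((bits + bias) >> 16) << 16                    # drop the low 16 bits
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFFFFFF))[0]


def bfp_quantize(block, mantissa_bits=8):
    """Quantize a block to signed integer mantissas that share a single exponent."""
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        return [0] * len(block), 0
    shared_exp = math.frexp(max_abs)[1]            # max_abs < 2**shared_exp
    scale = 2.0 ** (mantissa_bits - 1 - shared_exp)
    limit = 2 ** (mantissa_bits - 1) - 1           # e.g. 127 for 8-bit mantissas
    mantissas = [max(-limit, min(limit, round(v * scale))) for v in block]
    return mantissas, shared_exp


def bfp_dequantize(mantissas, shared_exp, mantissa_bits=8):
    """Convert shared-exponent mantissas back to floats."""
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [m * scale for m in mantissas]


def bfp_dot(a, b, mantissa_bits=8):
    """Dot product with integer multiply-accumulate and one exponent fix-up at the end."""
    ma, ea = bfp_quantize(a, mantissa_bits)
    mb, eb = bfp_quantize(b, mantissa_bits)
    acc = sum(x * y for x, y in zip(ma, mb))       # integer-only inner loop
    return acc * 2.0 ** (ea + eb - 2 * (mantissa_bits - 1))


if __name__ == "__main__":
    weights = [0.9, -0.31, 0.07, 0.002]            # made-up example data
    activations = [1.0, 0.5, 0.25, 0.125]
    exact = sum(w * x for w, x in zip(weights, activations))
    print("bfloat16(3.14159):", to_bfloat16(3.14159))
    print("exact dot:", exact)
    print("BFP dot  :", bfp_dot(weights, activations))
```

The inner loop of `bfp_dot` touches only narrow integers, which is the property that makes block floating point cheap in hardware. The cost shows up in small values such as the 0.002 weight, which lose most of their resolution once they share an exponent with a much larger neighbor.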
Other architecture changes and improvements include a new 8-1 mux, which allows up to 8-wide muxing with a single level of logic. Also new is an 8-bit ALU with 2x the adder density of the previous generation. The new ALU is aimed at AI/ML applications, where it is frequently used for adders, counters, and comparators. There is also a new 8-bit cascadable bus-maximum function, new high-efficiency dedicated shift registers, and a new 6-input LUT with 2 registers per LUT.
The routing architecture has also been enhanced with an independent and dedicated bus routing structure that includes dynamically selectable bus muxing, which effectively creates a distributed, run-time-configurable switching network. This is the first time that run-time logic functionality is available in the routing structure, and it provides an optimal solution for high-bandwidth and low-latency applications.
Achronix uses its Speedcore Builder tool to create custom Speedcore instances to match each user’s requirements. The user can then evaluate the suitability of the generated eFPGA block for their application, and Achronix can supply die size and power information as well. This allows design teams to have a solid understanding of the functional applicability, performance, and power consumption of their eFPGA implementations long before they commit to silicon.
Achronix says Speedcore Gen4 for TSMC 7nm CMOS is available today and will be in production in 1H 2019. The company will then back-port Speedcore Gen4 for TSMC 16nm and 12nm with availability in 2H 2019.