Achronix Introduces New Speedster7t FPGA Family for High-Bandwidth Data Acceleration Applications

FPGA-based hardware accelerator maker Achronix Semiconductor today introduced a new FPGA family designed for artificial intelligence/machine learning (AI/ML) and high-bandwidth data acceleration applications.

Specifically designed for AI/ML and high-bandwidth workloads, the Speedster7t FPGA family features a new 2D network-on-chip (NoC) and a high-density array of new machine learning processors (MLPs). According to Robert Blake, President and CEO of Achronix Semiconductor, the family blends FPGA programmability with ASIC routing structures and compute engines to deliver a new “FPGA+” class of chips.

In developing the Speedster7t family of FPGAs, Achronix’s engineering team redesigned the entire FPGA architecture to balance on-chip processing, interconnect and external I/O, to maximize the throughput of data-intensive workloads such as those found in edge- and server-based AI/ML applications, networking and storage.

Manufactured on TSMC’s 7nm FinFET process, Speedster7t devices are designed to accept massive amounts of data from multiple high-speed sources, distribute that data to programmable on-chip algorithmic and processing units, and then deliver those results with the lowest possible latency. Speedster7t devices include high-bandwidth GDDR6 interfaces, 400G Ethernet ports, and PCI Express Gen5 — all interconnected to deliver ASIC-level bandwidth while retaining the full programmability of FPGAs.

Up to 8 GDDR6 controllers in a Speedster7t device can support an aggregate GDDR6 bandwidth of 4 Tbps.
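As a rough sanity check on that figure, the aggregate number can be split evenly across the controllers. The sketch below assumes decimal units (1 Tbps = 1,000 Gbps) and an even split across all eight controllers; the function names are illustrative and not taken from Achronix documentation.

```python
# Back-of-the-envelope split of the quoted aggregate GDDR6 bandwidth.
# Assumptions: decimal units and an even split across controllers.

AGGREGATE_GDDR6_TBPS = 4.0   # quoted aggregate bandwidth
MAX_GDDR6_CONTROLLERS = 8    # quoted maximum controller count


def per_controller_gbps(aggregate_tbps, controllers):
    """Return the bandwidth share of one controller in Gbps."""
    return aggregate_tbps * 1000 / controllers


def gbps_to_gbytes_per_s(gbps):
    """Convert gigabits per second to gigabytes per second."""
    return gbps / 8


share = per_controller_gbps(AGGREGATE_GDDR6_TBPS, MAX_GDDR6_CONTROLLERS)
print(f"{share:.0f} Gbps per controller")
print(f"{gbps_to_gbytes_per_s(AGGREGATE_GDDR6_TBPS * 1000):.0f} GB/s aggregate")
```

Under those assumptions, each controller accounts for about 500 Gbps, and the aggregate works out to roughly 500 GB/s.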

Speedster7t devices have up to 72 of the industry’s highest-performance SerDes, which can operate from 1 to 112 Gbps. They also include hard 400G Ethernet MACs with forward error correction (FEC), supporting 4× 100G and 8× 50G configurations, and hard PCI Express Gen5 controllers with 8 or 16 lanes per controller.

At the heart of Speedster7t FPGAs is a parallel array of programmable compute elements within the new MLPs. The MLPs are highly configurable, compute-intensive blocks that support integer formats from 4 to 24 bits and floating-point modes, including direct support for TensorFlow’s 16-bit format as well as the supercharged block floating-point format that doubles the compute engines per MLP.
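For readers unfamiliar with block floating point, the general idea is that a group of values shares a single exponent while each value keeps only a small integer mantissa, so the arithmetic inside a block is done on integers. The sketch below illustrates that generic technique only. The announcement does not describe the MLP's actual format, so the 8-bit mantissa width, the block layout, and the function names here are assumptions.

```python
import math

# Generic block floating-point sketch (not the MLP's actual format).
# Assumption: signed mantissas of `mantissa_bits` bits, one shared exponent.


def to_block_fp(values, mantissa_bits=8):
    """Quantize a block of floats to (shared_exponent, integer mantissas)."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return 0, [0] * len(values)
    # frexp gives max_abs = m * 2**exp with 0.5 <= m < 1.
    _, exp = math.frexp(max_abs)
    scale = 2.0 ** (exp - (mantissa_bits - 1))
    limit = 2 ** (mantissa_bits - 1) - 1
    # Round to the nearest step and clamp to the signed mantissa range.
    mantissas = [max(-limit, min(limit, round(v / scale))) for v in values]
    return exp, mantissas


def from_block_fp(exp, mantissas, mantissa_bits=8):
    """Reconstruct approximate floats from a shared exponent and mantissas."""
    scale = 2.0 ** (exp - (mantissa_bits - 1))
    return [m * scale for m in mantissas]


exponent, block = to_block_fp([1.0, 0.5, -0.25, 0.0])
print(exponent, block, from_block_fp(exponent, block))
```

Because the exponent is stored once per block rather than once per value, the per-value hardware reduces to integer multipliers and adders, which is the usual motivation for the format.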

The MLPs are coupled with embedded memory blocks, eliminating the traditional delays associated with FPGA routing to ensure that data is delivered to the MLPs at the maximum performance of 750 MHz. This combination of high-density compute and high-performance data delivery results in a processor fabric that delivers the highest usable FPGA-based tera-operations per second (TOps).

The Speedster7t architecture includes a high-bandwidth, two-dimensional NoC that spans horizontally and vertically over the FPGA fabric, connecting to all of the FPGA’s high-speed data and memory interfaces. Acting like a superhighway network superimposed on the city street system of the FPGA interconnect, the Speedster7t NoC supports the high-bandwidth communication needed between on-chip processing engines. Each row or column in the NoC is implemented as two 256-bit, unidirectional industry-standard AXI channels operating at 2 GHz, delivering 512 Gbps of data traffic in each direction simultaneously.
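The per-direction figure follows directly from the channel width and clock rate. The short calculation below assumes one 256-bit transfer per clock cycle on each channel and ignores protocol overhead; the function name is illustrative.

```python
# Raw bandwidth of one NoC row or column, from the quoted width and clock.
# Assumption: one full-width transfer per cycle, no protocol overhead.

CHANNEL_WIDTH_BITS = 256
CLOCK_GHZ = 2.0
CHANNELS_PER_ROW_OR_COLUMN = 2  # one unidirectional channel per direction


def channel_bandwidth_gbps(width_bits, clock_ghz):
    """Bits per cycle times cycles per nanosecond gives Gbps."""
    return width_bits * clock_ghz


per_direction = channel_bandwidth_gbps(CHANNEL_WIDTH_BITS, CLOCK_GHZ)
both_directions = per_direction * CHANNELS_PER_ROW_OR_COLUMN
print(f"{per_direction:.0f} Gbps per direction")
print(f"{both_directions:.0f} Gbps per row or column, both directions")
```

Under that assumption, 256 bits × 2 GHz gives the quoted 512 Gbps per direction, or 1,024 Gbps per row or column when both directions are counted.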

The NoC eliminates the congestion and performance bottlenecks that occur in traditional FPGAs that use the programmable routing and logic lookup table (LUT) resources to move data streams throughout the FPGA.

Speedster7t FPGAs counter the threat of third-party attacks with bitstream security features that provide multiple layers of defense for bitstream secrecy and integrity. Keys are encrypted based on a tamper-resistant physically unclonable function (PUF), and bitstreams are encrypted and authenticated by 256-bit AES-GCM. To defend against side-channel attacks, bitstreams are segmented, with separately derived keys used for each segment, and the decryption hardware employs differential power analysis (DPA) countermeasures. Additionally, a 2048-bit RSA public key authentication protocol is used to activate the decryption and authentication hardware.
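The idea of using a separately derived key per bitstream segment can be illustrated with a small sketch. The announcement does not say how Achronix derives its segment keys, so the HMAC-SHA256 derivation, the label string, the segment size, and the function names below are all assumptions chosen for illustration. The sketch covers key derivation only and does not perform AES-GCM encryption.

```python
import hashlib
import hmac

# Illustrative per-segment key derivation (not Achronix's actual scheme).
# Assumption: each 256-bit segment key is HMAC-SHA256(root_key, label || index).


def derive_segment_key(root_key, segment_index):
    """Derive a 32-byte key that is unique to one bitstream segment."""
    info = b"bitstream-segment" + segment_index.to_bytes(4, "big")
    return hmac.new(root_key, info, hashlib.sha256).digest()


def split_segments(bitstream, segment_size):
    """Split a bitstream into fixed-size segments; the last may be shorter."""
    return [
        bitstream[i:i + segment_size]
        for i in range(0, len(bitstream), segment_size)
    ]


root_key = bytes(32)  # placeholder root key, for demonstration only
segments = split_segments(b"example-bitstream-bytes", 8)
segment_keys = [derive_segment_key(root_key, i) for i in range(len(segments))]
print(len(segments), "segments,", len(set(segment_keys)), "distinct keys")
```

Limiting how much data is processed under any one key reduces the number of power traces an attacker can collect per key, which is the usual rationale for pairing segmentation with DPA countermeasures.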

The Speedster7t FPGA devices range from 363K to 2.6M 6-input LUTs. The ACE design tools, which support all of Achronix’s products including Speedcore eFPGA and Speedchip FPGA chiplets, are available today.

The first devices and development boards for evaluation will be available in Q4 2019.

Speedster7t Product Table

Features	AC7t750	AC7t1500	AC7t3000	AC7t6000
6-input LUTs	363K	692K	1.3M	2.6M
LRAM2k	336	2,560	880	1,760
BRAM72k	1,344	2,560	2,600	5,200
MLP blocks	336	2,560	880	1,760
SerDes 112 Gbps (LR + XSR)	24 + 16	32 + 0	40 + 32	72 + 0
Dedicated GPIO	32	64	50	100
Additional GPIO	150	150	300	600
DDR4/5 channels	1	1	2	4
GDDR6	8 channels	16 channels †	16 channels	16 channels
PCIe Gen5	One ×16	One ×16 and one ×8	One ×16 and one ×8	Two ×16
Ethernet	8 lanes, 2×400G or 8×100G	16 lanes, 4×400G or 16×100G	16 lanes, 4×400G or 16×100G	32 lanes, 8×400G or 32×100G