NVIDIA Tesla V100 Available Later This Year

At this year's International Supercomputing Conference (ISC) NVIDIA announced a PCI Express version of their latest Tesla GPU accelerator, the Volta-based V100.

The Tesla V100 GPU accelerators combine AI and traditional HPC applications on a single platform and are projected to provide the U.S. Department of Energy's (DOE's) Summit supercomputer with 200 petaflops of 64-bit floating point performance and over 3 exaflops of AI performance when it comes online later this year.

NVIDIA is making new Tesla V100 GPU accelerators available in a PCIe form factor for standard servers, which will join the previously announced systems using NVIDIA NVLink interconnect technology.

Specifications of the PCIe form factor include:

7 teraflops double-precision performance, 14 teraflops single-precision performance and 112 teraflops of half-precision Tensor Core (deep learning) performance with NVIDIA GPU BOOST technology
16GB of CoWoS HBM2 stacked memory, delivering 900GB/sec of memory bandwidth
Support for PCIe Gen 3 interconnect (up to 32GB/sec bi-directional bandwidth)
250 watts of power

	Tesla V100 PCIe
CUDA Cores	5120
Tensor Cores	640
Core Clock	Unknown
Boost Clock	~1370MHz
Memory Clock	1.75Gbps HBM2
Memory Bus Width	4096-bit
Memory Bandwidth	900GB/sec
VRAM	16GB
L2 Cache	6MB
Half Precision	28 TFLOPS
Single Precision	14 TFLOPS
Double Precision	7 TFLOPS (1/2 rate)
Tensor Performance (Deep Learning)	112 TFLOPS
GPU	GV100 (815mm²)
Transistor Count	21B
TDP	250W
Form Factor	PCIe
Cooling	Passive
Manufacturing Process	TSMC 12nm FFN
Architecture	Volta
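
The headline figures in the table can be roughly reproduced from the unit counts and clocks it lists. The Python sketch below is a back-of-the-envelope check, not NVIDIA's published methodology. It assumes one fused multiply-add (counted as 2 floating-point operations) per CUDA core per clock at FP32, FP16 at twice and FP64 at half that rate (as the table implies), 64 multiply-adds per Tensor Core per clock, and a 16-lane PCIe Gen 3 link; none of these assumptions is stated in the article itself.

```python
# Back-of-the-envelope check of the peak figures in the spec table.
# Assumptions (not from the article): 1 FMA = 2 FLOPs, one FP32 FMA per
# CUDA core per clock, 64 FMAs per Tensor Core per clock, x16 PCIe link.

CUDA_CORES = 5120
TENSOR_CORES = 640
BOOST_CLOCK_HZ = 1.37e9      # "~1370MHz" from the table
BUS_WIDTH_BITS = 4096
MEM_RATE_GBPS = 1.75         # per-pin data rate from the table

FLOPS_PER_FMA = 2
TENSOR_FMAS_PER_CLOCK = 4 * 4 * 4   # one 4x4 by 4x4 multiply-accumulate


def tflops(units, fmas_per_clock):
    """Peak TFLOPS for `units` execution units at the boost clock."""
    return units * fmas_per_clock * FLOPS_PER_FMA * BOOST_CLOCK_HZ / 1e12


fp32 = tflops(CUDA_CORES, 1)
fp64 = fp32 / 2              # table lists double precision at 1/2 rate
fp16 = fp32 * 2              # table lists half precision at twice FP32
tensor = tflops(TENSOR_CORES, TENSOR_FMAS_PER_CLOCK)

# Memory bandwidth: bus width (bits) x per-pin rate (Gbps) / 8 bits per byte
mem_bw_gbs = BUS_WIDTH_BITS * MEM_RATE_GBPS / 8

# PCIe Gen 3: 8 GT/s per lane with 128b/130b encoding, 16 lanes assumed
pcie_one_way = 8 * 16 * (128 / 130) / 8
pcie_bidir = 2 * pcie_one_way

print(f"FP64   {fp64:6.1f} TFLOPS")
print(f"FP32   {fp32:6.1f} TFLOPS")
print(f"FP16   {fp16:6.1f} TFLOPS")
print(f"Tensor {tensor:6.1f} TFLOPS")
print(f"Memory {mem_bw_gbs:6.1f} GB/s")
print(f"PCIe   {pcie_bidir:6.1f} GB/s bi-directional")
```

Under these assumptions the results land close to the quoted 7, 14, 28 and 112 TFLOPS, and the memory and PCIe numbers come out slightly under the rounded 900GB/sec and 32GB/sec figures.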

Like the previous Pascal iteration, the Tesla V100 PCIe allows vendors to drop Tesla cards into traditional PCIe systems, making the cards far more accessible to server builders who don't want to build around NVIDIA's SXM2 connector or carrier board. The tradeoff is that the PCIe cards have a lower 250W TDP and don't get NVLink.

In addition to the CUDA cores also found in the company's Tesla P100 offerings, the Tesla V100 gets Tensor Cores on board. Unlike conventional FP32 or FP64 CUDA cores, a Tensor Core can be likened to a group of unified ALUs that multiplies two 4x4 FP16 matrices and then adds the product to an FP16 or FP32 4x4 matrix in a single fused multiply-add operation. The result is a significant throughput advantage, more than 100 TFLOPS, in specific workloads such as deep learning.
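
The fused multiply-add described above, D = A x B + C on 4x4 matrices, can be illustrated in plain Python. This is a conceptual sketch only: the helper names (`to_fp16`, `to_fp32`, `tensor_core_fma`) are made up for illustration, the accumulation runs in Python's double precision before rounding to FP32, and the result is not guaranteed to be bit-identical to what the hardware produces.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def tensor_core_fma(a, b, c):
    """Return D = A x B + C for 4x4 matrices given as nested lists.

    A and B are rounded to FP16 on input; C is the accumulator and the
    result is rounded to FP32 (the mixed-precision case).
    """
    n = 4
    a16 = [[to_fp16(v) for v in row] for row in a]
    b16 = [[to_fp16(v) for v in row] for row in b]
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = c[i][j]
            for k in range(n):          # 4 multiply-adds per output element
                acc += a16[i][k] * b16[k][j]
            d[i][j] = to_fp32(acc)
    return d


# Usage: multiply by the identity and add 0.5 to every element.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
b = [[float(4 * i + j + 1) for j in range(4)] for i in range(4)]
c = [[0.5] * 4 for _ in range(4)]
d = tensor_core_fma(identity, b, c)
for row in d:
    print(row)
```

Each of the 16 output elements needs 4 multiply-adds, so one such operation amounts to 64 multiply-adds in total.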

NVIDIA Tesla V100 GPU accelerators for PCIe-based systems are expected to be available later this year from NVIDIA.