NVIDIA Tesla V100 Available Later This Year
At this year's International Supercomputing Conference (ISC), NVIDIA announced a PCI Express version of its latest Tesla GPU accelerator, the Volta-based V100.
The Tesla V100 GPU accelerators combine AI and traditional HPC applications on a single platform and are projected to provide the U.S. Department of Energy's (DOE's) Summit supercomputer with 200 petaflops of 64-bit floating-point performance and over 3 exaflops of AI performance when Summit comes online later this year.
NVIDIA is making new Tesla V100 GPU accelerators available in a PCIe form factor for standard servers, which will join the previously announced systems using NVIDIA NVLink interconnect technology.
Specifications of the PCIe form factor include:
- 7 teraflops of double-precision performance, 14 teraflops of single-precision performance and 112 teraflops of mixed-precision Tensor Core performance with NVIDIA GPU Boost technology
- 16GB of CoWoS HBM2 stacked memory, delivering 900GB/sec of memory bandwidth
- Support for PCIe Gen 3 interconnect (up to 32GB/sec bi-directional bandwidth)
- 250 watts of power
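The PCIe Gen 3 figure can be checked with a little arithmetic. The sketch below assumes standard PCIe 3.0 parameters that the article does not state: an x16 link, 8 GT/s per lane, and 128b/130b line encoding. It computes raw link bandwidth only, before packet and protocol overhead, so usable throughput in practice is lower.

```python
# Sketch: where the "up to 32GB/sec bi-directional" PCIe Gen 3 figure comes from.
# Assumptions (standard PCIe 3.0 parameters, not stated in the article):
# an x16 link, 8 GT/s per lane, and 128b/130b line encoding.

LANES = 16
TRANSFERS_PER_SEC = 8e9           # 8 GT/s per lane for PCIe 3.0
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b encoding overhead


def pcie3_bandwidth_gbytes(lanes=LANES):
    """Return (per-direction, bidirectional) raw link bandwidth in GB/s."""
    bits_per_sec = lanes * TRANSFERS_PER_SEC * ENCODING_EFFICIENCY
    per_direction = bits_per_sec / 8 / 1e9  # bits -> bytes -> GB
    return per_direction, 2 * per_direction


if __name__ == "__main__":
    one_way, both_ways = pcie3_bandwidth_gbytes()
    print(f"per direction: {one_way:.2f} GB/s, bidirectional: {both_ways:.2f} GB/s")
```

This works out to roughly 15.75GB/sec in each direction, or about 31.5GB/sec combined, which is rounded to the quoted 32GB/sec.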
| | Tesla V100 PCIe |
|---|---|
| CUDA Cores | 5120 |
| Tensor Cores | 640 |
| Core Clock | Unknown |
| Boost Clock | ~1370MHz |
| Memory Clock | 1.75Gbps HBM2 |
| Memory Bus Width | 4096-bit |
| Memory Bandwidth | 900GB/sec |
| VRAM | 16GB |
| L2 Cache | 6MB |
| Half Precision | 28 TFLOPS |
| Single Precision | 14 TFLOPS |
| Double Precision | 7 TFLOPS (1/2 rate) |
| Tensor Performance (Deep Learning) | 112 TFLOPS |
| GPU | GV100 (815mm²) |
| Transistor Count | 21B |
| TDP | 250W |
| Form Factor | PCIe |
| Cooling | Passive |
| Manufacturing Process | TSMC 12nm FFN |
| Architecture | Volta |
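Most of the peak figures in the table follow from its own core counts, clocks, and bus width. The Python sketch below reproduces them under stated assumptions: rates are taken at the ~1370MHz boost clock, a CUDA core counts one FP32 fused multiply-add (two FLOPs) per clock, FP64 runs at half and FP16 at twice the FP32 rate, and each Tensor Core completes 64 FMAs per clock (one 4x4x4 matrix multiply-accumulate).

```python
"""Sketch: reproducing the table's peak figures from its own parameters.

Assumptions: peak rates are computed at the ~1370 MHz boost clock, each CUDA
core retires one FP32 fused multiply-add (2 FLOPs) per clock, FP64 runs at 1/2
and FP16 at 2x the FP32 rate, and each Tensor Core performs 64 FMAs per clock
(one 4x4x4 matrix multiply-accumulate).
"""

BOOST_CLOCK_HZ = 1.37e9
CUDA_CORES = 5120
TENSOR_CORES = 640
MEM_DATA_RATE_BPS = 1.75e9   # per-pin HBM2 data rate (1.75 Gbps)
MEM_BUS_WIDTH_BITS = 4096


def peak_tflops():
    """Return peak throughput in TFLOPS for each precision."""
    fp32 = CUDA_CORES * 2 * BOOST_CLOCK_HZ / 1e12            # 2 FLOPs per FMA
    tensor = TENSOR_CORES * 64 * 2 * BOOST_CLOCK_HZ / 1e12   # 4x4x4 = 64 FMAs
    return {"fp64": fp32 / 2, "fp32": fp32, "fp16": fp32 * 2, "tensor": tensor}


def memory_bandwidth_gbytes():
    """Return peak memory bandwidth in GB/s: per-pin rate x bus width / 8."""
    return MEM_DATA_RATE_BPS * MEM_BUS_WIDTH_BITS / 8 / 1e9


if __name__ == "__main__":
    for name, value in peak_tflops().items():
        print(f"{name}: {value:.1f} TFLOPS")
    print(f"memory: {memory_bandwidth_gbytes():.0f} GB/s")
```

This yields about 7.0, 14.0, 28.1 and 112.2 TFLOPS, plus 896GB/sec of memory bandwidth, which the spec sheet rounds to 900GB/sec.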
Like the previous Pascal generation, the Tesla V100 PCIe lets vendors drop Tesla cards into traditional PCIe systems, making them far more accessible to server builders who don't want to build around NVIDIA's SXM2 connector or carrier board. The tradeoff is that the PCIe cards have a lower 250W TDP (compared with 300W for the SXM2 version) and don't get NVLink.
Tesla V100 also adds Tensor Cores alongside the CUDA cores found in the company's Tesla P100. Whereas conventional FP32 and FP64 CUDA cores operate on individual values, each Tensor Core can be likened to a set of unified ALUs that multiplies two 4x4 FP16 matrices and adds the product to a 4x4 FP16 or FP32 matrix in a single fused multiply-add operation. The result is a significant advantage in specific workloads, where throughput exceeds 100 TFLOPS.
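To make the fused multiply-add concrete, here is a plain-Python emulation of a single D = A×B + C step on 4x4 tiles. It is a numerical illustration only, not how the silicon or NVIDIA's CUDA APIs work: the helper names are hypothetical, FP16 rounding is emulated with the standard library's `struct` half-precision format, and products are summed in Python's double precision before being rounded to the FP32 (or FP16) output, a simplification of the hardware's internal accumulation.

```python
"""Sketch: emulating one Tensor Core operation, D = A x B + C, on 4x4 tiles.

Numerical illustration only (hypothetical helpers, not NVIDIA's API).
Assumptions: A and B are rounded to FP16; C and D use FP32 by default
(FP16 when fp16_output=True); products are summed in Python doubles
before the final rounding, a simplification of the real hardware.
"""
import struct

N = 4  # Tensor Cores operate on 4x4 matrices


def round_to(value, fmt):
    """Round a Python float to IEEE half ('e') or single ('f') precision."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def tensor_core_fma(a, b, c, fp16_output=False):
    """Return D = A*B + C for 4x4 lists of floats, with FP16 inputs A and B."""
    out_fmt = "e" if fp16_output else "f"
    a16 = [[round_to(x, "e") for x in row] for row in a]
    b16 = [[round_to(x, "e") for x in row] for row in b]
    d = []
    for i in range(N):
        row = []
        for j in range(N):
            acc = round_to(c[i][j], out_fmt)  # accumulator matrix in output format
            for k in range(N):
                acc += a16[i][k] * b16[k][j]
            row.append(round_to(acc, out_fmt))  # single rounding at the end
        d.append(row)
    return d


if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]
    b = [[0.1] * N for _ in range(N)]
    c = [[1.0] * N for _ in range(N)]
    print(tensor_core_fma(identity, b, c)[0][0])
    print(tensor_core_fma(identity, b, c, fp16_output=True)[0][0])
```

Multiplying an identity matrix by a matrix of 0.1s shows the effect of FP16 inputs: 0.1 becomes 0.0999755859375 in half precision, so each element of D comes out as 1.0999755859375 with an FP32 result, or 1.099609375 if the output is also rounded to FP16.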
NVIDIA Tesla V100 GPU accelerators for PCIe-based systems are expected to be available from NVIDIA later this year.