NVIDIA Unveils Tesla K80 Dual-GPU Accelerator For Data Analytics

NVIDIA has added the Tesla K80 dual-GPU accelerator to its Tesla Accelerated Computing Platform, which targets machine learning, data analytics, scientific, and high-performance computing (HPC) applications. The K80 is the new flagship of the platform, which NVIDIA describes as combining its fastest GPU accelerators with the CUDA parallel computing model and an ecosystem of software developers, software vendors, and datacenter system OEMs.

According to NVIDIA, the Tesla K80 delivers nearly twice the performance of its predecessor, the Tesla K40 GPU accelerator, and about 1.7 times its memory bandwidth (480GB/s versus 288GB/s). NVIDIA quotes peak floating point performance of up to 8.74 teraflops in single precision and up to 2.91 teraflops in double precision, and claims 10 times higher performance than today's fastest CPUs on science and engineering applications such as AMBER, GROMACS, Quantum Espresso and LSMS.
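As a rough check on the headline ratios, the short sketch below divides the K80 figures by the K40 figures from the comparison table in this article (GPU Boost clocks for the compute numbers). It uses only numbers the article publishes; the dictionary keys and the `speedup` helper are illustrative names.

```python
# Figures copied from the K80/K40 comparison table in this article
# (compute numbers at GPU Boost clocks).
K80 = {"sp_tflops": 8.74, "dp_tflops": 2.91, "bandwidth_gbs": 480.0}
K40 = {"sp_tflops": 5.0, "dp_tflops": 1.66, "bandwidth_gbs": 288.0}


def speedup(new: dict, old: dict) -> dict:
    """Return the new-over-old ratio for every metric, rounded to two places."""
    return {key: round(new[key] / old[key], 2) for key in new}


if __name__ == "__main__":
    for metric, ratio in speedup(K80, K40).items():
        print(f"{metric}: {ratio}x")
```

With these inputs, peak compute comes out at about 1.75 times the K40 in both precisions, and memory bandwidth at about 1.67 times.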

The accelerator features an enhanced version of NVIDIA GPU Boost technology, which dynamically turns available power headroom into higher clock speeds, tuned to each individual application. In addition, Dynamic Parallelism lets code running on the GPU launch new work (child kernels) itself, without returning control to the CPU, so users can quickly work through adaptive and dynamic data structures.
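Dynamic Parallelism itself is a CUDA feature (device code launching child kernels, supported on Kepler GPUs of compute capability 3.5 and later, which includes the K80's GK210). The Python below is only a CPU-side sketch of the kind of problem it targets, not GPU code: adaptive Simpson integration, where each task examines its own interval and decides at runtime whether to spawn two child tasks. The function names are illustrative, and here a "spawn" is simply a recursive call.

```python
import math
from typing import Callable, Tuple


def simpson(f: Callable[[float], float], a: float, b: float) -> float:
    """Simpson's rule on the single interval [a, b]."""
    m = (a + b) / 2
    return (b - a) / 6 * (f(a) + 4 * f(m) + f(b))


def adaptive_integrate(f: Callable[[float], float], a: float, b: float,
                       tol: float, depth: int = 0,
                       max_depth: int = 40) -> Tuple[float, int]:
    """Integrate f over [a, b]; return (estimate, number of tasks run).

    Each call acts like a parent task: it inspects its own interval and only
    then decides whether to spawn two child tasks on the halves, so the total
    amount of work is discovered at runtime rather than fixed up front.
    """
    m = (a + b) / 2
    whole = simpson(f, a, b)
    left, right = simpson(f, a, m), simpson(f, m, b)
    error = left + right - whole
    if depth >= max_depth or abs(error) <= 15 * tol:
        # Accurate enough: accept, with the standard Richardson correction.
        return left + right + error / 15, 1
    # Not accurate enough: "launch" two children, each with half the tolerance.
    left_val, left_tasks = adaptive_integrate(f, a, m, tol / 2, depth + 1, max_depth)
    right_val, right_tasks = adaptive_integrate(f, m, b, tol / 2, depth + 1, max_depth)
    return left_val + right_val, 1 + left_tasks + right_tasks


if __name__ == "__main__":
    value, tasks = adaptive_integrate(math.sin, 0.0, math.pi, 1e-10)
    print(f"integral = {value:.10f}, tasks run = {tasks}")
```

On a GPU, the recursive calls would correspond to child kernel launches from device code; the point of the pattern is that how much work is needed depends on the data and is not known before the first launch.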

Each board carries two GPUs with 24GB of GDDR5 memory (12GB per GPU), 480GB/s of memory bandwidth (240GB/s per GPU), and 4,992 CUDA parallel processing cores (2,496 per GPU).
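The headline numbers follow from simple arithmetic: peak throughput is cores × clock × 2 (a fused multiply-add counts as two floating point operations), and memory bandwidth is bus width × per-pin data rate ÷ 8. The article gives only the GPU and core counts; the 875 MHz boost clock, the 1/3 double-precision rate of GK210, the 384-bit memory bus and the 5 Gbps GDDR5 data rate in the sketch below are assumptions, chosen because they reproduce the published figures.

```python
# Values marked "article" come from the text; the others are ASSUMPTIONS
# (not stated in the article) used only to illustrate the arithmetic.
GPUS_PER_BOARD = 2            # article: two GPUs per board
CUDA_CORES_PER_GPU = 2496     # article: 4,992 cores per board
BOOST_CLOCK_GHZ = 0.875       # assumption: K80 maximum GPU Boost clock
DP_RATE = 1 / 3               # assumption: GK210 FP64 units = 1/3 of CUDA cores
BUS_WIDTH_BITS = 384          # assumption: memory interface width per GPU
DATA_RATE_GBPS = 5.0          # assumption: effective GDDR5 data rate per pin

FLOPS_PER_CORE_PER_CYCLE = 2  # a fused multiply-add counts as two FLOPs


def peak_tflops(cores: int, clock_ghz: float, rate: float = 1.0) -> float:
    """Peak throughput in Tflops: cores x rate x clock (GHz) x FLOPs per cycle / 1000."""
    return cores * rate * clock_ghz * FLOPS_PER_CORE_PER_CYCLE / 1000


def bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Memory bandwidth in GB/s: bus width x per-pin data rate / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


if __name__ == "__main__":
    cores = GPUS_PER_BOARD * CUDA_CORES_PER_GPU
    print(f"single precision: {peak_tflops(cores, BOOST_CLOCK_GHZ):.2f} Tflops")
    print(f"double precision: {peak_tflops(cores, BOOST_CLOCK_GHZ, DP_RATE):.2f} Tflops")
    per_gpu = bandwidth_gbs(BUS_WIDTH_BITS, DATA_RATE_GBPS)
    print(f"bandwidth: {per_gpu:.0f} GB/s per GPU, "
          f"{GPUS_PER_BOARD * per_gpu:.0f} GB/s per board")
```

With these assumed inputs the script prints 8.74 and 2.91 Tflops and 240/480 GB/s, matching the GPU Boost and bandwidth rows of the table. The base-clock rows would follow from the same formula with a lower clock, which the article does not state.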

Shipping today, the NVIDIA Tesla K80 dual-GPU accelerator will be available from a variety of server manufacturers, including ASUS, Bull, Cirrascale, Cray, Dell, Gigabyte, HP, Inspur, Penguin, Quanta, Sugon, Supermicro and Tyan, as well as from NVIDIA reseller partners.

Feature	Tesla K80	Tesla K40
GPU	2x Kepler GK210	1x Kepler GK110B
Peak double-precision floating point performance (GPU Boost clocks)	2.91 Tflops	1.66 Tflops
Peak double-precision floating point performance (base clocks)	1.87 Tflops	1.43 Tflops
Peak single-precision floating point performance (GPU Boost clocks)	8.74 Tflops	5 Tflops
Peak single-precision floating point performance (base clocks)	5.6 Tflops	4.29 Tflops
Memory bandwidth (ECC off)	480 GB/s (240 GB/s per GPU)	288 GB/s
Memory size (GDDR5)	24 GB (12 GB per GPU)	12 GB
CUDA cores	4,992 (2,496 per GPU)	2,880