Intel Claims GPUs Are Only Up To 14 Times Faster Than CPUs
Intel presented a technical paper showing that application kernels run up to 14 times faster on an NVIDIA GeForce GTX 280 than on an Intel Core i7 960.
The paper, entitled "Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU," was presented by Intel at the International Symposium on Computer Architecture (ISCA) in Saint-Malo, France.
Processing ever-growing volumes of data in a timely manner has made throughput computing an important aspect of emerging applications. According to Intel's analysis of a set of important throughput computing kernels, these kernels contain ample parallelism, which makes them suitable for today's multi-core CPUs and GPUs.
In the past few years, many studies have claimed that GPUs deliver substantial speedups (between 10X and 1000X) over multi-core CPUs on these kernels. To understand where such a large performance difference comes from, Intel performed a performance analysis and found that, after applying optimizations appropriate for both CPUs and GPUs, the performance gap between the Nvidia GTX 280 and the Intel Core i7-960 narrows to only 2.5x on average.
In the paper, Intel also discussed optimization techniques for both the CPU and the GPU, analyzed which architectural features contributed to the performance differences between the two processors, and recommended a set of architectural features to improve the efficiency of throughput kernels.
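For readers unfamiliar with what a "throughput computing kernel" looks like in practice, the sketch below shows a SAXPY-style vector operation written for both processors. It is only an illustration of the two programming models being compared, not code from Intel's paper; the problem size and launch configuration are illustrative choices, and the CPU version omits the SSE vectorization and multithreading that tuned CPU results rely on.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// GPU version: one thread per array element, y[i] = a * x[i] + y[i].
__global__ void saxpy_gpu(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// CPU version: a plain scalar loop; a tuned version would add
// vectorization and multithreading.
void saxpy_cpu(int n, float a, const float *x, float *y) {
    for (int i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;                  // illustrative problem size
    const size_t bytes = n * sizeof(float);

    float *x = new float[n], *y = new float[n];
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Copy inputs to the GPU, run the kernel, and copy the result back.
    float *dx, *dy;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y, bytes, cudaMemcpyHostToDevice);

    saxpy_gpu<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);
    cudaMemcpy(y, dy, bytes, cudaMemcpyDeviceToHost);

    printf("y[0] = %f\n", y[0]);            // expect 4.0 = 2*1 + 2

    cudaFree(dx); cudaFree(dy);
    delete[] x; delete[] y;
    return 0;
}
```

The point of such comparisons is that the same data-parallel loop maps naturally onto both architectures; the dispute between Intel and Nvidia is over how much tuning each side received before the timings were taken.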
Commenting on Intel's paper, Andy Keane, Nvidia's General Manager of GPU Computing, wrote on the company's blog:
"It?s a rare day in the world of technology when a company you compete with stands up at an important conference and declares that your technology is *only* up to 14 times faster than theirs. In fact in all the 26 years I?ve been in this industry, I can?t recall another time I?ve seen a company promote competitive benchmarks that are an order of magnitude slower."
Keane said Intel used Nvidia's previous-generation GPU, the GTX 280, for the study, and that the codes run on the GTX 280 were used right out of the box, without any optimization. In fact, it is unclear from the technical paper what codes were run and how they were compared between the GPU and the CPU.
However, Keane admitted that there is some truth to the "100x GPU vs. CPU myth" claim.
"Not *all* applications can see this kind of speed up, some just have to make do with an order of magnitude performance increase," he said. "But, 100X speed ups, and beyond, have been seen by hundreds of developers," he added, giving exanples developers that have achieved speed ups of more than 100x in their applications.
"The real myth here is that multi-core CPUs are easy for any developer to use and see performance improvements, Nvidia's representative said.
"Undergraduate students learning parallel programming at M.I.T. disputed this when they looked at the performance increase they could get from different processor types and compared this with the amount of time they needed to spend in re-writing their code. According to them, for the same investment of time as coding for a CPU, they could get more than 35x the performance from a GPU. Despite substantial investments in parallel computing tools and libraries, efficient multi-core optimization remains in the realm of experts like those Intel recruited for its analysis. In contrast, the CUDA parallel computing architecture from NVIDIA is a little over 3 years old and already hundreds of consumer, professional and scientific applications are seeing speedups ranging from 10 to 100x using NVIDIA GPUs."
Keane added that industry experts and the development community are voting by porting their applications to GPUs.
Interestingly enough, at the same event Nvidia's Chief Scientist Bill Dally received the 2010 Eckert-Mauchly Award for his pioneering work in architecture for parallel computing.