NVIDIA Unveils Turing Architecture, Quadro RTX Ray-Tracing GPU
NVIDIA today launched the NVIDIA Turing GPU architecture, which features new RT Cores to accelerate ray tracing and new Tensor Cores for AI inferencing to make real-time ray tracing possible.
The company also unveiled its initial Turing-based products - the NVIDIA Quadro RTX 8000, Quadro RTX 6000 and Quadro RTX 5000 GPUs.
In what NVIDIA calls "the greatest leap since the invention of the CUDA GPU in 2006," Turing features new RT Cores and new Tensor Cores. These two engines - along with more powerful compute for simulation and enhanced rasterization - usher in a new generation of hybrid rendering to address the $250 billion visual effects industry. Hybrid rendering enables cinematic-quality interactive experiences, new effects powered by neural networks and fluid interactivity on highly complex models.
"This fundamentally changes how computer graphics will be done, it's a step change in realism," NVIDIA CEO Huang told an audience of more than 1,200 graphics pros gathered at the sleek glass and steel Vancouver Convention Center, which sits across a waterway criss-crossed by cruise ships and seaplanes from the stunning North Shore mountains.
Quadro RTX GPUs are designed for the most demanding visual computing workloads, and they far surpass the previous generation with new technologies, including:
- New RT Cores to enable real-time ray tracing of objects and environments with physically accurate shadows, reflections, refractions and global illumination.
- Turing Tensor Cores to accelerate deep neural network training and inference, which are critical to powering AI-enhanced rendering, products and services.
- New Turing Streaming Multiprocessor architecture, featuring up to 4,608 CUDA cores, delivers up to 16 trillion floating point operations in parallel with 16 trillion integer operations per second to accelerate complex simulation of real-world physics.
- Advanced programmable shading technologies to improve the performance of complex visual effects.
- First implementation of ultra-fast Samsung 16Gb GDDR6 memory to support more complex designs, massive architectural datasets, 8K movie content and more. Samsung 16Gb GDDR6 memory doubles the device capacity of the company's 20-nanometer 8Gb GDDR5 memory. The new solution performs at a 14-gigabits-per-second (Gbps) pin speed with data transfers of 56 gigabytes per second (GB/s), which represents a 75 percent increase over 8Gb GDDR5 with its 8Gbps pin speed. Moreover, Samsung's GDDR6 consumes 35 percent less power than the leading GDDR5 graphics solutions. The Samsung 16Gb GDDR6 operates at 1.35V, compared with the 1.55V used by the GDDR5 commonly found in the market today.
- NVIDIA NVLink to combine two GPUs with a high-speed link to scale memory capacity up to 96GB and drive higher performance with up to 100GB/s of data transfer.
- Hardware support for USB Type-C and VirtualLink, a new open industry standard being developed to meet the power, display and bandwidth demands of next-generation VR headsets through a single USB-C connector.
- New technologies to improve performance of VR applications, including Variable Rate Shading, Multi-View Rendering and VRWorks Audio.
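The GDDR6 figures quoted in the list above can be checked with a little arithmetic. The sketch below assumes each memory device has a 32-bit data interface; the article does not state this, but it is the width at which a 14Gbps pin speed works out to the quoted 56GB/s. The function names are illustrative only.

```python
# Per-device memory bandwidth arithmetic (illustrative sketch).
# ASSUMPTION: each memory device exposes 32 data pins. The article gives
# only the pin speeds and the resulting 56 GB/s figure.

DATA_PINS_PER_DEVICE = 32  # assumed, not stated in the article


def device_bandwidth_gb_per_s(pin_speed_gbps: float,
                              pins: int = DATA_PINS_PER_DEVICE) -> float:
    """Convert a per-pin rate in gigabits/s to device bandwidth in gigabytes/s."""
    return pin_speed_gbps * pins / 8  # 8 bits per byte


def percent_increase(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new - old) / old * 100


gddr6 = device_bandwidth_gb_per_s(14)  # 14 Gbps pin speed
gddr5 = device_bandwidth_gb_per_s(8)   # 8 Gbps pin speed

print(f"GDDR6: {gddr6} GB/s, GDDR5: {gddr5} GB/s")
print(f"Bandwidth increase: {percent_increase(gddr6, gddr5)} percent")
print(f"Capacity increase (16Gb vs 8Gb): {percent_increase(16, 8)} percent")
```

Under that assumption the numbers agree with the text: 56GB/s against 32GB/s is a 75 percent increase, and 16Gb is double the 8Gb device capacity.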
Specification | RTX 8000 | RTX 6000 | RTX 5000 |
CUDA Cores | 4608 | 4608 | 3072 |
Tensor Cores | 576 | 576 | 384 |
Boost Clock | Unknown | Unknown | Unknown |
Memory Clock | 14Gbps GDDR6 | 14Gbps GDDR6 | 14Gbps GDDR6 |
Memory Bus Width | 384-bit | 384-bit | 256-bit |
VRAM | 48GB | 24GB | 16GB |
ECC | Unknown | Unknown | Unknown |
Single Precision | 16 TFLOPS | 16 TFLOPS | Unknown |
Tensor Performance | 500 TOPS (INT4) | 500 TOPS (INT4) | |
Ray Performance | 10 GRays/s | 10 GRays/s | 6 GRays/s |
TDP | Unknown | Unknown | Unknown |
GPU | Unnamed Turing | Unnamed Turing | Unnamed Turing |
Architecture | Turing | Turing | Turing |
Launch Price | $10,000 | $6,300 | $2,300 |
The Quadro RTX 8000 and RTX 6000 both offer the same GPU performance and memory bandwidth thanks to the combination of 4608 CUDA cores, 576 tensor cores, and GDDR6 memory on a 384-bit bus. The difference between the two is that the RTX 8000 is equipped with a full 48GB of VRAM, while the RTX 6000 has just 24GB. The final card of the stack, the RTX 5000, offers lower performance for less money. This partially enabled card gets 3072 CUDA cores, 384 tensor cores, and 16GB of GDDR6.
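To see why the two top cards share the same memory bandwidth, the table's bus width and memory clock can be combined using the usual rule of thumb (bus width in bits times per-pin rate, divided by eight). The resulting GB/s figures below are derived values, not numbers NVIDIA has published in this article, and the dictionary layout is purely illustrative.

```python
# Peak memory bandwidth derived from the spec table (illustrative sketch).
# ASSUMPTION: bandwidth = bus width (bits) * per-pin data rate (Gbps) / 8.
# The GB/s results are computed here, not quoted from the article.

CARDS = {
    "RTX 8000": {"bus_bits": 384, "pin_gbps": 14, "vram_gb": 48},
    "RTX 6000": {"bus_bits": 384, "pin_gbps": 14, "vram_gb": 24},
    "RTX 5000": {"bus_bits": 256, "pin_gbps": 14, "vram_gb": 16},
}


def card_bandwidth_gb_per_s(bus_bits: int, pin_gbps: float) -> float:
    """Peak memory bandwidth in gigabytes per second."""
    return bus_bits * pin_gbps / 8


for name, spec in CARDS.items():
    bw = card_bandwidth_gb_per_s(spec["bus_bits"], spec["pin_gbps"])
    print(f"{name}: {bw:.0f} GB/s, {spec['vram_gb']} GB VRAM")

# Two RTX 8000 cards joined over NVLink pool their memory,
# which matches the "up to 96GB" figure in the feature list.
nvlink_pair_capacity_gb = 2 * CARDS["RTX 8000"]["vram_gb"]
print(f"NVLink pair capacity: {nvlink_pair_capacity_gb} GB")
```

By this estimate the RTX 8000 and RTX 6000 both land at 672GB/s, while the RTX 5000's narrower 256-bit bus gives 448GB/s.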
NVIDIA has confirmed that all of the cards come with four DisplayPort 1.4 outputs. In addition, all of the cards will feature a fifth output: a VirtualLink-capable USB Type-C port.
The new Quadro RTX cards will be available in Q4 of this year. Prices start at $2,300 for the RTX 5000. An RTX 6000 will set buyers back $6,300, while the flagship RTX 8000 will be a full $10,000.
Dell EMC, HP, Inc., Hewlett Packard Enterprise, Lenovo, Fujitsu, Boxx and SuperMicro will be among the system vendors supporting the latest line of Quadro processors, NVIDIA's Huang said. The three new Quadro GPUs will be available starting in the fourth quarter.
Huang also announced that NVIDIA is open sourcing its Material Definition Language software development kit, starting today.
Turing's dedicated ray-tracing processors - called RT Cores - accelerate the computation of how light and sound travel in 3D environments. Turing accelerates real-time ray tracing operations by 25x over the previous Pascal generation. It can be used for final-frame rendering for film effects at more than 30x the speed of CPUs.
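For a rough sense of what the 10 GRays/s figure in the spec table means for real-time work, the sketch below divides that peak rate across the pixels of each frame. The resolutions and the 60 fps target are assumptions picked for illustration, and real workloads will not sustain the peak rate, so treat the results as an upper bound.

```python
# Back-of-the-envelope ray budget (illustrative sketch, not an NVIDIA formula).
# The 10 GRays/s peak comes from the spec table; the resolutions and the
# 60 fps target are assumptions chosen for illustration.

PEAK_RAYS_PER_SECOND = 10e9  # Quadro RTX 8000 / RTX 6000 figure from the table


def rays_per_pixel(width: int, height: int, fps: int,
                   rays_per_second: float = PEAK_RAYS_PER_SECOND) -> float:
    """Rays available for each pixel of each frame at a given resolution and frame rate."""
    return rays_per_second / (width * height * fps)


for name, (w, h) in {"1080p": (1920, 1080), "4K": (3840, 2160)}.items():
    budget = rays_per_pixel(w, h, 60)
    print(f"{name} @ 60 fps: {budget:.1f} rays per pixel per frame")
```

At peak rate this works out to roughly 80 rays per pixel per frame at 1080p and about 20 at 4K. A budget that small helps explain why the AI-based denoising described later in the article matters for completing each frame.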
To demonstrate this, Huang showed the audience a demo they'd seen - Epic Games' stunning Star Wars-themed Reflections ray-tracing demo - running on hardware they hadn't. At the Game Developers Conference in March, Reflections ran on a $70,000 DGX Station equipped with four Volta GPUs. This time the demo ran on a single Turing GPU.
"It turns out it was running on this - one single GPU," Huang said to wild applause as he playfully blinded the camera by angling the gleaming Quadro RTX 8000's reflective outer shroud. "This is the world's first ray-tracing GPU."
At the same time, the Turing architecture's Tensor Cores - processors that accelerate deep learning training and inferencing - provide up to 500 trillion tensor operations a second. This, in turn, powers AI-enhanced features - such as denoising, resolution scaling and video re-timing - included in the NVIDIA NGX software development kit.
"At some point you can use AI or some heuristics to figure out what are the missing dots and how should we fill it all in, and it allows us to complete the frame a lot faster than we otherwise could," Huang said, describing the new deep learning-powered technology stack that enables developers to integrate accelerated, enhanced graphics, photo imaging and video processing into applications with pre-trained networks.
Turing also cranks through rasterization - the mainstay of interactive graphics - 6x faster than Pascal, Huang said, detailing how technologies such as variable-rate shading, texture-space shading and multi-view rendering will provide for more fluid interactivity with large models and scenes and improved VR experiences.
Turning to a tested graphics teaching tool, Huang told the story of how visual effects have progressed by using the Cornell Box - a 3D box inside which various objects are displayed. Huang showed off how Turing uses ray tracing to deliver complex effects - ranging from diffused reflection to refractions to caustics to global illumination - with stunning photorealism.
Huang also showed off a video featuring a Porsche concept car - illuminated by lights that played across its undulating curves - celebrating the automaker's 70th anniversary. While the photoreal demo looks filmed, it's entirely generated on a Turing GPU running Epic Games' Unreal Engine.
In addition to three powerful Turing-powered graphics cards - the $2,300 Quadro RTX 5000, $6,300 Quadro RTX 6000 and $10,000 Quadro RTX 8000 - Huang also introduced the Quadro RTX Server.
Equipped with eight Turing GPUs, it's designed to slash rendering times from hours to minutes. Four 8-GPU RTX Servers can do the rendering work of 240 dual-core servers at 1/4th the cost, using 1/10th the space and consuming 1/11th the power. "Instead of a shot taking five hours or six hours, it now takes just one hour," Huang said. "It's going to completely change how people do film."
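The consolidation claim can be broken down per GPU with simple division. The sketch below uses only the numbers quoted above; the variable and function names are illustrative.

```python
# Consolidation arithmetic for the Quadro RTX Server claim (illustrative sketch).
# All inputs are the figures quoted in the article.

RTX_SERVERS = 4
GPUS_PER_RTX_SERVER = 8
CPU_SERVERS_REPLACED = 240

total_gpus = RTX_SERVERS * GPUS_PER_RTX_SERVER                    # 32 GPUs
cpu_servers_per_rtx_server = CPU_SERVERS_REPLACED / RTX_SERVERS   # 60.0
cpu_servers_per_gpu = CPU_SERVERS_REPLACED / total_gpus           # 7.5


def speedup(before_hours: float, after_hours: float) -> float:
    """How many times faster a shot renders."""
    return before_hours / after_hours


print(f"{total_gpus} GPUs replace {CPU_SERVERS_REPLACED} CPU servers")
print(f"One RTX Server stands in for {cpu_servers_per_rtx_server:.0f} CPU servers")
print(f"One GPU stands in for {cpu_servers_per_gpu} CPU servers")
print(f"Shot render speedup: {speedup(5, 1):.0f}x to {speedup(6, 1):.0f}x")
```

In other words, each Turing GPU is credited with the rendering work of 7.5 CPU servers, and Huang's five-or-six-hours-to-one-hour example corresponds to a 5x to 6x speedup per shot.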