NVIDIA Unveils Full Power of Maxwell GPU Architecture With GeForce GTX 980, 970 Graphics Cards
NVIDIA today introduced the first high-end products based on its Maxwell chip architecture -- the new GeForce GTX 980 and 970 GPUs -- delivering performance, new graphics capabilities and twice the energy efficiency of the previous generation.
Maxwell is the company's 10th-generation GPU architecture, following Kepler. The new GM204 engine focuses on high-end performance for PC gaming and guarantees higher energy efficiency than previous models, Nvidia says. The GPUs will also allow Nvidia-powered machines to process game environments with higher resolution and better lighting.
Nvidia's new technology, Voxel Global Illumination (VXGI), enables gaming GPUs to deliver real-time dynamic global illumination for the first time. By applying the technology, PC games will be significantly improved, as the graphics will be more lifelike with the game's lighting interacting realistically within the environment. This results in a deeper level of immersion for gamers.
And a range of new technologies -- including multi-frame sampled antialiasing (MFAA), dynamic super resolution (DSR), VR Direct and an energy-efficient design -- enable Maxwell-based GTX 980 and 970 GPUs to render frames with the highest fidelity at higher clock speeds and lower power consumption than other GPUs in their class.
Realistic lighting is among the most challenging problems faced in real-time graphics. Simulating both direct and indirect lighting, such as reflections for dynamic scenes, has to date been too computationally demanding for GPUs other than those available to professionals. Game developers have been forced to use lighting tricks that compromise scene realism.
Maxwell enables developers to overcome these obstacles, combining the performance and programming capability required to model direct and indirect light sources. It does so by deploying VXGI, a new NVIDIA technique to accurately depict indirect lighting, including diffuse lighting, specular lighting and reflections. Nvidia pioneered a new form of global illumination called Voxel Cone Tracing (VCT). The aim is to improve the quality of in-game lighting without seriously affecting the framerate. According to Nvidia, "the voxel can then be traced during the final rendering stage to accurately determine the effect of light bouncing around in the scene."
VCT-based global illumination was a feature in Unreal Engine 4, though it was dropped due to concerns regarding the impact on framerate. But Nvidia has added extra hardware within the GM204 Maxwell GPU to accelerate VCT further. The company claims Maxwell is able to speed up this form of global illumination by a factor of three when compared to an older GeForce GPU, realising real-time VCT for the first time.
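To make the cone-tracing idea more concrete, here is a rough Python sketch of how a single cone might be marched through a voxelized scene. It is a toy illustration of the general technique, not Nvidia's VXGI code: the function names (build_mips, sample, trace_cone), the nearest-voxel lookup, the step size and the opacity-only voxels are all assumptions made for the example.

```python
import math

def build_mips(grid):
    """Build coarser copies of a cubic, power-of-two opacity grid by averaging 2x2x2 blocks."""
    mips = [grid]
    while len(mips[-1]) > 1:
        src = mips[-1]
        n = len(src) // 2
        mips.append([[[
            sum(src[2 * z + dz][2 * y + dy][2 * x + dx]
                for dz in (0, 1) for dy in (0, 1) for dx in (0, 1)) / 8.0
            for x in range(n)] for y in range(n)] for z in range(n)])
    return mips

def sample(mips, level, point):
    """Nearest-voxel lookup at one mip level; points outside the unit cube count as empty."""
    if any(c < 0.0 or c >= 1.0 for c in point):
        return 0.0
    grid = mips[level]
    n = len(grid)
    x, y, z = (int(c * n) for c in point)
    return grid[z][y][x]

def trace_cone(mips, origin, direction, half_angle, max_distance=1.0):
    """March one cone front to back, reading coarser voxels as the cone widens."""
    base_voxel = 1.0 / len(mips[0])
    occlusion = 0.0
    t = base_voxel  # start one voxel away from the origin to avoid self-occlusion
    while t < max_distance and occlusion < 0.99:
        diameter = max(base_voxel, 2.0 * t * math.tan(half_angle))
        level = min(len(mips) - 1, int(math.log2(diameter / base_voxel)))
        point = tuple(o + t * d for o, d in zip(origin, direction))
        alpha = sample(mips, level, point)
        occlusion += (1.0 - occlusion) * alpha  # front-to-back compositing
        t += diameter * 0.5
    return occlusion

# Toy scene: an 8x8x8 grid with an opaque wall on the far +x side.
n = 8
wall = [[[1.0 if x >= 6 else 0.0 for x in range(n)] for y in range(n)] for z in range(n)]
mips = build_mips(wall)
print(trace_cone(mips, (0.1, 0.5, 0.5), (1.0, 0.0, 0.0), 0.1))   # cone aimed at the wall
print(trace_cone(mips, (0.1, 0.5, 0.5), (-1.0, 0.0, 0.0), 0.1))  # cone aimed away from it
```

A real renderer would store lighting as well as opacity in each voxel and trace several cones per pixel. The sketch only shows why wider cones can get away with fewer, coarser samples.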
VXGI is being added to NVIDIA GameWorks, the company's game graphics library, so developers can build future games with dynamic environments filled with natural lighting and realism. It is being integrated into popular game engines like Unreal Engine 4, and will be available to developers later this year.
In addition, NVIDIA engineers have given the GTX 980 and 970 a further performance boost with a new technique called multi-frame sampled anti-aliasing (MFAA), which leverages new capabilities in Maxwell GPUs.
MFAA varies the anti-aliasing sample patterns across pixels both within an individual frame and between multiple frames. It then uses a newly developed synthesis filter to produce the best image quality and does so faster than conventional anti-aliasing. For gamers, MFAA yields image quality approaching that of 4xMSAA at the cost of 2xMSAA.
This new, Maxwell-exclusive anti-aliasing technique improves upon the quality of MSAA, whilst simultaneously reducing the performance impact, enabling gamers to crank up rendering resolutions and game detail, and to activate DSR.
Previous-generation GPUs include fixed sample patterns for anti-aliasing (AA) that are stored in Read Only Memory (ROM). When gamers selected 2x or 4x MSAA, for example, the pre-stored sample patterns were used. With Maxwell, Nvidia has introduced programmable sample positions for rasterization that are stored in Random Access Memory (RAM), creating opportunities for new, more flexible, more inventive AA techniques that address the challenges of modern game engines, such as the increased performance cost of high-quality anti-aliasing.
Maxwell's new RAM-based sample position technology can still be programmed with standard MSAA and TXAA patterns, but now the driver or application may also load the RAM with custom positions that are free to vary from frame to frame, or even within a frame. It is with this technology that Nvidia has developed Multi-Frame Sampled Anti-Aliasing (MFAA).
By alternating AA sample patterns both temporally and spatially, 4xMFAA has the performance cost of 2xMSAA, with anti-aliasing properties equivalent to 4xMSAA.
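The principle can be illustrated with a small Python sketch for a single pixel. The sample offsets below are made up for the example, and the "synthesis filter" is reduced to a plain average of the current and previous frame, which is an assumption; Nvidia has not published the actual patterns or filter in this article.

```python
# Hypothetical sample offsets inside a unit pixel (not Nvidia's real patterns).
PATTERN_A = [(0.375, 0.125), (0.625, 0.875)]
PATTERN_B = [(0.125, 0.625), (0.875, 0.375)]
PATTERN_4X = PATTERN_A + PATTERN_B

def coverage(pattern, inside):
    """Fraction of the pattern's sample points that land inside the geometry."""
    hits = sum(1 for (x, y) in pattern if inside(x, y))
    return hits / len(pattern)

def edge(x, y):
    """Example geometry: a triangle edge covering the left 30% of the pixel."""
    return x < 0.3

def mfaa_sequence(inside, frames):
    """Take two samples per frame, alternate the pattern, and average with the previous frame."""
    outputs = []
    previous = None
    for i in range(frames):
        pattern = PATTERN_A if i % 2 == 0 else PATTERN_B
        current = coverage(pattern, inside)
        blended = current if previous is None else 0.5 * (current + previous)
        outputs.append(blended)
        previous = current
    return outputs

print(coverage(PATTERN_A, edge))   # 2 samples, pattern A only
print(coverage(PATTERN_B, edge))   # 2 samples, pattern B only
print(coverage(PATTERN_4X, edge))  # conventional 4-sample result
print(mfaa_sequence(edge, 4))      # 2 samples per frame, blended over time
```

For a static scene the blended value settles on the same coverage that four samples per frame would give, while each frame only pays for two. Moving geometry is where a real synthesis filter has to do more work than this simple average.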
MFAA is still in development, but once finished it will improve frame rates and image quality in traditional games.
With Maxwell's Dynamic Super Resolution (DSR) technology, games can be rendered at 4K or other high-end resolutions and then scaled down to the native resolution on the user's display using a 13-tap Gaussian filter. The resulting image is much higher quality than simply rendering directly to 1080p.
DSR enhances any game that supports resolutions above 1920x1080. Simply put, it renders a game at a higher, more detailed resolution and intelligently shrinks the result back down to the resolution of your monitor, giving you 4K, 3840x2160-quality graphics on any screen.
Enthusiasts with compatible monitors refer to this process as Downsampling or Super Sampling. DSR drastically improves upon this process by applying a high-quality filter specifically designed for the task. DSR also makes the process simpler with on/off integration built directly into GeForce Experience and it's compatible with all monitors.
In Dark Souls II's opening scene, players find themselves surrounded by swaying grass. At 1920x1080, the grass flickers and scintillates heavily as it sways, and appears to be missing detail, as highlighted by our screen capture.
The 1920x1080 resolution lacks a sufficient number of sample points for the grass's fine detail. At 3840x2160 (4K), the number of sample points is multiplied by 4, enabling the game to capture and render more detail on each blade of grass. DSR applies a custom-made 13-tap Gaussian filter as the 4K image is scaled back down to 1920x1080 for display on the monitor.
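The downsampling step can be sketched in a few lines of Python. The article does not give the taps of Nvidia's 13-tap filter, so the kernel below (a small separable Gaussian with an assumed radius and sigma) is only a stand-in, and the function names are invented for the example.

```python
import math

def sample_ratio(render_res, display_res):
    """How many rendered samples contribute to each displayed pixel."""
    return (render_res[0] * render_res[1]) / (display_res[0] * display_res[1])

def gaussian_kernel(radius, sigma):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def downsample_2x(image, radius=2, sigma=1.0):
    """Blur a grayscale image (list of rows) with a separable Gaussian centred on
    every second pixel, halving both width and height. Edges are clamped."""
    kernel = gaussian_kernel(radius, sigma)
    height, width = len(image), len(image[0])
    offsets = range(-radius, radius + 1)

    def clamp(value, upper):
        return max(0, min(value, upper))

    # Horizontal pass: keep all rows, halve the number of columns.
    horizontal = [
        [sum(kernel[k + radius] * row[clamp(2 * x + k, width - 1)] for k in offsets)
         for x in range(width // 2)]
        for row in image
    ]
    # Vertical pass: halve the number of rows.
    return [
        [sum(kernel[k + radius] * horizontal[clamp(2 * y + k, height - 1)][x] for k in offsets)
         for x in range(width // 2)]
        for y in range(height // 2)
    ]

print(sample_ratio((3840, 2160), (1920, 1080)))  # samples per displayed pixel
```

Because every output pixel is a weighted blend of several rendered pixels, thin details such as blades of grass fade in and out gradually instead of flickering between fully present and fully missing.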
DSR can be enabled through GeForce Experience, an NVIDIA application that automatically optimizes game settings for peak performance, downloads the latest drivers, and enables game streaming and in-game action capture. Gamers can turn on DSR with a single click of a button.
In addition, VR Direct technology incorporates a number of new features to increase performance, lower latency and increase compatibility for VR headsets. These features include:
- VR SLI -- provides heightened performance on virtual reality devices where multiple GPUs can be assigned a specific eye to render the stereo images faster.
- Asynchronous Warp -- cuts latency in half and adjusts images as gamers move their heads, without the need to re-render new frames.
- Auto Stereo -- improves game compatibility for VR devices, such as Oculus Rift, and allows users to play games on select headsets that weren't originally designed for VR.
The new 28nm GM204 GPU weighs in at 5.2 billion transistors, with a die size of 398mm². This compares to 3.54B transistors and a die size of 294mm² for GK104, and 7.1B transistors and 551mm² for GK110.
Let's have a look at the specs.
|                       | GTX 980    | GTX 970    | GTX 780 Ti | GTX 770    |
| Stream Processors     | 2048       | 1664       | 2880       | 1536       |
| Texture Units         | 128        | 104        | 240        | 128        |
| ROPs                  | 64         | 64         | 48         | 32         |
| Core Clock            | 1126MHz    | 1050MHz    | 875MHz     | 1046MHz    |
| Boost Clock           | 1216MHz    | 1178MHz    | 928MHz     | 1085MHz    |
| Memory Clock          | 7GHz GDDR5 | 7GHz GDDR5 | 7GHz GDDR5 | 7GHz GDDR5 |
| Memory Bus Width      | 256-bit    | 256-bit    | 384-bit    | 256-bit    |
| VRAM                  | 4GB        | 4GB        | 3GB        | 2GB        |
| FP64                  | 1/32 FP32  | 1/32 FP32  | 1/24 FP32  | 1/24 FP32  |
| TDP                   | 165W       | 145W       | 250W       | 230W       |
| Transistor Count      | 5.2B       | 5.2B       | 7.1B       | 3.5B       |
| Manufacturing Process | TSMC 28nm  | TSMC 28nm  | TSMC 28nm  | TSMC 28nm  |
| Launch Date           | 09/18/14   | 09/18/14   | 11/07/13   | 05/30/13   |
| Launch Price          | $549       | $329       | $699       | $399       |
Starting with the GeForce GTX 980, this is a fully enabled GM204 part. All 16 SMMs are enabled (2048 CUDA cores), as are all 64 ROPs and the full 256-bit memory bus.
NVIDIA is shipping GTX 980 with a base clockspeed of 1126MHz and a boost clockspeed of 1216MHz. This is a higher set of clockspeeds than any NVIDIA consumer GPU thus far, surpassing GTX 770, GTX Titan Black, and GTX 750 Ti.
The memory clock stands at 7GHz, the same as with NVIDIA's past generation of high-end cards. This 7GHz of GDDR5 is attached to a 256-bit memory bus, and is populated with 4GB of VRAM, very useful for 4K gaming.
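As a rough guide to what these memory figures mean in practice, peak bandwidth can be worked out from the clock and bus width. The Python snippet below assumes that "7GHz" is the effective per-pin data rate (7 Gbit/s), which is how GDDR5 speeds are usually quoted; the function name is made up for the example.

```python
def memory_bandwidth_gb_s(data_rate_gbps, bus_width_bits):
    """Peak bandwidth in GB/s: per-pin data rate times bus width, divided by 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8

print(memory_bandwidth_gb_s(7, 256))  # GTX 980 / GTX 970 (256-bit bus)
print(memory_bandwidth_gb_s(7, 384))  # GTX 780 Ti (384-bit bus)
```

At the same memory clock, the narrower 256-bit bus gives the new cards two thirds of the GTX 780 Ti's peak bandwidth.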
The GTX 980 has a rated TDP of 165W -- significantly lower than the 250W TDPs of the GTX 780/780Ti/Titan and even the 230W TDP of the GTX 770.
Compared to GTX 980, GTX 970 drops 3 of the SMMs, reducing its final count to 13 SMMs or 1664 CUDA cores. It otherwise keeps the same 64 ROPs and 256-bit memory bus as its bigger sibling.
The SMM clockspeed is also reduced slightly for GTX 970. It ships at a base clockspeed of 1050MHz, with a boost clockspeed of 1178MHz. This puts its theoretical performance at about 97% of GTX 980's ROP performance and about 79% of its shading/texturing/geometry performance. The memory configuration is unchanged from GTX 980 -- 4GB of GDDR5 clocked at 7GHz, all on a 256-bit bus.
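Those percentages can be reproduced from the spec table with a quick calculation. The Python sketch below assumes that theoretical throughput scales as unit count times boost clock; the article does not say which clock was used, but the boost clocks are the ones that match the quoted figures. The dictionary layout and function name are just for illustration.

```python
GTX_980 = {"cuda_cores": 2048, "rops": 64, "boost_mhz": 1216}
GTX_970 = {"cuda_cores": 1664, "rops": 64, "boost_mhz": 1178}

def relative_throughput(card, reference, units_key):
    """Ratio of (unit count x boost clock) between a card and a reference card."""
    return (card[units_key] * card["boost_mhz"]) / (reference[units_key] * reference["boost_mhz"])

rop_ratio = relative_throughput(GTX_970, GTX_980, "rops")
shader_ratio = relative_throughput(GTX_970, GTX_980, "cuda_cores")
print(round(rop_ratio, 2), round(shader_ratio, 2))
```

The ROP figure comes purely from the clock difference, since both cards have 64 ROPs. The shading figure combines the clock difference with the loss of 3 of the 16 SMMs.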
The stock GTX 970 will be shipping with a TDP of just 145W, some 85W less than GTX 770. NVIDIA's official designs include two 6-pin PCIe power sockets.
Turning to the physical design of the reference GeForce GTX 980 and GTX 970, NVIDIA reused the Titan's design for the GTX 980. But NVIDIA's standard I/O configuration has changed on the GTX 980: Nvidia dropped the second DL-DVI port and installed a pair of full-size DisplayPorts in its place. This brings the total I/O configuration up to 1x DL-DVI-I, 3x DisplayPort 1.2, and 1x HDMI 2.0. With 3 DisplayPorts, NVIDIA can now drive 3 G-Sync monitors off a single card, making G-Sync Surround viable for the first time. SLI connectors are still present, with the pair of connectors allowing up to quad-SLI.
GTX 980 will be launching at $549. GTX 970, meanwhile, will be launching at the surprisingly low price of $329, some 40% cheaper than GTX 980.
GTX 980 is a challenger for the Radeon R9 290X, AMD's flagship single-GPU card which is priced around $499. GTX 970's competition meanwhile will be split between the Radeon R9 290 and Radeon R9 280X. From a performance perspective the R9 290 is going to be the closer competitor, though it's priced around $399. Meanwhile the R9 280X will undercut the GTX 970 at around $279, but with much weaker performance.
Nvidia said that there will be price cuts for the GTX 700 series. Although the GTX 760 stays in production with a new MSRP of $219, the GTX 770, GTX 780, and GTX 780 Ti will go on clearance sale, meaning retailers will decide their prices.