New Features of DirectX 12 Promise Great Performance In Existing Hardware
Multi-threaded "command buffer recording" and "async shaders" are two big features of the base DirectX 12 specification, each harboring great potential to extract more performance and image quality out of existing hardware. Async shaders allow a game engine to execute GPU compute or memory activities during "gaps" in the graphics workload presented by a game.
While it seems sensible to allow the graphics, compute and memory functions of a GPU to operate simultaneously, past versions of DirectX did not provide for this functionality. Past versions of DirectX were essentially limited to a single, serial graphics queue for processing all types of workloads. Therefore graphics, compute and memory copy operations had to wait for other parts of the graphics queue to finish processing before springing to life and doing their work. This would often result in idle hardware for some portions of time, and idle hardware is squandered performance.
In contrast, DirectX 12 Async Shaders supercharge work completion in a compatible GPU by interleaving these tasks across multiple threads to shorten overall render time. Async Shaders are materially important to a PC gamer’s experience because shorter rendering times reduce graphics pipeline latency, and lower latency equals greater performance. "Performance" can mean higher framerates in gameplay and better responsiveness in VR environments. Further, finer levels of granularity in breaking up the workload can yield even greater reductions in work time.
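To put rough numbers on that idea, here is a minimal C++ sketch of the difference between one serial queue and separate graphics, compute and copy queues. Everything in it is an assumption made for illustration: the `FrameWork` struct, the function names and the millisecond figures are hypothetical, and the "overlapped" number is an idealized best case in which the three queues never compete for the same GPU resources. Real hardware will land somewhere between the two results.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical model of one frame: task durations in milliseconds,
// grouped by the kind of GPU work they represent.
struct FrameWork {
    std::vector<double> graphics_ms;
    std::vector<double> compute_ms;
    std::vector<double> copy_ms;
};

// Sum the durations in one queue; durations must not be negative.
static double total(const std::vector<double>& tasks) {
    for (double d : tasks) {
        assert(d >= 0.0);
    }
    return std::accumulate(tasks.begin(), tasks.end(), 0.0);
}

// Single serial queue: every task waits for the one ahead of it,
// so the frame costs the sum of all work.
double serial_frame_ms(const FrameWork& w) {
    return total(w.graphics_ms) + total(w.compute_ms) + total(w.copy_ms);
}

// Idealized independent queues (an assumption, not a hardware guarantee):
// the frame is bounded by whichever queue takes longest.
double overlapped_frame_ms(const FrameWork& w) {
    return std::max({total(w.graphics_ms), total(w.compute_ms), total(w.copy_ms)});
}
```

With 12 ms of graphics work, 5 ms of compute and 1 ms of copies, `serial_frame_ms` returns 18 and `overlapped_frame_ms` returns 12: in this toy model the compute and copy work fits entirely inside the time the graphics queue was already going to take.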
The "command buffer" is a game’s "to-do list," a list of things that the CPU must reorganize and present to a graphics card so that graphics work can be done. Things on this to-do list might include lighting, placing characters, loading textures, generating reflections and more.
Modern PCs often ship with multi-core CPUs. One notable characteristic of DirectX 11-based applications is that many of these CPU cores in any multi-core CPU go partially or fully unutilized. This lack of utilization is owed to DirectX 11’s relative inability to break a game’s command buffer into small, parallel and computationally quick chunks that can be spread across many cores.
In addition to modest multi-threading in DirectX 11, a disproportionate amount of CPU time is frequently spent on driver and API interpretation ("overhead") under the DirectX 11 programming model, which leaves less time for executing game code that delivers quality and framerates.
In DirectX 12, however, the command buffer behavior is overhauled in five key ways:
- Overhead is significantly reduced by moving driver and API code to any available CPU thread
- The absolute time required to complete complex CPU tasks is notably reduced
- Game workloads can be meaningfully distributed across more than four CPU cores
- New "bandwidth" on the CPU allows for higher peak draw calls, enabling more detailed game worlds
- All available CPU cores may now "talk" to the graphics card simultaneously
Much like going from a two-lane country road to an eight-lane superhighway, the shift to DirectX 12 allows more traffic from a processor to reach the graphics card in a shorter amount of time.
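The multi-threaded recording pattern described above can be sketched in plain C++. This is a simulation, not Direct3D code: `CommandList`, `record_range` and `record_frame` are hypothetical names, and a "command" here is just a string. In actual Direct3D 12 the analogous pieces would be one `ID3D12GraphicsCommandList` recorded per thread and a call to `ID3D12CommandQueue::ExecuteCommandLists` to submit them, which this sketch does not attempt to reproduce.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for an API command list: a list of command names.
using CommandList = std::vector<std::string>;

// Record draw commands for objects [first, last) into a list that is
// owned by exactly one thread, so no locking is needed while recording.
void record_range(CommandList& out, std::size_t first, std::size_t last) {
    for (std::size_t i = first; i < last; ++i) {
        out.push_back("draw object " + std::to_string(i));
    }
}

// Split the frame's draws into contiguous chunks, record each chunk on
// its own thread, then concatenate the lists in a fixed order.
CommandList record_frame(std::size_t object_count, std::size_t worker_count) {
    assert(worker_count > 0);
    std::vector<CommandList> lists(worker_count);
    std::vector<std::thread> workers;
    // Round up so every object lands in some chunk.
    const std::size_t chunk = (object_count + worker_count - 1) / worker_count;

    for (std::size_t w = 0; w < worker_count; ++w) {
        const std::size_t first = std::min(w * chunk, object_count);
        const std::size_t last = std::min(first + chunk, object_count);
        workers.emplace_back(record_range, std::ref(lists[w]), first, last);
    }
    for (auto& t : workers) {
        t.join();
    }

    // "Submission": the order is decided here, not by thread timing.
    CommandList submitted;
    for (const auto& list : lists) {
        submitted.insert(submitted.end(), list.begin(), list.end());
    }
    return submitted;
}
```

The design point is that each thread writes only to its own list, so the recording work runs in parallel without contention, while the final ordering stays deterministic because the lists are joined in a fixed sequence after all threads finish.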
DirectX 12 will be part of Windows 10.