Microsoft today officially introduced DirectX 12 at the annual Game Developers Conference (GDC) in San Francisco. DX12 is the latest version of Microsoft's graphics API and was described as a major stride for gaming.
Speaking to developers and press, Anuj Gosalia, development manager of DirectX at Microsoft, described DX12 as the joint effort of hardware vendors, game developers and his team.
So what's so special about DirectX 12? It introduces the next version of Direct3D, the graphics API at the heart of DirectX. Microsoft has redesigned Direct3D to be faster and more efficient than before. Direct3D 12 enables richer scenes, more objects, and full utilization of modern GPU hardware. And it isn’t just for high-end gaming PCs either: Direct3D 12 works across all Microsoft devices, from phones and tablets to laptops and desktops, and Xbox One.
According to Microsoft, Direct3D 12 provides a lower level of hardware abstraction than ever before, allowing games to significantly improve multithread scaling and CPU utilization. In addition, games will benefit from reduced GPU overhead via features such as descriptor tables and concise pipeline state objects. Direct3D 12 also introduces a set of new rendering pipeline features that will significantly improve the efficiency of algorithms such as order-independent transparency, collision detection, and geometry culling.
DirectX 12 will also include tools for Direct3D, available as soon as Direct3D 12 is released. Microsoft is targeting games shipping in Holiday 2015.
Microsoft says that DirectX 12 will run on many of the cards gamers already have. Nvidia said that it would support it on all the DX11-class GPUs it has shipped; these belong to the Fermi, Kepler and Maxwell architectural families. AMD said all of its Graphics Core Next-based Radeon GPUs (Radeon HD 7000 series and newer) will work with the new API. Finally, Intel said the integrated graphics in its existing Haswell processors will also have DX12 support.
"AMD strongly believes in the benefits gamers and game developers can realize from lower-overhead API development," said Matt Skynner, corporate vice president and general manager, Graphics Business Unit, AMD. "With the Mantle API, AMD has shown the world our commitment to incredible performance, and we look forward to enabling the same performance gains by supporting the industry-standard DirectX 12."
But Nvidia's Senior Vice President of Content and Technology, Tony Tamasi, took some shots at AMD's similar Mantle initiative, saying Nvidia is excited about DirectX 12 because it supports existing goals "within the framework of existing graphics APIs," without the need to fragment the community.
The 3DMark comparison comes from real Direct3D 12 app code running on a real Direct3D 12 runtime with a real Direct3D 12 driver. 3DMark on Direct3D 11 uses multithreading extensively, but a combination of runtime and driver overhead still leaves significant idle time on each core. After porting the benchmark to Direct3D 12, there are two major improvements: a 50% improvement in CPU utilization and a better distribution of work among threads.
At GDC, Gosalia demonstrated the new API with a tech demo of the Xbox One racing game Forza running on a PC powered by an Nvidia Titan Black GPU. Under the hood, Forza achieves its performance on Xbox One by using the efficient low-level APIs already available on the console today. Traditionally this level of efficiency was only available on console; now Direct3D 12, even in an alpha state, brings it to PC and phone as well. By porting their Xbox One Direct3D 11.X core rendering engine to Direct3D 12 on PC, Turn 10 was able to bring that console-level efficiency to their PC tech demo.
DX12 will span PCs, Xbox One, tablets and even phones. Today's debut focused on the form of the graphics API, the model. Future Direct3D releases will include new rendering features, in addition to the new driver/application model outlined today.
Where does this performance come from?
As Microsoft previously noted, Direct3D 12 represents a significant departure from the Direct3D 11 programming model, allowing apps to go "closer to the metal" than ever before. Microsoft accomplished this by overhauling numerous areas of the API: pipeline state representation, work submission, and resource access.
Direct3D 11 allows pipeline state manipulation through a large set of orthogonal objects. For example, input assembler state, pixel shader state, rasterizer state, and output merger state are all independently modifiable. This provides a relatively high-level representation of the graphics pipeline, but it doesn’t map very well to modern hardware, primarily because there are often interdependencies between the various states. For example, many GPUs combine pixel shader and output merger state into a single hardware representation, but because the Direct3D 11 API allows these to be set separately, the driver cannot resolve things until it knows the state is finalized, which isn’t until draw time. This delays hardware state setup, which means extra overhead and fewer maximum draw calls per frame.
Direct3D 12 addresses this issue by unifying much of the pipeline state into immutable pipeline state objects (PSOs), which are finalized on creation. This allows hardware and drivers to immediately convert the PSO into whatever hardware native instructions and state are required to execute GPU work. Which PSO is in use can still be changed dynamically, but to do so the hardware only needs to copy the minimal amount of pre-computed state directly to the hardware registers, rather than computing the hardware state on the fly. This means significantly reduced draw call overhead, and many more draw calls per frame.
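The difference between resolving state at draw time and resolving it once at creation can be sketched with a toy model. The C++ below is only an illustration under stated assumptions: every type and function name in it (PipelineDesc, LateBoundContext, PipelineStateObject, resolveHardwareState and so on) is hypothetical and is not part of the Direct3D API, and the "hardware state" is just a packed integer.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: these types are hypothetical and are NOT the Direct3D API.
struct PipelineDesc {
    std::uint32_t pixelShaderId;
    std::uint32_t blendMode;       // stands in for output merger state
    std::uint32_t rasterizerMode;
};

// Counts how often the expensive "resolve to hardware state" step runs.
static int g_resolveCount = 0;

// Stand-in for a driver combining interdependent states into one hardware word.
std::uint64_t resolveHardwareState(const PipelineDesc& d) {
    ++g_resolveCount;
    return (static_cast<std::uint64_t>(d.pixelShaderId) << 32) |
           (static_cast<std::uint64_t>(d.blendMode) << 16) |
           d.rasterizerMode;
}

// Late-bound model: state can change at any time, so it is resolved per draw.
struct LateBoundContext {
    PipelineDesc current{};
    std::uint64_t hwRegister = 0;
    void draw() { hwRegister = resolveHardwareState(current); }
};

// Precompiled model: an immutable object resolved once, at creation.
class PipelineStateObject {
public:
    explicit PipelineStateObject(const PipelineDesc& d)
        : hw_(resolveHardwareState(d)) {}
    std::uint64_t hardwareState() const { return hw_; }
private:
    const std::uint64_t hw_;
};

struct PrecompiledContext {
    std::uint64_t hwRegister = 0;
    // Drawing only copies the pre-computed state; nothing is resolved here.
    void draw(const PipelineStateObject& pso) { hwRegister = pso.hardwareState(); }
};

// Returns how many resolve steps the late-bound model needs for `draws` draws.
int countResolvesLateBound(int draws) {
    assert(draws >= 0);
    g_resolveCount = 0;
    LateBoundContext ctx;
    ctx.current = PipelineDesc{1, 2, 3};
    for (int i = 0; i < draws; ++i) ctx.draw();
    return g_resolveCount;
}

// Returns how many resolve steps the precompiled model needs for `draws` draws.
int countResolvesPrecompiled(int draws) {
    assert(draws >= 0);
    g_resolveCount = 0;
    const PipelineStateObject pso(PipelineDesc{1, 2, 3});
    PrecompiledContext ctx;
    for (int i = 0; i < draws; ++i) ctx.draw(pso);
    return g_resolveCount;
}
```

In this model, a thousand draws cost a thousand resolve steps in the late-bound context and a single one with the precompiled object. Real drivers are far more involved, so the counts only illustrate where the work moves, not how much time is saved.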
In Direct3D 11, all work submission is done via the immediate context, which represents a single stream of commands that go to the GPU. To achieve multithreaded scaling, games also have deferred contexts available to them, but like Direct3D 11's pipeline state objects, deferred contexts do not map perfectly to hardware, and so relatively little work can be done in them.
Direct3D 12 introduces a new model for work submission based on command lists that contain the entirety of information needed to execute a particular workload on the GPU. Each new command list contains information such as which PSO to use, what texture and buffer resources are needed, and the arguments to all draw calls. Because each command list is self-contained and inherits no state, the driver can pre-compute all necessary GPU commands up-front and in a free-threaded manner. The only serial process necessary is the final submission of command lists to the GPU via the command queue, which is a highly efficient process.
In addition to command lists, Direct3D 12 also introduces a second level of work pre-computation: bundles. Unlike command lists, which are completely self-contained and typically constructed, submitted once, and discarded, bundles provide a form of state inheritance which permits reuse. For example, if a game wants to draw two character models with different textures, one approach is to record a command list with two sets of identical draw calls. But another approach is to "record" one bundle that draws a single character model, then "play back" the bundle twice on the command list using different resources. In the latter case, the driver only has to compute the appropriate instructions once, and creating the command list essentially amounts to two low-cost function calls.
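The recording and submission split described above can be sketched as a small simulation. This is a toy model, not Direct3D code: CommandList, CommandQueue, recordChunk and renderFrame are hypothetical names, and the fixed vertex count of 36 is an arbitrary placeholder. Each worker thread records its own self-contained list, and only the final submission runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Toy model only: hypothetical types, NOT the Direct3D API.
struct DrawCommand {
    int psoId;
    int textureId;
    int vertexCount;
};

// Self-contained: carries everything needed and inherits no outside state.
struct CommandList {
    std::vector<DrawCommand> commands;
};

// Records one list; safe to call from any thread because it touches
// nothing but its own local data.
CommandList recordChunk(int psoId, int firstObject, int objectCount) {
    CommandList list;
    list.commands.reserve(static_cast<std::size_t>(objectCount));
    for (int i = 0; i < objectCount; ++i) {
        list.commands.push_back({psoId, firstObject + i, 36});
    }
    return list;
}

// The only serial step: lists are appended to the queue in order.
class CommandQueue {
public:
    void execute(const std::vector<CommandList>& lists) {
        for (const CommandList& list : lists) {
            for (const DrawCommand& cmd : list.commands) {
                submitted_.push_back(cmd);
            }
        }
    }
    const std::vector<DrawCommand>& submitted() const { return submitted_; }
private:
    std::vector<DrawCommand> submitted_;
};

// Records `threadCount` lists in parallel, then submits them serially.
std::vector<DrawCommand> renderFrame(int threadCount, int objectsPerThread) {
    assert(threadCount > 0 && objectsPerThread >= 0);
    std::vector<CommandList> lists(static_cast<std::size_t>(threadCount));
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t) {
        // Each worker writes only to its own slot, so no locking is needed.
        workers.emplace_back([&lists, t, objectsPerThread] {
            lists[static_cast<std::size_t>(t)] =
                recordChunk(0, t * objectsPerThread, objectsPerThread);
        });
    }
    for (std::thread& w : workers) w.join();

    CommandQueue queue;
    queue.execute(lists);
    return queue.submitted();
}
```

Calling renderFrame(4, 100) records four lists of 100 draws on four threads and submits 400 commands in list order. Because no list depends on another, the recording phase needs no synchronization beyond the final join.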
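The two-character example can be modeled in a few lines. Again this is a hedged sketch rather than the real API: Bundle, CommandList, setTexture, executeBundle and recordCharacterBundle are hypothetical names, and the bundle "inherits" only one piece of state (the current texture) to keep the idea visible.

```cpp
#include <cassert>
#include <vector>

// Toy model only: hypothetical types, NOT the Direct3D API.
struct Draw {
    int meshPart;
    int textureId;
};

// Recorded once. The texture is deliberately left out so that it is
// inherited from whichever command list plays the bundle back.
struct Bundle {
    std::vector<int> meshParts;
};

class CommandList {
public:
    void setTexture(int id) { currentTexture_ = id; }

    // Playback: reuse the recorded draws with the list's current texture.
    void executeBundle(const Bundle& bundle) {
        for (int part : bundle.meshParts) {
            draws_.push_back({part, currentTexture_});
        }
    }

    const std::vector<Draw>& draws() const { return draws_; }

private:
    int currentTexture_ = -1;
    std::vector<Draw> draws_;
};

// Counts how many times the (notionally expensive) recording step runs.
static int g_bundleRecordings = 0;

Bundle recordCharacterBundle() {
    ++g_bundleRecordings;
    return Bundle{{0, 1, 2}};  // three mesh parts, chosen arbitrarily
}

// Draws the same character twice with different textures,
// recording the bundle only once.
std::vector<Draw> drawTwoCharacters(int textureA, int textureB) {
    assert(textureA >= 0 && textureB >= 0);
    const Bundle character = recordCharacterBundle();
    CommandList list;
    list.setTexture(textureA);
    list.executeBundle(character);
    list.setTexture(textureB);
    list.executeBundle(character);
    return list.draws();
}
```

Here the recording step runs once while the command list ends up with six draws, three per character, each carrying the texture that was current at playback time.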
Resource binding in Direct3D 11 is highly abstracted and convenient, but leaves many modern hardware capabilities underutilized. In Direct3D 11, games create "view" objects of resources, then bind those views to several "slots" at various shader stages in the pipeline. Shaders in turn read data from those explicit bind slots which are fixed at draw time. This model means that whenever a game wants to draw using different resources, it must re-bind different views to different slots, and call draw again. This is yet another case of overhead that can be eliminated by fully utilizing modern hardware capabilities.
Direct3D 12 changes the binding model to match modern hardware and significantly improve performance. Instead of requiring standalone resource views and explicit mapping to slots, Direct3D 12 provides a descriptor heap into which games create their various resource views. This provides a mechanism for the GPU to directly write the hardware-native resource description (descriptor) to memory up-front. To declare which resources are to be used by the pipeline for a particular draw call, games specify one or more descriptor tables which represent sub-ranges of the full descriptor heap. As the descriptor heap has already been populated with the appropriate hardware-specific descriptor data, changing descriptor tables is an extremely low-cost operation.
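A descriptor table as a sub-range of a pre-populated heap can be pictured as an offset and a count into an array. The sketch below assumes that simplification; DescriptorHeap, DescriptorTable, createView, resolve and buildExampleHeap are hypothetical names, and a descriptor is reduced to a single resource id.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: hypothetical types, NOT the Direct3D API.
struct Descriptor {
    int resourceId;  // stand-in for hardware-native descriptor data
};

// Views are written into the heap once, up front.
class DescriptorHeap {
public:
    // Returns the heap index of the newly created view.
    std::size_t createView(int resourceId) {
        slots_.push_back({resourceId});
        return slots_.size() - 1;
    }
    const Descriptor& at(std::size_t index) const { return slots_.at(index); }
    std::size_t size() const { return slots_.size(); }
private:
    std::vector<Descriptor> slots_;
};

// A table is just a window onto the heap: no descriptor data is copied.
struct DescriptorTable {
    std::size_t offset;
    std::size_t count;
};

// Lists the resource ids a draw would see through the given table.
std::vector<int> resolve(const DescriptorHeap& heap, const DescriptorTable& table) {
    assert(table.offset + table.count <= heap.size());
    std::vector<int> ids;
    ids.reserve(table.count);
    for (std::size_t i = 0; i < table.count; ++i) {
        ids.push_back(heap.at(table.offset + i).resourceId);
    }
    return ids;
}

// Builds a heap holding six views with made-up resource ids 100..105.
DescriptorHeap buildExampleHeap() {
    DescriptorHeap heap;
    for (int id = 100; id < 106; ++id) heap.createView(id);
    return heap;
}
```

Switching a draw from the first three resources to the last three means replacing DescriptorTable{0, 3} with DescriptorTable{3, 3}: two integers change and the heap itself is untouched, which is the intuition behind the low cost of changing tables.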
In addition to the improved performance offered by descriptor heaps and tables, Direct3D 12 also allows resources to be dynamically indexed in shaders, providing flexibility and unlocking new rendering techniques. As an example, modern deferred rendering engines typically encode a material or object identifier of some kind to the intermediate g-buffer. In Direct3D 11, these engines must be careful to avoid using too many materials, as including too many in one g-buffer can significantly slow down the final render pass. With dynamically indexable resources, a scene with a thousand materials can be finalized just as quickly as one with only ten.
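The g-buffer example can be approximated on the CPU. The sketch below makes a strong simplifying assumption that is not stated in the source: without dynamic indexing, the final pass is modeled as one sweep over the g-buffer per material, whereas with dynamic indexing the stored identifier indexes the material array directly. Material, shadePerMaterialPass and shadeDynamicIndex are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: hypothetical types, NOT shader code or the Direct3D API.
struct Material {
    float albedo;
};

// Assumed model of the restricted case: one sweep over the g-buffer per
// material, so the cost grows with the number of materials.
std::vector<float> shadePerMaterialPass(const std::vector<int>& gbufferIds,
                                        const std::vector<Material>& materials,
                                        int& passes) {
    std::vector<float> out(gbufferIds.size(), 0.0f);
    passes = 0;
    for (std::size_t m = 0; m < materials.size(); ++m) {
        ++passes;
        for (std::size_t p = 0; p < gbufferIds.size(); ++p) {
            if (gbufferIds[p] == static_cast<int>(m)) {
                out[p] = materials[m].albedo;
            }
        }
    }
    return out;
}

// Dynamic indexing: the identifier stored per pixel selects the material
// directly, so a single sweep suffices regardless of material count.
std::vector<float> shadeDynamicIndex(const std::vector<int>& gbufferIds,
                                     const std::vector<Material>& materials,
                                     int& passes) {
    std::vector<float> out(gbufferIds.size(), 0.0f);
    passes = 1;
    for (std::size_t p = 0; p < gbufferIds.size(); ++p) {
        const int id = gbufferIds[p];
        assert(id >= 0 && static_cast<std::size_t>(id) < materials.size());
        out[p] = materials[static_cast<std::size_t>(id)].albedo;
    }
    return out;
}
```

Both functions produce the same shaded output; only the number of sweeps differs. How a given engine actually pays for extra materials under Direct3D 11 varies, so the per-material sweep is an illustration of the scaling problem rather than a description of any particular renderer.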