Caught Up to DirectX 12 in a Single Day

The wait for Vulkan is over.

I'm not just talking about the specification. Members of the Khronos Group have also released compatible drivers, SDKs and tools to support them, conformance tests, and a proof-of-concept patch for Croteam's The Talos Principle. To reiterate, this is not a soft launch. The API, and its entire ecosystem, is out and ready for the public on Windows (at least 7+ at launch but a surprise Vista or XP announcement is technically possible) and several distributions of Linux. Google will provide an Android SDK in the near future.


I'm going to editorialize for the next two paragraphs. There was a concern that Vulkan would be too late. The thing is, as of today, Vulkan is now just as mature as DirectX 12. Of course, that could change at a moment's notice; we still don't know how the two APIs are being adopted behind the scenes. A few DirectX 12 titles are planned to launch in a few months, but no full, non-experimental, non-early access game currently exists. Each time I say this, someone links the Wikipedia list of DirectX 12 games. If you look at each entry, though, you'll see that all of them are either early access, awaiting an unreleased DirectX 12 patch, or using a third-party engine (like Unreal Engine 4) that only lists DirectX 12 as an experimental preview. No full, released, non-experimental DirectX 12 game exists today. Besides, if the latter counts, then you'll need to accept The Talos Principle's proof-of-concept patch, too.

But again, that could change. While today's launch speaks well of the Khronos Group and the API itself, it still needs to be adopted by third-party engines, middleware, and software. These partners could, like the Khronos Group before today, be privately supporting Vulkan with the intent to flood out announcements; we won't know until they do... or don't. With the support of popular engines and frameworks, dependent software really just needs to enable it. This has not happened for DirectX 12 yet, and, now, there doesn't seem to be anything keeping it from happening for Vulkan at any moment. With the Game Developers Conference just a month away, we should soon find out.


But back to the announcement.

Vulkan-compatible drivers are launching today across multiple vendors and platforms, but I do not have a complete list. On Windows, I was told to expect drivers from NVIDIA for Windows 7, 8.x, and 10 on Kepler and Maxwell GPUs. The standard is compatible with Fermi GPUs, but NVIDIA does not plan on supporting the API for those users due to their low market share. That said, they are paying attention to user feedback and they are not ruling it out, which probably means that they are keeping an open mind in case some piece of software gets popular and depends upon Vulkan. I have not heard from AMD or Intel about Vulkan drivers as of this writing, one way or the other. They could even arrive day one.

On Linux, NVIDIA, Intel, and Imagination Technologies have submitted conformant drivers.

Drivers alone do not make a hard launch, though. SDKs and tools have also arrived, including the LunarG SDK for Windows and Linux. LunarG is a company co-founded by Jens Owen, who had a previous graphics software company that was purchased by VMware. LunarG is backed by Valve, who also backed Vulkan in several other ways. The LunarG SDK helps developers validate their code, inspect what the API is doing, and otherwise debug. Even better, it is also open source, which means that the community can rapidly enhance it, even though it's in a releasable state as it is. RenderDoc, the open-source graphics debugger by Crytek, will also add Vulkan support. ((Update (Feb 16 @ 12:39pm EST): Baldur Karlsson has just emailed me to let me know that it was a personal project at Crytek, not a Crytek project in general, and their GitHub page is much more up-to-date than the linked site.))


The major downside is that Vulkan (like Mantle and DX12) isn't simple.
These APIs are verbose and very different from previous ones, which requires more effort.

Image Credit: NVIDIA

There really isn't much to say about the Vulkan launch beyond this. What graphics APIs really try to accomplish is standardizing signals that enter and leave video cards, such that the GPUs know what to do with them. For the last two decades, we've settled on an arbitrary, single, global object that you attach buffers of data to, in specific formats, and call one of a half-dozen functions to send it.

Compute APIs, like CUDA and OpenCL, decided it was more efficient to handle queues, allowing the application to write commands and send them wherever they need to go. Multiple threads can write commands, and multiple accelerators (GPUs in our case) can be targeted individually. Vulkan, like Mantle and DirectX 12, takes this metaphor and adds graphics-specific instructions to it. Moreover, GPUs can schedule memory, compute, and graphics instructions at the same time, as long as the graphics task has leftover compute and memory resources, and / or the compute task has leftover memory resources.
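
To make the queue metaphor concrete, here is a minimal sketch in plain Python. It is a toy model, not Vulkan code: the `CommandBuffer` and `Queue` classes, their methods, and the device name are all hypothetical names invented for illustration. The point it tries to show is that each thread records into its own list without locking, and only the final hand-off to a device's queue needs to be synchronized.

```python
import threading


class CommandBuffer:
    """Commands recorded by a single thread (hypothetical toy class)."""

    def __init__(self, label):
        self.label = label
        self.commands = []

    def record(self, kind, payload):
        # No lock needed: this buffer is only touched by the thread that owns it.
        self.commands.append((kind, payload))


class Queue:
    """Submission point that targets one specific device (hypothetical)."""

    def __init__(self, device_name):
        self.device_name = device_name
        self.submitted = []
        self._lock = threading.Lock()

    def submit(self, command_buffer):
        # Only the hand-off is synchronized; recording happened lock-free.
        with self._lock:
            self.submitted.append(command_buffer)


def record_and_submit(queue, label, meshes):
    """Work done by one CPU thread: record its own draws, then submit them."""
    buffer = CommandBuffer(label)
    for mesh in meshes:
        buffer.record("draw", mesh)
    queue.submit(buffer)


def render_frame(queue, work_per_thread):
    """Spawn one recording thread per entry and wait for all of them."""
    threads = [
        threading.Thread(target=record_and_submit, args=(queue, label, meshes))
        for label, meshes in work_per_thread.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return queue.submitted
```

The order in which buffers arrive at the queue depends on thread timing, but the order of commands inside each buffer is exactly what its thread recorded.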

This is not necessarily a “better” way to do graphics programming... it's different. That said, it has the potential to be much more efficient when dealing with lots of simple tasks that are sent from multiple CPU threads, especially to multiple GPUs (which currently require the driver to figure out how to convert draw calls into separate workloads -- leading to simplifications like mirrored memory and splitting workload by neighboring frames). A large number of small tasks aligns well with video games, especially ones with lots of simple objects, like strategy games, shooters with lots of debris, or any game with large crowds of people. As it becomes ubiquitous, we'll see this bottleneck disappear and games will not need to be designed around these limitations. It might even be used for drawing with cross-platform 2D APIs, like Qt or even webpages, although those two examples (especially the Web) each have other, higher-priority bottlenecks. There are also other benefits to Vulkan.


The WebGL comparison is probably not as common knowledge as Khronos Group believes.
Still, Khronos Group was criticized when WebGL launched as "it was too tough for Web developers".
It didn't need to be easy. Frameworks arrived and simplified everything. It's now ubiquitous.
In fact, Adobe Animate CC (the successor to Flash Pro) is now a WebGL editor (experimentally).

Open platforms are required for this to become commonplace. Engines will probably target several APIs from their internal management APIs, but you can't target users who don't fit in any bucket. Vulkan brings this capability to basically any platform, as long as it has a compute-capable GPU and a driver developer who cares.

Thankfully, it arrived before any competitor established market share.

GDC 2016 Sessions Are Up and DirectX 12 / Vulkan Are There

The 30th Game Developers Conference (GDC) will take place on March 14th through March 18th, with the expo itself starting on March 16th. The sessions have been published at some point, with DX12 and Vulkan prominently featured. While the technologies have not been adopted as quickly as advertised, the direction is definitely forward. In fact, NVIDIA, Khronos Group, and Valve have just finished hosting a developer day for Vulkan. It is coming.


One interesting session will be hosted by Codemasters and Intel, which discusses bringing the F1 2015 engine to DirectX 12. It will highlight a few features they implemented, such as voxel based raytracing using conservative rasterization, which overestimates the size of individual triangles so you don't get edge effects on pixels that are partially influenced by an edge that cuts through a tiny, but not negligible, portion of them. Sites like Game Debate (Update: Whoops, forgot the link) wonder if these features will be patched into older titles, like F1 2015, or if they're just R&D for future games.
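
The overestimation is easier to see with a small software sketch. This is not how Codemasters or the hardware does it; it is a toy Python rasterizer with made-up function names, assuming a pixel grid where pixel (x, y) is the unit square from (x, y) to (x + 1, y + 1). Standard rasterization only marks a pixel when its center is inside the triangle, while the conservative variant marks any pixel whose square the triangle touches at all (tested here with a separating-axis check).

```python
def edge_normals(poly):
    """Perpendiculars of each edge of a convex polygon."""
    normals = []
    for i in range(len(poly)):
        (x0, y0), (x1, y1) = poly[i], poly[(i + 1) % len(poly)]
        normals.append((y0 - y1, x1 - x0))
    return normals


def project(poly, axis):
    """Interval covered by the polygon when projected onto an axis."""
    dots = [x * axis[0] + y * axis[1] for x, y in poly]
    return min(dots), max(dots)


def overlaps(poly_a, poly_b):
    """Separating-axis test for two convex polygons (touching counts)."""
    for axis in edge_normals(poly_a) + edge_normals(poly_b):
        a_min, a_max = project(poly_a, axis)
        b_min, b_max = project(poly_b, axis)
        if a_max < b_min or b_max < a_min:
            return False  # found a gap, so the shapes cannot intersect
    return True


def point_in_triangle(point, tri):
    """True if the point lies on the same side of all three edges."""
    signs = []
    for i in range(3):
        (x0, y0), (x1, y1) = tri[i], tri[(i + 1) % 3]
        signs.append((x1 - x0) * (point[1] - y0) - (y1 - y0) * (point[0] - x0))
    return all(s >= 0 for s in signs) or all(s <= 0 for s in signs)


def rasterize(tri, width, height, conservative):
    """Return the set of (x, y) pixels the triangle is considered to cover."""
    covered = set()
    for py in range(height):
        for px in range(width):
            if conservative:
                square = [(px, py), (px + 1, py), (px + 1, py + 1), (px, py + 1)]
                hit = overlaps(tri, square)
            else:
                hit = point_in_triangle((px + 0.5, py + 0.5), tri)
            if hit:
                covered.add((px, py))
    return covered
```

A thin sliver such as `[(0.1, 0.1), (3.9, 0.2), (0.1, 0.3)]` on a 4x4 grid misses every pixel center, so the standard pass reports nothing, while the conservative pass reports the whole bottom row. For voxelization, that difference is what keeps thin geometry from vanishing.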

Another keynote will discuss bringing Vulkan to mobile through Unreal Engine 4. This one will be hosted by ARM and Epic Games. Mobile processors have quite a few cores, albeit ones that are slower at single-threaded tasks, and decent GPUs. Being able to keep them loaded will bring their gaming potential up closer to the GPU's theoretical performance, which has surpassed both the Xbox 360 and PlayStation 3, sometimes by a factor of 2 or more.

Many (most?) slide decks and video recordings are available for free after the fact, but we can't really know which ones ahead of time. It should be an interesting year, though.

Vulkan API Slips to 2016

The Khronos Group announced on Friday that the Vulkan API will not ship until next year. The standards body was expecting to launch it at some point in 2015. In fact, when I was first briefed on it, they specifically said that 2015 was an “under-promise and over-deliver” estimate. Vulkan is an open graphics and compute standard that was derived from AMD's Mantle. Like OpenCL 2.1, though, it uses the SPIR-V intermediate language for compute and shading, which can be compiled from subsets of a variety of languages.


I know that most people will be quick to blame The Khronos Group for this, because industry bodies moving slowly is a stereotype, but I don't think it applies. When AMD created Mantle, it bore some significant delays at all levels. Its drivers and software were held back, and the public release of its SDK was delayed out of existence. Again, it would be easy to blame AMD for this, but hold on. We now get to Microsoft. DirectX 12, which is maybe even closer to Mantle than Vulkan is due to its shading language, didn't roll out as aggressively as Microsoft expected, either. Software is still pretty much non-existent, even though they claimed, at GDC 2014, that about 50% of PC games would be DX12-compatible by Holiday 2015. We currently have... zero (excluding pre-release).

Say what you like about the three examples individually, but when all three show problems, then there might just be a few issues that took longer than expected to solve. Again, this is a completely different metaphor for translating voltages coming through a PCI Express bus into fancy graphics and GPU compute, and all of the supporting ecosystems need to be created, too.

Speaking of ecosystems, The Khronos Group has also announced that Google has upgraded their membership to “Promoter” to get more involved with Vulkan development. Google has been sort-of hostile towards certain standards from The Khronos Group on Android in the past, such as disabling OpenCL on Nexus devices, and trying to steer developers into using Android Extension Pack and Renderscript. They seem to want to use Vulkan proper this time, which is always healthy for the API.

I guess look forward to Vulkan in 2016... hopefully early.

AMD Releases Catalyst 15.10 Beta Drivers

The AMD Catalyst 15.9 beta driver was released just two weeks ago, and already AMD is ready with a new version. 15.10 is available now and offers several bug fixes, though the point of emphasis is DX12 performance improvements to the Ashes of the Singularity benchmark.

From AMD:

Highlights of AMD Catalyst 15.10 Beta Windows Driver

Performance Optimizations:

  • Ashes of the Singularity - DirectX 12 Quality and Performance optimizations

Resolved Issues:

  • Video playback of MPEG2 video fails with a playback error/error code message
  • A TDR error or crash is experienced when running the Unreal Engine 4 DirectX benchmark
  • Star Wars: Battlefront is able to use high performance graphics when launched on mobile devices with switchable graphics
  • Intermittent playback issues with Cyberlink PowerDVD when connecting to a 3D display with an HDMI cable
  • Ashes of the Singularity - A 'Driver has stopped responding' error may be experienced in DirectX 12 mode
  • Driver installation may halt on some configurations
  • A TDR error may be experienced while toggling between minimized and maximized mode while viewing 4K YouTube content

Known Issues:

  • Ashes of the Singularity may crash on some AMD 300 series GPUs
  • Core clock fluctuations may be experienced when FreeSync and FRTC are both enabled on some AMD CrossFire systems
  • Ashes of the Singularity may fail to launch on some GPUs with 2GB Video Memory. AMD continues to work with Stardock to resolve the issue. In the meantime, deleting the game config file helps resolve the issue
  • The secondary display adapter is missing in the Device Manager and the AMD Catalyst Control Center after installing the driver on a Microsoft Windows 8.1 system
  • Elite: Dangerous - poor performance may be experienced in SuperCruise mode
  • A black screen may be encountered on bootup on Windows 10 systems. The system will ultimately continue to the Windows login screen

The driver is available now from AMD's Catalyst beta download page.

NVIDIA Publishes DirectX 12 Tips for Developers

Programming with DirectX 12 (and Vulkan, and Mantle) is a much different process than most developers are used to. The biggest change is how work is submitted to the driver. Previously, engines would bind attributes to a graphics API and issue one of a handful of “draw” commands, which turns the current state of the API into a message. Drivers would play around with queuing them and manipulating them, to optimize how these orders are sent to the graphics device, but the game developer had no control over that.


Now, the new graphics APIs are built more like command lists. Instead of bind, call, bind, call, and so forth, applications request queues to dump work into, and assemble the messages themselves. It even allows these messages to be bundled together and sent as a whole. This allows direct control over memory and the ability to distribute a lot of the command control across multiple CPU cores. An application is only as fast as its slowest (relevant) thread, so the ability to spread work out increases actual performance.
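
The "slowest thread" point can be put into rough numbers. The figures below are hypothetical, chosen only to make the arithmetic easy, and the model ignores synchronization and submission overhead: suppose game logic costs 4 ms per frame and building draw commands costs 12 ms.

```python
def frame_cpu_time_ms(thread_loads_ms):
    """A frame cannot finish on the CPU before its slowest thread does."""
    return max(thread_loads_ms)


def spread_evenly(total_ms, thread_count):
    """Idealized split of a workload across threads (no overhead assumed)."""
    return [total_ms / thread_count] * thread_count


GAME_LOGIC_MS = 4.0       # hypothetical cost of game logic on the main thread
COMMAND_BUILD_MS = 12.0   # hypothetical cost of assembling draw commands

# Old model: one thread owns the API, so it does both jobs back to back.
single_threaded = frame_cpu_time_ms([GAME_LOGIC_MS + COMMAND_BUILD_MS])

# New model: four worker threads each record a quarter of the commands.
multi_threaded = frame_cpu_time_ms(
    [GAME_LOGIC_MS] + spread_evenly(COMMAND_BUILD_MS, 4)
)
```

Under these assumptions the single-threaded frame is bound at 16 ms, while the spread-out frame is bound at 4 ms by the game logic thread, because each worker only carries 3 ms. Real engines will not split this cleanly, but the bottleneck moves in the same direction.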

NVIDIA has created a large list of things that developers should do, and others that they should not, to increase performance. Pretty much all of them apply equally, regardless of graphics vendor, but there are a few NVIDIA-specific comments, particularly the ones about NvAPI at the end and a few labeled notes in the “Root Signatures” category.

The tips are fairly diverse, covering everything from how to efficiently use things like command lists, to how to properly handle multiple GPUs, and even how to architect your engine itself. Even if you're not a developer, it might be interesting to look over to see clues about what makes the API tick.

Benchmark Overview

I knew that the move to DirectX 12 was going to be a big shift for the industry. Since the introduction of the AMD Mantle API along with the Hawaii GPU architecture, we have been inundated with game developers and hardware vendors talking about the potential benefits of lower level APIs, which give game developers and game engines more direct access to GPU hardware and enable more flexible threading on CPUs. The results, we were told, would mean that your current hardware would be able to take you further and future games and applications would be able to fundamentally change how they are built to enhance gaming experiences tremendously.

I knew that the reader interest in DX12 was outstripping my expectations when I did a live blog of the official DX12 unveil by Microsoft at GDC. In a format that consisted simply of my text commentary and photos of the slides that were being shown (no video at all), we had more than 25,000 live readers that stayed engaged the whole time. Comments and questions flew into the event – more than my staff or I could possibly handle in real time. It turned out that gamers were indeed very much interested in what DirectX 12 might offer them with the release of Windows 10.


Today we are taking a look at the first real world gaming benchmark that utilized DX12. Back in March I was able to do some early testing with an API-specific test that evaluates the overhead implications of DX12, DX11 and even AMD Mantle from Futuremark and 3DMark. This first look at DX12 was interesting and painted an amazing picture about the potential benefits of the new API from Microsoft, but it wasn’t built on a real game engine. In our Ashes of the Singularity benchmark testing today, we finally get an early look at what a real implementation of DX12 looks like.

And as you might expect, not only are the results interesting, but there is a significant amount of created controversy about what those results actually tell us. AMD has one story, NVIDIA another and Stardock and the Nitrous engine developers, yet another. It’s all incredibly intriguing.

It's Basically a Function Call for GPUs

Mantle, Vulkan, and DirectX 12 all claim to reduce overhead and provide a staggering increase in “draw calls”. As mentioned in the previous editorial, loading graphics cards with tasks will take a drastic change in these new APIs. With DirectX 10 and earlier, applications would assign attributes to (what it is told is) the global state of the graphics card. After everything is configured and bound, one of a few “draw” functions is called, which queues the task in the graphics driver as a “draw call”.

While this suggests that just a single graphics device is to be defined, which we also mentioned in the previous article, it also implies that one thread needs to be the authority. This limitation was known about for a while, and it contributed to the meme that consoles can squeeze all the performance they have, but PCs are “too high level” for that. Microsoft tried to combat this with “Deferred Contexts” in DirectX 11. This feature allows virtual, shadow states to be loaded from secondary threads, which can be appended to the global state, whole. It was a compromise between each thread being able to create its own commands, and the legacy decision to have a single, global state for the GPU.

Some developers experienced gains, while others lost a bit. It didn't live up to expectations.
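
As a rough illustration of that compromise, here is a toy model in Python. The method names loosely echo Direct3D 11's FinishCommandList and ExecuteCommandList, but everything here is a hypothetical sketch rather than the real API: worker threads record into shadow contexts, and a single thread later replays each recording, whole, against the one global state.

```python
from concurrent.futures import ThreadPoolExecutor


class CommandList:
    """An immutable recording produced by a deferred context (toy class)."""

    def __init__(self, calls):
        self.calls = tuple(calls)


class DeferredContext:
    """Shadow state: records calls instead of touching the device."""

    def __init__(self):
        self._calls = []

    def set_shader(self, name):
        self._calls.append(("set_shader", name))

    def draw(self, mesh):
        self._calls.append(("draw", mesh))

    def finish_command_list(self):
        recorded, self._calls = self._calls, []
        return CommandList(recorded)


class ImmediateContext:
    """The single, global state; only one thread should drive it."""

    def __init__(self):
        self.shader = None
        self.draw_calls = []

    def set_shader(self, name):
        self.shader = name

    def draw(self, mesh):
        # A draw call snapshots whatever state happens to be bound right now.
        self.draw_calls.append((self.shader, mesh))

    def execute_command_list(self, command_list):
        # Replay the recording, in order, against the global state.
        for method, argument in command_list.calls:
            getattr(self, method)(argument)


def record(job):
    """Runs on a worker thread: build one command list for one job."""
    shader, meshes = job
    context = DeferredContext()
    context.set_shader(shader)
    for mesh in meshes:
        context.draw(mesh)
    return context.finish_command_list()


def render(jobs):
    """Record in parallel, then replay serially on the calling thread."""
    with ThreadPoolExecutor() as pool:
        command_lists = list(pool.map(record, jobs))  # map keeps input order
    immediate = ImmediateContext()
    for command_list in command_lists:
        immediate.execute_command_list(command_list)
    return immediate.draw_calls
```

Note where the serial step remains: recording is parallel, but every command still funnels through one context on one thread. That is the part the newer APIs remove.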


The paradigm used to load graphics cards is the problem. It doesn't make sense anymore. A developer might not want to draw a primitive with every poke of the GPU. At times, they might want to shove a workload of simple linear algebra through it, while other requests could simply be pushing memory around to set up a later task (or to read the result of a previous one). More importantly, any thread could want to do this to any graphics device.


The new graphics APIs allow developers to submit their tasks quicker and smarter, and it allows the drivers to schedule compatible tasks better, even simultaneously. In fact, the driver's job has been massively simplified altogether. When we tested 3DMark back in March, two interesting things were revealed:

  • AMD and NVIDIA are only a two-digit percentage apart in draw call performance
  • Both AMD and NVIDIA saw an order of magnitude increase in draw calls

Read on to see what this means for games and game development.

Podcast #360 - Intel XPoint Memory, Windows 10 and DX12, FreeSync displays and more!

PC Perspective Podcast #360 - 07/30/2015

Join us this week as we discuss Intel XPoint Memory, Windows 10 and DX12, FreeSync displays and more!

Hosts: Ryan Shrout, Jeremy Hellstrom, Josh Walrath, and Sebastian Peak

... But Is the Timing Right?

Windows 10 is about to launch and, with it, DirectX 12. Apart from the massive increase in draw calls, Explicit Multiadapter, both Linked and Unlinked, has been the cause of a few pockets of excitement here and there. I am a bit concerned, though. People seem to find this a new, novel concept that gives game developers the tools that they've never had before. It really isn't. Depending on what you want to do with secondary GPUs, game developers could have used them for years. Years!

Before we talk about the cross-platform examples, we should talk about Mantle. It is the closest analog to DirectX 12 and Vulkan that we have. It served as the base specification for Vulkan that the Khronos Group modified with SPIR-V instead of HLSL and so forth. Some claim that it was also the foundation of DirectX 12, which would not surprise me given what I've seen online and in the SDK. Allow me to show you how the API works.


Mantle is an interface that mixes Graphics, Compute, and DMA (memory access) into queues of commands. This is easily done in parallel, as each thread can create commands on its own, which is great for multi-core processors. Each queue, which is a list leading to the GPU that commands are placed in, can be handled independently, too. An interesting side-effect is that, since each device uses standard data structures, such as IEEE 754 floating-point numbers, no-one cares where these queues go as long as the work is done quickly enough.

Since each queue is independent, an application can choose to manage many of them. None of these lists really need to know what is happening to any other. As such, they can be pointed to multiple, even wildly different graphics devices. Different model GPUs with different capabilities can work together, as long as they support the core of Mantle.
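
A small sketch may help picture this. It is not Mantle code; the `Device` class, the queue names, and the workloads are hypothetical stand-ins written in Python. It assumes each device advertises which kinds of queue it offers and that the application, not the driver, decides where each piece of work goes.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """A graphics device exposing some subset of queue types (toy model)."""

    name: str
    queue_types: frozenset
    queues: dict = field(default_factory=dict)

    def submit(self, kind, command):
        # A device can only accept work for queue types it actually exposes.
        if kind not in self.queue_types:
            raise ValueError(f"{self.name} has no {kind} queue")
        self.queues.setdefault(kind, []).append(command)


def build_frame(primary, secondary):
    """The application chooses, per task, which device's queue to use."""
    primary.submit("dma", "upload textures")
    primary.submit("graphics", "draw scene")
    # Offload an independent compute job to the other, different GPU.
    secondary.submit("compute", "simulate physics")
    return {device.name: dict(device.queues) for device in (primary, secondary)}
```

For example, with a "discrete" device offering graphics, compute, and dma queues and an "integrated" device offering only compute and dma, `build_frame` leaves the scene on the first and the physics on the second. Neither queue needs to know the other exists.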


DirectX 12 and Vulkan took this metaphor so their respective developers could use this functionality across vendors. Mantle did not invent the concept, however. What Mantle did is expose this architecture to graphics, which can make use of all the fixed-function hardware that is unique to GPUs. Prior to AMD's usage, this was how GPU compute architectures were designed. Game developers could have spun up an OpenCL workload to process physics, audio, pathfinding, visibility, or even lighting and post-processing effects... on a secondary GPU, even from a completely different vendor.

Vista's multi-GPU bug might get in the way, but it was possible in 7 and, I believe, XP too.

Read on to see a couple reasons why we are only getting this now...

Computex 2015: EVGA Builds PrecisionX 16 with DirectX 12 Support

Subject: Graphics Cards | June 1, 2015 - 10:58 AM
Tagged: evga, precisionx, dx12, DirectX 12

Another interesting bit of news surrounding Computex and the new GTX 980 Ti comes from EVGA and its PrecisionX software. This is easily our favorite tool for overclocking and GPU monitoring, so it's great to see the company continuing to push forward with features and capability. EVGA is the first to add full support for DX12 with an overlay.


What does that mean? It means that as DX12 applications find their way out to consumers and media, we will now have a tool that can help measure performance and monitor GPU speeds and feeds via the PrecisionX overlay. Before this release, we were running in the dark with DX12 demos, so this is great news!

You can download the latest version over on EVGA's website!