It’s Basically a Function Call for GPUs

… and what do they mean for actual performance?

Mantle, Vulkan, and DirectX 12 all claim to reduce overhead and provide a staggering increase in “draw calls”. As mentioned in the previous editorial, loading graphics card with tasks will take a drastic change in these new APIs. With DirectX 10 and earlier, applications would assign attributes to (what it is told is) the global state of the graphics card. After everything is configured and bound, one of a few “draw” functions is called, which queues the task in the graphics driver as a “draw call”.

While this suggests that just a single graphics device is to be defined, which we also mentioned in the previous article, it also implies that one thread needs to be the authority. This limitation was known about for a while, and it contributed to the meme that consoles can squeeze all the performance they have, but PCs are “too high level” for that. Microsoft tried to combat this with “Deferred Contexts” in DirectX 11. This feature allows virtual, shadow states to be loaded from secondary threads, which can be appended to the global state, whole. It was a compromise between each thread being able to create its own commands, and the legacy decision to have a single, global state for the GPU.

Some developers experienced gains, while others lost a bit. It didn't live up to expectations.

The paradigm used to load graphics cards is the problem. It doesn't make sense anymore. A developer might not want to draw a primitive with every poke of the GPU. At times, they might want to shove a workload of simple linear algebra through it, while other requests could simply be pushing memory around to set up a later task (or to read the result of a previous one). More importantly, any thread could want to do this to any graphics device.

The new graphics APIs allow developers to submit their tasks quicker and smarter, and it allows the drivers to schedule compatible tasks better, even simultaneously. In fact, the driver's job has been massively simplified altogether. When we tested 3DMark back in March, two interesting things were revealed:

  • Both AMD and NVIDIA are only a two-digit percentage of draw call performance apart
  • Both AMD and NVIDIA saw an order of magnitude increase in draw calls

Read on to see what this means for games and game development.

The number of simple draw calls that a graphics card can process in a second does not have a strong effect on overall performance. If the number of draw calls in the DirectX 12 results are modeled as a latency, which is not the best way to look at it but it helps illustrate a point, then a 10% performance difference is about five nanoseconds (per task). This amount of time is probably small compared to how long the actual workload takes to process. In multi-threaded DirectX 11, NVIDIA held a lead over AMD by about 162% more calls. This almost three-fold increase in draws, which is a precious resource in DirectX 11, was evaporated in DirectX 12. In fact, it was AMD who held about a 23% lead in that API, although DX12 calls are more plentiful than they were in DX11. Are draw calls no longer a bottleneck in DirectX 12, though? We'll see.

If they're able to see the whole level, that's ~9000 draw calls.

Many can be instanced together, but that's added effort and increases GPU load.

This brings us to the second point: both vendors saw an order of magnitude increase in draw calls. When this happens, developers can justify solving their problems with smaller, more naive tasks. This might be able to either save real development time that would be spent on optimization if DX11 can be ignored, or it may allow a whole new bracket of cosmetic effects for compatible systems. This is up to individual developers, and it depends on how much real-world relief it brings.

A couple of months ago, I talked to a “AAA” game developer about this. He was on the business side, so I focused the conversation on how the new APIs would affect corporate structure.

I asked whether this draw call increase would trickle into the art department and asset creation. Specifically, I inquired whether the reduced overhead would allow games to be made on smaller art budgets, and/or this permit larger games on the same budget. Hypothetically, due to the decrease in person-hours required to optimize (or sometimes outright fake) complex scenes, the artists would spend less time on the handful of difficult assets that require, for instance, multiple materials or duplications of skeletal meshes, each of which are often separate draw calls. For instance, rather than spawning a flock of individual birds, an artist could create a complex skeleton animation for the entire flock to get it in one draw call. This takes more time to create, and it will consume extra GPU resources used to store and animate that hack too, which means you will probably need to spend even more time elsewhere to pay that debt.

A nine-bone skeleton even looks like a terrible way to animate three book-shaped birds.

But… it's one draw call.

This apparently wasn't something that the representative thought much about but, as he pondered about it for a few moments, he said that he could see it leading to more content within the same art budget. This hesitation surprised me a bit, but that could have just been the newness of the question itself. I would have expected that it would have already influenced human resource decisions if my hypothesis was true, which wouldn't require time to reflect upon.

But other studios might be thinking of it.

Ubisoft's CEO mentioned in an investor call that Assassin's Creed: Unity was the product of redoing their entire engine. Graphics vendors state that amazing PC developers should be able to push about 10,000 to 20,000 draw calls per frame with comfortable performance. This Assassin's Creed, on the other hand, was rumored to be pushing upwards of 50,000 at some points, and some blame its performance issues on that. It makes me wonder how much changed, company wide, for an instantaneous jump to that many draw calls to have happened.

Ubisoft took the plunge.

We might not see the true benefit of these new APIs until they grow in popularity. They have the potential to simplify driver and game development, which the PC genuinely needs. Modern GPUs operate a lot closer to their paradigm in GPU Compute APIs, with some graphics functionality added, than they did in the 1990s versions of DirectX and OpenGL. Trying to shoe-horn them into the way we used to interface with them limits them, and it limits the way we develop content for them.

This (mostly) isn't free performance, but it frees performance the more it influences development.