DX11 could rival Mantle

During GDC and after the DX12 reveal, we sat down with NVIDIA’s Tony Tamasi to talk about how DX11 can rival Mantle with efficiency improvements.

The big story at GDC last week was Microsoft’s reveal of DirectX 12 and the future of the dominant API for PC gaming. There was plenty of buildup to the announcement, with Microsoft’s DirectX team posting teasers and starting a Twitter account for the occasion. I hosted a live blog from the event which included pictures of the slides. It was our most successful of these types of events, with literally thousands of people joining in the conversation. Along with the debates over the similarities to AMD’s Mantle API and the timeline for the DX12 release, there are plenty of stories to be told.

After the initial session, I wanted to set up meetings with both AMD and NVIDIA to discuss what had been shown and get some feedback on the planned direction for the GPU giants’ implementations. NVIDIA presented us with a very interesting set of data that focused not only on the future with DX12, but also on the present state of DirectX 11.

The reason for the topic is easy to decipher – AMD has built up the image of Mantle as the future of PC gaming and, with a full 18 months before Microsoft’s DirectX 12 is released, how developers and gamers respond will have an important impact on the market. NVIDIA doesn’t like to talk about Mantle directly, but it’s obvious that it feels the need to address the questions in a roundabout fashion. During our time with NVIDIA’s Tony Tamasi at GDC, the discussion centered as much on OpenGL and DirectX 11 as anything else.

What are APIs and why do you care?

For those who might not really understand what DirectX and OpenGL are, a bit of background first. APIs (application programming interfaces) are responsible for providing an abstraction layer between hardware and software applications. An API can deliver consistent programming models (though the language can vary) and do so across various hardware vendors’ products and even between hardware generations. APIs can expose hardware feature sets that vary widely in complexity, while letting programmers use that hardware without necessarily knowing it in great detail.

Over the years, APIs have developed and evolved but still retain backwards compatibility.  Companies like NVIDIA and AMD can improve DirectX implementations to increase performance or efficiency without adversely (usually at least) affecting other games or applications.  And because the games use that same API for programming, changes to how NVIDIA/AMD handle the API integration don’t require game developer intervention.

With the release of AMD Mantle, the idea of a “low level” API has been placed in the minds of gamers and developers. The term “low level” can mean many things, but in general it is associated with an API that is more direct, has a thinner set of abstraction layers, and uses less translation from code to hardware. The goal is to reduce the amount of overhead (performance hit) that APIs naturally impose for these translations. With that overhead reduced, the freed CPU cycles can be used by the program (game), or the CPU can idle to improve battery life. In certain cases, GPU throughput can increase where the API overhead is impeding the video card's progress.
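To make the overhead argument concrete, here is a minimal toy model in Python. It is only a sketch: it assumes CPU and GPU work overlap perfectly, so the frame takes as long as the slower of the two, and every number in it (draw call count, per-call cost, game logic time, GPU time) is a hypothetical value chosen for illustration, not a measurement from NVIDIA, AMD, or any real game.

```python
def cpu_time_ms(draw_calls, cpu_us_per_call, game_logic_ms):
    """CPU time per frame: game logic plus API/driver cost of submitting draws."""
    return game_logic_ms + draw_calls * cpu_us_per_call / 1000.0


def frame_time_ms(draw_calls, cpu_us_per_call, game_logic_ms, gpu_ms):
    """Toy model: CPU and GPU run in parallel, so the slower side sets the pace."""
    return max(cpu_time_ms(draw_calls, cpu_us_per_call, game_logic_ms), gpu_ms)


# Hypothetical workload: 10,000 draw calls, 4 ms of game logic, 12 ms of GPU work.
# "Thick" API path: 2.0 microseconds of CPU time per draw call (assumed).
thick = frame_time_ms(10_000, 2.0, 4.0, 12.0)

# "Thin" API path: 0.5 microseconds of CPU time per draw call (assumed).
thin = frame_time_ms(10_000, 0.5, 4.0, 12.0)

# CPU time left over each frame once the GPU has become the limit.
headroom = thin - cpu_time_ms(10_000, 0.5, 4.0)

print(f"thick API: {thick:.1f} ms/frame (CPU-bound)")
print(f"thin API:  {thin:.1f} ms/frame (GPU-bound), {headroom:.1f} ms CPU headroom")
```

In this made-up case the frame goes from CPU-limited to GPU-limited once the per-call cost drops, and the leftover CPU time is what a game could spend on other work or leave idle. If the GPU time were already larger than the CPU time in the "thick" case, cutting the overhead would not change the frame time at all.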

Passing additional control to the game developers, away from the API or GPU driver developers, gives those coders additional power and improves the ability for some vendors to differentiate. Interestingly, not all developers want this kind of control: it requires more time and more development work, and small teams that depend on that abstraction to make coding easier will only see limited performance advantages.

The transition to a lower level API is being driven by the widening performance gap between CPUs and GPUs. NVIDIA provided the images below.

On the left we see performance scaling in terms of GFLOPS and on the right the metric is memory bandwidth. Clearly the performance of NVIDIA's graphics chips (and AMD’s) has far outpaced what the best Intel desktop processors have been able to deliver, and that gap means the industry needs to innovate to find ways to close it.

Even with that huge disparity, there aren't really that many cases that are ripe for performance improvement from CPU efficiency increases. NVIDIA showed us the graphic above with performance changes when scaling a modern Intel Core i7 processor from 2.5 GHz to 3.3 GHz. 3DMark and AvP benchmarks don't scale at all, Battlefield 3 scales up to 3% with the GTX Titan, Bioshock Infinite scales across the board up to 5%, and Metro: Last Light is the standout with an odd 10%+ change on the HD 7970 GHz Edition.
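For a sense of how small those numbers are, remember that going from 2.5 GHz to 3.3 GHz is a 32% clock increase. The sketch below uses a simple Amdahl's-law style model to estimate what share of the frame time would have to scale with CPU clock to produce each observed gain. The model is my own simplifying assumption (it treats CPU-limited and non-CPU-limited time as strictly serial and assumes performance scales linearly with clock), and the function name is hypothetical; only the clock speeds and percentage gains come from the figures quoted above.

```python
def cpu_bound_fraction(observed_speedup, clock_before_ghz, clock_after_ghz):
    """Estimate the fraction of frame time that scales with CPU clock.

    Amdahl-style model (an assumption): 1/S = (1 - f) + f / s,
    where s is the clock ratio and S is the observed speedup.
    Solving for f gives f = (1 - 1/S) / (1 - 1/s).
    """
    clock_ratio = clock_after_ghz / clock_before_ghz
    return (1 - 1 / observed_speedup) / (1 - 1 / clock_ratio)


# Gains quoted in the article for the 2.5 GHz -> 3.3 GHz sweep.
results = {
    "3DMark / AvP": 1.00,
    "Battlefield 3": 1.03,
    "Bioshock Infinite": 1.05,
    "Metro: Last Light": 1.10,
}

for title, speedup in results.items():
    share = cpu_bound_fraction(speedup, 2.5, 3.3)
    print(f"{title:18s} {share:6.1%} of frame time scales with CPU clock")
```

Under these assumptions even the Metro: Last Light result would mean that well under half of the frame time is sensitive to CPU speed, which is consistent with the point that most of these titles are GPU-limited at these settings.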

NVIDIA doesn’t deny that a lower level API is beneficial or needed for PC gaming. It does, however, think that the methodology of AMD’s Mantle is the wrong way to go.  Fragmenting the market into additional segments with a proprietary API does not maintain the benefits of hardware abstractions or “cross vendor support”. I realize that many readers will see some irony in this statement considering many in the industry would point to CUDA, PhysX, 3D Vision and others as NVIDIA’s own proprietary feature sets. 

NVIDIA’s API Strategy

Obviously NVIDIA is going to support DirectX 12 and continues to support the latest updates to OpenGL.  You’ll find DX12 support on Fermi, Kepler and Maxwell parts (in addition to whatever is coming next) and NVIDIA says they have been working with Microsoft since the beginning on the new API.  The exact timeline of that and what constitutes “working with” is also up for debate, but that is mostly irrelevant for our discussion. 

What NVIDIA did want to focus on with us was the significant improvements that have been made to the efficiency and performance of DirectX 11. When NVIDIA is questioned as to why it didn’t create its own Mantle-like API if Microsoft was dragging its feet, the company points to the vast improvements that are possible, and have already been made, with existing APIs like DX11 and OpenGL. The idea is that rather than spend resources on creating a completely new API that needs to be integrated in a totally unique engine port (see Frostbite, CryEngine, etc.), NVIDIA has instead improved the performance, scaling, and predictability of DirectX 11.

This graphic, provided by NVIDIA of course, shows 9 specific Direct3D 11 functions. Efficiency in this case is measured as the speed increase from the AMD R9 290X (in red) to three successive driver versions on a GTX 780 Ti (in green). The Draw, SetIndexBuffer and SetVertexBuffers functions have seen performance improvements of several hundred percent going from the R331 driver stack to an as-yet-unreleased driver due out in the next couple of weeks.

These were obviously hand selected by NVIDIA so there may be others that show dramatically worse results, but it is clear that NVIDIA has been working to improve the efficiency of DX11. NVIDIA claims that these fixes are not game specific and will improve performance and efficiency for a lot of GeForce users. Even if that is the case, we will only really see these improvements surface in titles that have addressable CPU limits or very low end hardware, similar to how Mantle works today.

NVIDIA shows this another way by also including AMD Mantle. Using the StarSwarm demo, built specifically for Mantle evaluation, NVIDIA’s GTX 780 Ti with progressive driver releases sees a significant shift in relation to AMD. Let’s focus just on D3D11 results – the first AMD R9 290X score and then the successive NVIDIA results. Out of the gate, the GTX 780 Ti is faster than the 290X even using the R331 driver. If you move forward to the R334 and the unreleased driver you see improvements of 57%, pushing NVIDIA’s card much higher than the R9 290X using DX11.

If you include Mantle in the picture, it improves performance on the R9 290X by 87% – a HUGE amount! That result was able to push the StarSwarm performance past that of the GTX 780 Ti with the R331 and R334 drivers but isn’t enough to stay in front of the upcoming release.
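Those two percentages can be tied together with a little arithmetic. The sketch below normalizes the R9 290X's D3D11 score to 1.0 and assumes the 57% gain is measured from the R331 driver to the unreleased one (the slide could be read differently, so treat that as an assumption). The absolute StarSwarm scores were not given to us in text form, so the code derives only relative bounds and the variable names are purely illustrative.

```python
# Normalize the R9 290X D3D11 result to 1.0 (relative units, not real scores).
r9_290x_d3d11 = 1.0
r9_290x_mantle = r9_290x_d3d11 * 1.87   # Mantle: +87% over D3D11 on the same card

# Assumption: the 57% NVIDIA gain is R331 -> unreleased driver on the GTX 780 Ti.
driver_gain = 1.57

# For the unreleased driver to edge out Mantle, the R331 result must be at least:
min_r331 = r9_290x_mantle / driver_gain

# Mantle still beats R331, so the R331 result must sit below the Mantle score.
max_r331 = r9_290x_mantle

print(f"R331 on GTX 780 Ti must land between {min_r331:.2f}x and {max_r331:.2f}x "
      f"of the R9 290X D3D11 score")
```

Read that way, the GTX 780 Ti on R331 would need to start roughly 19% or more ahead of the 290X in D3D11 for the numbers to line up, which matches the observation that NVIDIA's card was already faster out of the gate.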

Thief, the latest Mantle-based game release, shows a similar story; an advantage for AMD (using driver version 14.2) over the GTX 780 Ti with R331 and R334, but NVIDIA’s card taking the lead (albeit by a small percentage) with the upcoming driver.

If you followed the panels at GDC at all, you might have seen one about OpenGL speed improvements as well. This talk was hosted by NVIDIA, AMD and Intel, and all involved openly bragged about the extension-based changes to the API that have increased efficiency in a similar way to what NVIDIA has done with DX11. Even though OpenGL often gets a bad reputation for being outdated and bulky, the changes have added support for bindless textures, texture arrays, shader storage buffer objects, and commonly discussed DirectX features like tessellation, compute shaders, etc.

Add to that the extreme portability of OpenGL across mobile devices, Windows, Linux, Mac, and even SteamOS, and NVIDIA says its commitment to the open-standard API is stronger than ever.

The Effect of DirectX 12

As we discussed in our live blog, the benefits of the upcoming DX12 implementation will come in two distinct parts: performance improvements for existing hardware and feature additions for upcoming hardware. Microsoft isn’t talking much about the new features that it will offer and is instead focused on the efficiency improvements. These include reductions in submission overhead, improved scalability on multi-core systems, and the ability to mimic a console-style execution environment. All of this gives more power to the developer to handle and manipulate the hardware directly.

NVIDIA claims that work on DX12 with Microsoft “began more than four years ago with discussions about reducing resource overhead. For the past year, NVIDIA has been working closely with the DirectX team to deliver a working design and implementation of DX12 at GDC.” This would indicate that while general ideas about what would be in the next version of DX have been discussed for years, the specific effort to build and prepare it started last spring.

NVIDIA is currently the only GPU vendor to have a DX12 capable driver in the hands of developers and the demo that Microsoft showed at GDC was running on a GeForce GTX TITAN BLACK card. (UPDATE: I was told that actually Intel has a DX12 capable driver available as well leaving AMD as the only major vendor without.)

Will NVIDIA feel heat from Mantle?

Though it doesn’t want to admit it, NVIDIA is clearly feeling some pressure from gamers and media due to AMD’s homemade Mantle API. The company’s stance is to wait for DirectX 12 to be released and level the playing field with an industry standard rather than proceed down the pathway of another custom option. In the next 18 months, though, there will be quite a few games released with Mantle support, using the Frostbite engine or CryEngine. How well those games are built, and how much of an advantage the Mantle code path offers, will determine if gamers will respond positively to Radeon cards. NVIDIA, on the other hand, will be focusing its reticle on improving the efficiency and performance of DirectX 11 in its own driver stack in an attempt to maximize CPU efficiency (and thus overall performance) levels to rival Mantle.

During a handful of conversations with NVIDIA on DirectX and Mantle, there was a tone from some that leaned towards anger, or at least hinted at annoyance. It’s possible, according to NVIDIA’s performance improvements in DX11 efficiency shown here, that AMD could have accomplished the same thing without pushing a new API ahead of DirectX 12. Questions about the division of internal resources on the AMD software team between Mantle and DirectX development are often murmured, as are questions about the motives of the developers continuing to adopt Mantle today. Finding the answers to such questions is a fruitless endeavor though and to speculate seems useless – for now.

AMD has done some good with Mantle. Whether the company intended for the new API to become a new standard, or merely to force Microsoft's hand with DirectX 12, it is thanks to AMD that we are even talking about efficiency with such passion. Obviously AMD hopes it can get some financial benefit from the time and money spent on the project, with improved market share and better mindshare with gamers on the PC. The number and quality of games that are released in 2014 (and some of 2015) will be the determining factor for that.

Over the next year and a half, NVIDIA will need to prove its case that DirectX 11 can be just as efficient as what AMD has done with Mantle, or at the very least that the performance deltas between the two options are small enough not to base purchasing decisions on. I do believe that upon the release of DX12, the playing field will level once again and development on Mantle will come to a close; that is, at least if Microsoft keeps its promises.