Hot Topic: Asynchronous Shaders

Manufacturer: PC Perspective

To the Max?

Much of the PC enthusiast internet, including our comments section, has been abuzz with “Asynchronous Shader” discussion. Normally, I would explain what it is and then outline the issues that surround it, but I would like to swap that order this time. Basically, the Ashes of the Singularity benchmark utilizes Asynchronous Shaders in DirectX 12, but they disable it (by Vendor ID) for NVIDIA hardware. They say that this is because, while the driver reports compatibility, “attempting to use it was an unmitigated disaster in terms of performance and conformance”.

AMD's Robert Hallock claims that NVIDIA GPUs, including Maxwell, cannot support the feature in hardware at all, while all AMD GCN graphics cards do. NVIDIA has yet to respond to our requests for an official statement, although we haven't poked every one of our contacts yet. We will certainly update and/or follow up if we hear from them. For now though, we have no idea whether this is a hardware or software issue. Either way, it seems more than just politics.

So what is it?

Simply put, Asynchronous Shaders allows a graphics driver to cram workloads in portions of the GPU that are idle, but not otherwise available. For instance, if a graphics task is hammering the ROPs, the driver would be able to toss an independent physics or post-processing task into the shader units alongside it. Kollock from Oxide Games used the analogy of HyperThreading, which allows two CPU threads to be executed on the same core at the same time, as long as it has the capacity for it.
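To make the idea concrete, here is a toy model. It is only a sketch and not how any real driver or GPU schedules work: it assumes each task keeps exactly one named unit busy for a fixed number of milliseconds, and that tasks on different units can overlap perfectly. The unit names and durations are made up for illustration.

```python
from collections import defaultdict

def serial_time(tasks):
    """Run tasks strictly one after another: frame time is the sum."""
    return sum(ms for _unit, ms in tasks)

def overlapped_time(tasks):
    """Idealized overlap: tasks on different units run side by side,
    tasks on the same unit queue up, so the busiest unit sets the frame time."""
    busy = defaultdict(float)
    for unit, ms in tasks:
        busy[unit] += ms
    return max(busy.values()) if busy else 0.0

# Hypothetical frame: one ROP-heavy graphics pass, two shader-heavy compute jobs.
frame = [("rop", 6.0), ("shader", 4.0), ("shader", 3.0)]
print(serial_time(frame))      # 13.0
print(overlapped_time(frame))  # 7.0
```

In this made-up frame the shader work hides behind the ROP-bound pass, and the frame is then limited by whichever unit is busiest. Real hardware shares caches, registers, and memory bandwidth between tasks, so actual gains would be smaller than this ideal.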

Kollock also notes that compute is becoming more important in the graphics pipeline, and it is possible to completely bypass graphics altogether. The fixed-function bits may never go away, but it's possible that at least some engines will completely bypass them -- maybe even their engine, several years down the road.

I wonder who would pursue something so silly, whether for a product or even just research.

But, like always, you will not get an infinite amount of performance by reducing your waste. You are always bound by the theoretical limits of your components, and you cannot optimize past that (except for obviously changing the workload itself). The interesting part is: you can measure that. You can absolutely observe how long a GPU is idle, and represent it as a percentage of a time-span (typically a frame).
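As a rough illustration of that measurement, the sketch below takes a list of busy intervals for one frame and turns them into an idle percentage, then into an upper bound on how much extra work could fit. The interval format and the numbers are assumptions for the example, not the output of any particular profiler.

```python
def idle_fraction(busy_intervals, frame_ms):
    """Fraction of the frame in which the GPU was not busy.
    busy_intervals: (start_ms, end_ms) pairs inside the frame; they may overlap."""
    merged = []
    for start, end in sorted(busy_intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    busy = sum(end - start for start, end in merged)
    return 1.0 - busy / frame_ms

def max_extra_work(idle):
    """Upper bound on extra work, relative to the current work,
    if every idle moment could be filled."""
    return idle / (1.0 - idle)

# Hypothetical 16 ms frame with a 2 ms gap in the middle.
idle = idle_fraction([(0.0, 5.0), (4.0, 10.0), (12.0, 16.0)], frame_ms=16.0)
print(f"idle: {idle:.1%}, at most {max_extra_work(idle):.1%} more work")
```

The bound is a ceiling, not a forecast: it assumes the extra work fits the gaps exactly and contends for nothing else.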

And, of course, game developers profile GPUs from time to time...

According to Kollock, he has heard of some console developers getting up to 30% increases in performance using Asynchronous Shaders. Again, this is on console hardware, so this amount may increase or decrease on the PC. In an informal chat with a developer at Epic Games (so a massive grain of salt is required), his late-night, “totally speculative” ballpark guesstimate was that, on the Xbox One, the GPU could theoretically accept a maximum of ~10-25% more work in Unreal Engine 4, depending on the scene. He also said that memory bandwidth gets in the way, which Asynchronous Shaders would be fighting against. It is something that they are interested in and investigating, though.

This is where I speculate on drivers. When Mantle was announced, I looked at its features and said “wow, this is everything that a high-end game developer wants, and a graphics developer absolutely does not”. From the OpenCL-like multiple GPU model taking much of the QA out of SLI and CrossFire, to the memory and resource binding management, this should make graphics drivers so much easier.

It might not be free, though. Graphics drivers might still have a bunch of games to play to make sure that work is stuffed through the GPU as tightly packed as possible. We might continue to see “Game Ready” drivers in the coming years, even though much of that burden has been shifted to the game developers. On the other hand, maybe these APIs will level the whole playing field and let all players focus on chip design and efficient ingestion of shader code. As always, painfully always, time will tell.

September 2, 2015 | 08:13 AM - Posted by Anonymous (not verified)

Maybe AMD should focus their time on driver support for games instead of some benchmark they are invested in for a game hardly anyone will play. With the amount of time AMD and Oxide are investing into this how can one not see their partnership in all of this?

They after all worked together heavily with Mantle in the other benchmark for Star Swarm. I guess money is good in benchmarks today? Maybe they think this all will help sell AMD GPUs? Smart gamers should reserve opinion until actual games start coming out.

September 2, 2015 | 08:46 AM - Posted by allons (not verified)

Are you trying to say AMD shouldn't go out of their way to make nVidia look bad and causing geforce owners to feel insecure about their purchase? There, there. There're still many GameWorks™ titles coming soon so no more tears.

September 2, 2015 | 11:20 AM - Posted by Anonymous (not verified)

Why just the small gaming market for AMD? Nvidia is selling its GPUs to the HPC/server and supercomputer folks, and they buy loads of GPUs. Now that AMD has a proven advantage with their HSA and asynchronous compute efforts that improve both compute and gaming, it's AMD's time to get some of that lucrative action. AMD's/Others' investments in HSA and GPGPU are beginning to pay off in the PC/laptop market like they have been paying off in the mobile market, and you want AMD to strip down their GPUs' all-around functionality in the name of only gaming. There are people that need that computational power of AMD's ACE units for other graphics uses besides gaming, and other computational uses besides gaming uses!

AMD's continued improvements in HSA computing is going to transform all those thousands of GPU cores with functionality more like CPU cores for any types of calculations, and Intel sure is not adding any more cores to its CPU SKUs(costly $$$$)! GPUs are going to become even more computationally focused with AMD Arctic Islands, and even more gaming logic will be able to be run on the ACE units, and if more gaming, as well as graphics, code is running directly on the GPUs cores, well that just about does away with most of the latency issues. AMDs APUs for the HPC/Server/workstation/Supercomputer market will be made/derived into consumer versions, and the more revenues from these types of markets, as well as the exascale research grants from U-SAM will fund the R&D that brings gaming to a much higher level. Watch out for those APUs on an interposer! They will be beasts with loads of ACE units and 8 or more SMT CPU cores!

We all Know where Nvidia got the power savings from, power savings at the expense of asynchronous compute and FP performance just look at the bit-coin mining folks before the specialized bit-coin ASICs were developed they chose AMD. Now the gaming engines software stack and the major graphics APIs are making use of this asynchronous compute ability on AMD's GPU HARDWARE there is no going back to only depending on the CPU's limited cores for gaming or any other types of computing! CPUs can not even game without the help of GPUs!

P.S. Imagination Technologies is doing the very same thing with their PowerVR GPUs and HSA/compute, as are the other members of the HSA Foundation! And the mobile market is ahead of the PC/laptop market in making use of the GPU for any and all types of calculations, including gaming calculations!

September 2, 2015 | 05:48 PM - Posted by ppi (not verified)

A couple of statements from their PR guy are free (the guy is a fixed cost); paying a couple of devs to go help a developer with a game costs mucho $$$.

Supporting one game is less expensive than supporting 10 of them.

September 2, 2015 | 11:28 PM - Posted by Mark_GB


AMD goes out of its way to make sure Nvidia hardware can and does run AMD inventions as well as it runs on AMD hardware.

Nvidia on the other hand only optimizes their inventions for the current release of their own hardware. I have a GTX 770 in my system. I bought this card 15 months ago. Hairworks by Nvidia sucks bad on this card. But it works on 900 series cards. It even sucks on Titans. And Nvidia forbid all companies using Hairworks from optimizing any part of Hairworks for any other companies hardware.

So AMD plays nice.
Nvidia will even screw over the customers of its previous generation hardware. That is not what is called playing nice.

So I hope that next year, when all the 16nm GPU's come out, that AMD blows Nvidia off of the map. I for one, will laugh when Nvidia gets paid back for being dweebs and trying to fracture the video card market.

September 4, 2015 | 01:50 AM - Posted by Anonymous (not verified)

YYYYeah I have a 980, hairworks sucks in Witcher 3 on a 980. There's no way I'm willing to go from 60 fps to 45 for some dog fur. It might run worse on older GPUs, as is always the case with new features, but don't pretend it's magically free on newer hardware and evil guy Nvidia disabled it just to be dicks.

September 7, 2015 | 02:05 AM - Posted by Anonymous (not verified)

You mean everything isn't an internet conspiracy?

September 2, 2015 | 08:39 AM - Posted by Anonymous (not verified)

It's amusing how AMD/Oxide have taken this into a joined PR battle against NVidia. I mean, sure, one can say NVidia started it when they came out and said "yeah, we don't really think that particular game is a good example of DX12"

But since then, AMD/Oxide have constantly derailed this topic. Yes, if AMD only invested this time into current support of games.

AMD/Oxide made early implied statements that NVidia shouldn't really be saying they are fully DX12 due to all of this. But then once they were questioned if AMD supported all DX12 features after much questioning they both came out and said no, even AMD does not support all DX12 level features. But AMD was quick to say, but we can do the same thing other ways. Yeah, so can NVidia. And Oxide also came out and said that "maybe" they read the DX12 spec wrong.

AMD wants you to think, and indeed they have said many times, that this just shows that with GCN they knew for a while this is where it was going and it was all part of the plan. Maybe, maybe not. So then where were you for the last several years, when you have constantly been playing catch up with NVidia on DX9/10/11?

And the argument that NVidia had access to this game for a while doesn't change the fact that it is very clear AMD/Oxide are close partners on this. And the amount of joint jabs at NVidia and jerking off of AMD doesn't really support the developer's case that they are unbiased.

Wait for more games with engines that more games are going to use, like Unreal.

September 2, 2015 | 09:03 AM - Posted by allons (not verified)

Don't get caught up with a single game. Most real games use Unreal Engine, which is often enhanced by GameWorks™. nVidia is right about Ashes of the Singularity not being a representation of DirectX 12. AMD's ACE will not matter when the real DX12 games release.

September 2, 2015 | 10:13 PM - Posted by Theo (not verified)

There is no question at this point that Oxide is AMD's sock puppet. The question is whether they're actually right about async shaders and Maxwell 2. I'm awaiting Nvidia's official response to this whole brouhaha.

September 2, 2015 | 11:39 PM - Posted by Mark_GB

I don't know who you have been listening to, but AMD has said many times that they support DX12 with the 11_3 feature set.

It was Nvidia that was saying that they had full DX12 support. And now we find out that, while their driver said they had Asynchronous Shaders, in fact they cannot do Asynchronous Shaders, and the game company had to literally write code to implement that on all Nvidia cards. Why would a company that is normally so precise about things turn on a feature in their drivers that they know they cannot support? And why did they get so angry when this was mentioned? Because they were lying! And once again trying to pull a fast one on the consumers.

They even tried to bully Oxide Games into not using Asynchronous Shaders on AMD hardware. And when Oxide Games refused to be bullied, then Nvidia attacked their credibility too.

Nice people at Nvidia eh? Lie to your customers. Bully the game makers. Hide the truth. And all the while try to make AMD look bad because AMD hardware has been capable of doing Asynchronous Shaders for years now, while no Nvidia card can. Will the new cards in 2016 support it? Well, we already know AMD cards will. Nvidia is not saying, and since it would make absolutely no sense to deny that you had something coming out that your competitors have had for years, it is beginning to look like the new Nvidia cards will not be able to do Asynchronous Shaders either.

So maybe that is why Nvidia is trying so hard to hide the fact that Asynchronous Shaders can and are useful in games.

September 3, 2015 | 03:49 AM - Posted by renz (not verified)

Can you give proof that Nvidia asked Oxide to disable Async for the game entirely? Because from the article that I read on guru3d, Nvidia asked to disable Async for their hardware only, not to disable Async from the game entirely so AMD cannot benefit from Async at all. Also, many people only look at Maxwell and GCN. In a simple test by WCCFTECH, a GTX 770 was able to increase FPS from 19 to 55 in DX12.

September 4, 2015 | 01:55 AM - Posted by Anonymous (not verified)

No. Nvidia asked for Async shaders to be disabled on their hardware because it was severely tanking performance, not AMD's. I've seen so many AMD fans misrepresent this it's as though you're seeing what you want to see.

September 9, 2015 | 10:42 AM - Posted by Anonymous (not verified)

It occurs to me that this passage from this post ( suggests that Nvidia WAS demanding that Async shaders be disabled entirely:

"P.S. There is no war of words between us and Nvidia. Nvidia made some incorrect statements, and at this point they will not dispute our position if you ask their PR. That is, they are not disputing anything in our blog. I believe the initial confusion was because Nvidia PR was putting pressure on us to disable certain settings in the benchmark, when we refused, I think they took it a little too personally."

That passage suggests that Nvidia was trying to get them to disable Async entirely. If they'd been asking them to disable it ONLY on Nvidia hardware, Oxide would not have refused - they ultimately did so. What this suggests is that Nvidia demanded it be disabled entirely, Oxide refused, and instead disabled it only for hardware with an Nvidia vendor ID.

September 2, 2015 | 08:41 AM - Posted by Keven Harvey (not verified)

Nvidia, you just can't stop lying, you have a problem. It's not that big of a deal if your gpu doesn't support it, it will show in benchmarks, so STOP LYING about it.

September 4, 2015 | 01:56 AM - Posted by Anonymous (not verified)

I'm sure your comment will convince them to come clean. Well done.

September 2, 2015 | 08:44 AM - Posted by RushTheBus (not verified)

I'm glad to see that you guys threw up an article about it and hopefully it will be discussed in tonight's podcast. It seems like there is an awful lot of conjecture going on in the discussion more globally with very little in the way of hard facts (though the guys on the beyond3d forums are trying to figure it out).

Like i said, i really look forward to a possible discussion this evening!

September 2, 2015 | 09:20 AM - Posted by Edkiefer (not verified)

I have no idea on support for AS, but who says you have to use every feature available in DX12? Since with DX12 we're now talking low level, many features might not even work, depending on the card.
Performance should be the priority. Also, say you have X GPU and in the game the GPU is utilized 95-99% most of the time w/o AS.
How can AS help at all if you're able to feed the GPU w/o it waiting for something?

September 2, 2015 | 10:31 AM - Posted by Scott Michaud

It's not really whether the GPU is loaded 95-99% of the time; it is whether 100% of the GPU is loaded 95-99% of the time. If there are no gaps in time and the task is using every part of the GPU, then yes, you will not get any gains. That said, console developers are claiming to see double-digit gains.

September 2, 2015 | 11:47 PM - Posted by Mark_GB

DX12 was designed from the ground up to support many existing video cards at some level. That was a first. All previous DirectX versions needed to have all new features, so everyone had to wait awhile for cards to come out that had those new features.

So while no card available today supports every feature of DX12, most newer cards support some things in DX12 already. There will most likely be some cards coming out next year that will offer full DX12 feature support.

Based on how AMD and Nvidia seem to be acting, AMD will probably be the company that has cards that support every last DX12 feature, and Nvidia will be the ones lying about what they have until someone proves that they are lying, and then they will try to patch it into their driver or some crap, while bashing whoever it is that exposes the lies, and AMD as well...

September 4, 2015 | 02:01 AM - Posted by Anonymous (not verified)

Based on pure conjecture, I would say that my penis is rather large.

It's irrelevant who lies about what, what's relevant is game performance. When games hit retail, there's no lying or fanboying past the numbers, so let's all have some cake and wait for those why don't we.

September 2, 2015 | 09:22 AM - Posted by JohnGR

Thanks for the article.

September 2, 2015 | 09:48 AM - Posted by Christo (not verified)

This should be interesting. Especially I would like to know how this will affect VR latency.

First the 3.5GB fiasco and now this...I hope this is not Nvidia caught up in another half truth!

Fact is that if Nvidia hardware can't render VR games with low latency as a direct result of this, then many people including myself will be forced to buy AMD hardware.

September 2, 2015 | 09:50 AM - Posted by Christo (not verified)

Interesting information on Reddit about the issue:

September 2, 2015 | 09:57 AM - Posted by Anonymous (not verified)

" Especially I would like to know how this will affect VR latency. "

Minimally. Apart from sharing the word "asynchronous", the main gator of Asynchronous Timewarp latency is Late Latching accuracy, not shader dispatch efficiency.

September 2, 2015 | 10:11 AM - Posted by Scott Michaud

I guess it could... but not necessarily. You're basically cramming independent work in the free times, and the free areas, of a GPU. Often, this will actually mean doing work on two frames at once, as explained in ~20:50 on the Unity video in the editorial. In other words, you're introducing a whole frame-length of latency, which is bad for VR.

That said, it all depends on what you're cramming in, where. You could probably do something like draw or shadow all the static geometry while the GPU processes compute-based lighting effects for the movable and deformable geometry asynchronously, which would be independent. As with a lot of this stuff, it's up to the game developer.

September 4, 2015 | 05:54 PM - Posted by Ty (not verified)

Scott Wasson and David Kanter touched on aots/async briefly here and mentioned vr expectations on nvidia relative to amd/intel

One thing I did not know listening to this and reading other threads is that one of the reasons that maxwell is so much more energy efficient than gcn is that they stripped out most of the hardware schedulers that consume a lot of power. They have software to take care of some of those tasks, but it seems one of the consequences of that is that while that works well enough for more serialized dx11 workloads, it is probably a penalty for asynch compute workloads.

That previous universal positive was never really framed as a game of tradeoffs for the future, and it would have been nice if people in the know had discussed some of these implications in more detail earlier.

September 6, 2015 | 02:18 PM - Posted by Anonymous (not verified)

Wasson was a little too enthusiastic, to the point of gushing, about Intel's not so great improvements, with Skylake only offering small gains relative to those made by Sandy Bridge. That's one hell of a lot of different SKUs for even each usage level (mobile to desktop) with plenty of careful product segmentation. Intel is continuing to segment its SKUs and charge extra for the features to be added back among its initial 46 Skylake SKUs.

AMD will definitely be leveraging its ACEs and asynchronous compute with its future Zen based APUs, so having the GPU's cores able to assist the CPU's cores for compute and graphics workloads in gaming and other workloads will definitely allow AMD to compete.

September 2, 2015 | 09:55 AM - Posted by Anonymous (not verified)

There's an interesting discussion thread on Beyond3D with a homebrewed benchmark testing comparative gaming+compute shader latency on GCN and Maxwell. Results are inconclusive; lots of arguing over if it's testing the right thing, and how to interpret the results.

September 2, 2015 | 10:20 AM - Posted by Anonymous (not verified)

Isnt this the exact reason why AMD cards were often 2x the speed of Nvidia for bit coin mining?

September 2, 2015 | 10:21 AM - Posted by Anonymous (not verified)

If so, that would mean Nvidia absolutely can't do ASYNC

September 2, 2015 | 11:44 AM - Posted by Master Chen (not verified)

Hmm...does mining heavily utilize GCN's ACEs, though?
I can't remember (that was quite long ago). And, if my memory is right, HD 6xxx (and even 5xxx) series mined coins just as well as HD 7xxx series (which were beast of butt-coin mining back in the day) did, and 6xxx didn't have GCN on their board. Anyway, if that's what really was going on back then, then...holy shit,

September 2, 2015 | 12:48 PM - Posted by Anonymous (not verified)

Mining has nothing to do with async shaders. AMD had elected to incorporate the (BIT_ALIGN_INT) command in hardware, whereas nvidia did not and has to do 3 separate commands to achieve the same result (2 shifts + 1 add).
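For anyone wondering why that one instruction matters: SHA-256 is full of 32-bit rotates. The sketch below models, in plain Python, a rotate built from two shifts plus one combine next to a funnel-shift style operation. The `bitalign` function here is my own model of what I understand BIT_ALIGN_INT to do (take two 32-bit words as one 64-bit value, shift right, keep the low 32 bits); treat that as an assumption, not the ISA documentation.

```python
MASK32 = 0xFFFFFFFF

def rotr32_three_ops(x, n):
    """Rotate right using two shifts and one combine (OR here; an add
    gives the same result because the two halves never overlap)."""
    n &= 31
    return ((x >> n) | (x << (32 - n))) & MASK32

def bitalign(hi, lo, n):
    """Assumed funnel-shift model: low 32 bits of the 64-bit value hi:lo >> n."""
    return (((hi << 32) | lo) >> (n & 31)) & MASK32

def rotr32_one_op(x, n):
    """Feeding the same word in as both halves turns the funnel shift into a rotate."""
    return bitalign(x, x, n)

print(hex(rotr32_three_ops(0x00000001, 1)))  # 0x80000000
print(hex(rotr32_one_op(0x00000001, 1)))     # 0x80000000
```

Both paths give the same answer; the difference is only how many instructions the hardware spends getting there, which adds up when a hash does dozens of rotates per round.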

On top of that AMD uses a massive amounts of "simple" Stream units which can all be run in parallel. nvidia uses a LOT less "complex" units, and therefore just cannot do the same massive amounts of simple parallel operations AMD GPUs can.

AMD architecture is just better adapted to parallel workloads.

GCN takes it one step further with Async Compute which can execute out of order operations, which apparently nvidia has to "fake" in software at a HORRENDOUS lag time cost.

September 2, 2015 | 10:37 AM - Posted by Airbrushkid (not verified)

I am an Nvidia fan and will always be. Never had a problem from back when I got my first video card, a GeForce 3. I tried ATI way back and they were crap. No good drivers, and even now their drivers are still not that great.

One alpha dx 12 game and all the AMD fans jump. But just wait. When the real dx 12 games come out they will be crying like always.

Just like how both sides are crying about how driver support on Windows 10 sucks! Everybody always jumps too fast, then cries.

September 2, 2015 | 10:42 AM - Posted by Anonymous (not verified)

Thanks for sharing. Your post was very original and informative. I can almost hear you breaking your own toys from here in a fit of rage.

September 2, 2015 | 10:57 AM - Posted by Airbrushkid (not verified)

I don't get in a rage like AMD fans boys do. I laugh. Remember my drivers work. :-)

September 2, 2015 | 01:02 PM - Posted by JohnGR

Ah I see you installed your daily Nvidia Hotfix driver.

As for the games, don't worry. Nvidia's money will guarantee that GameWorks will do more damage to AMD and Kepler cards than Async shaders will do to Maxwell. Probably the developers will be $$$persuaded$$$ to not use async shaders at all anyway. So you will never realize what you lost. Ignorance is bliss. Exactly what happened with DirectX 10.1.

Then Pascal will come and every developer in the planet will start using Async shaders like there is no tomorrow. AMD owners will see their cards performing good in those games that use Async shaders, Maxwell owners will see that it is time to pay Nvidia again for a new graphics card.

September 2, 2015 | 06:00 PM - Posted by annoyingmoose (not verified)

first and foremost developers will be "$$$persuaded$$$" by market share (82% Nvidia).

and yes they will spoil us with ray traced shadows utilizing Conservative Raster which AMD lacks, etc. Async Shaders is just one of many DX12 advancements, and when/if it becomes popular on PC you probably will be out for a new card anyway.

knowing this you still buy AMD then rage in comments talking about ignorance.

September 4, 2015 | 04:57 PM - Posted by Anonymous (not verified)

Nvidia doesn't have 82% market share, oh ignorant one. AMD has landed the console deals, meaning all games developed will take into account how they can squeeze more out of the GCN architecture, because the console market is 100% AMD and just as big as the PC market, if not bigger, on AAA games. So if left to market share concerns, it's almost guaranteed that we'll have games taking advantage of async shaders years before ray traced shadows even become a thing, if they ever do.

September 5, 2015 | 08:35 PM - Posted by annoyingmoose (not verified)

82% discrete graphics, oh console peasant.
oxide just confirmed nvidia will fully support async shaders in upcoming driver, sorry to burst your bubble.

September 7, 2015 | 02:20 AM - Posted by Anonymous (not verified)

Software support is not hardware support.
It's vastly inferior.

September 2, 2015 | 10:44 AM - Posted by Anonymous (not verified)

Linux is still waiting for many years for good AMD drivers and AMD has had access to Linux the whole time!

September 2, 2015 | 11:37 AM - Posted by Anonymous (not verified)

Valve, and Vulkan will help greatly in getting AMDs software/driver stack ready for Linux based gaming. And AMD has been working with the Blender foundation in getting their drivers working with Blender's Cycles rendering. The entire graphics API and software stack of the gaming industry is going through a complete overhaul thanks to Mantle, and there is a lot of Mantle coming up for the ground in Vulkan(The public release of Mantle for all points and purposes, as well as most of DX12). AMD needs to do more but driver development takes time, and AMD's drivers do improve over time.

September 4, 2015 | 05:02 PM - Posted by Anonymous (not verified)

Big difference: AMD isn't claiming a conspiracy by Linux kernel developers to make AMD drivers slower or buggy with Linux, quite unlike Nvidia's reaction to Oxide and their benchmarks.

September 2, 2015 | 11:00 AM - Posted by fkr

stay in school. your ability to write needs a little practice. Not to mention nobody cares what a teenage fanboy thinks.

September 2, 2015 | 11:54 AM - Posted by Master Chen (not verified)

So much butthurt. Juicy.

I've started with VooDoo and GeForce too, and haven't been buying Radeons up until HD 6850 came out, but you make me absolutely cringe, mister so-called "Nvidia fan" kiddo. My last GeForce was the GODLIKE GTX 285 (very first Twin Frozr), I've absolutely loved that card, so you don't have any right to call me a "Radeon fanboy" or anything like that, because I was using both for many-many years, but after GTX 285 Nvidia started falling out of grace MASSIVELY, and it STILL hasn't crawled out of that deep hole filled with donkey shit. Since GTX 285, I've been buying only Radeons, but that doesn't make me a "Radeon fanboy", because I still hope that one day Nvidia actually pulls head out of it's ass and actually releases a product good enough for me to consider it as being worthwhile enough of buying. 4xx series, 5xx series, 6xx series, 7xx series, 9xx - all of this is unworthy garbage for me, haven't gotten even remotely close to the GODLIKE GTX 285 in my opinion. My only hope is for HBM Pascal now. Until then - I'll stay with Radeon.

September 7, 2015 | 02:22 AM - Posted by Anonymous (not verified)

"Since GTX 285, I've been buying only Radeons, but that doesn't make me a Radeon fanboy"

It does.

September 8, 2015 | 03:58 PM - Posted by Master Chen (not verified)

Try harder, kiddo.

September 3, 2015 | 02:47 AM - Posted by Jules (not verified)

I've got a feeling that when REAL DX12 games come out you'll be hiding in a lonely corner.

Oxide said their ASYNC test is small stuff compared to what console developers are doing.

September 2, 2015 | 10:43 AM - Posted by SensibleOpinion

I think the main issue (which is, strangely, forgotten) is the fact that nVidia told Oxide to alter their benchmark routine because it wouldn't suit them. And it's not coming from me, my neighbor or idk, Madonna. It's coming straight from the developer.

Oxide Developer: “NVIDIA Was Putting Pressure On Us To Disable Certain Settings In The Benchmark”

I think THIS is the biggest news in this whole story. What versions of DX12 is hardware supported or not, I think it's irrelevant. We know that no company supports all DX12 features.

September 2, 2015 | 10:46 AM - Posted by Anonymous (not verified)

You think AMD has never worked with a developer to make sure something is optimized for them? AMD and Nvidia have both been doing this for a long time with developers.

The fact is AMD/Oxide are partners and have been since Star Swarm. They have to be taken with a grain of salt, just like nVidia would with a game developer using Gameworks.

September 2, 2015 | 11:47 AM - Posted by SensibleOpinion

IF you read the article, you would understand what I'm talking about. It's another thing optimizing and another thing forcing to make dramatic changes in the way (specifically a benchmark) works.

And no, I don't think that AMD has ever pressed a developer to disable something just because their hardware performs worse than the competition. Look at all the "gaming evolved" titles and you'll see that AMD is not always the best performer.

The fact that AMD and Oxide are partners doesn't mean that AMD has asked Oxide to make their software perform worse on nVidia's hardware. If you're suggesting that, then all TWIMTBP titles out there + Unreal Engine games are biased by nVidia.

September 2, 2015 | 12:59 PM - Posted by Anonymous (not verified)

AMD has a *marketing* agreement with Stardock, the publisher.

Not with Oxide, the developer.

September 2, 2015 | 01:06 PM - Posted by JohnGR

Optimizing for your hardware is GOOD.
Sabotaging performance in everything but your last series of cards is BAD.
Know the difference.

September 7, 2015 | 02:25 AM - Posted by Anonymous (not verified)

Sometimes, both AMD/NV "optimise" by sabotaging on the other side.

Know your history.

September 2, 2015 | 10:52 AM - Posted by Scott Michaud

I think this is industry practice, but I'm not sure. I know Futuremark accepts feedback from all hardware vendors to shape their test suite. Ryan should know more about this, though.

September 2, 2015 | 12:48 PM - Posted by SensibleOpinion

It's another thing giving feedback and another thing pressing a developer to disable features because it doesn't suit your results. :)

September 2, 2015 | 12:49 PM - Posted by Anonymous (not verified)

AMD has a patent application for an APU on an interposer with an FPGA added to the HBM memory stack between the HBM's bottom die logic chip and the memory stacks above. So maybe a future AMD gaming APU will have the ability to program any new Vulkan/DX** feature sets into the FPGA, before the features are included in the hardware of future GPUs. The HPC/Workstation APU with FPGA/s and HBM on an interposer will offer many advantages in the future, and there will definitely be derived GAMING APUs with HBM that will have a little FPGA ABILITY added to the stack/s, so any new features can be added that are not yet available in an ASIC form. Just imagine being able to add the latest decoding codecs or compression algorithms to the FPGAs also!

September 2, 2015 | 04:44 PM - Posted by Anonymous (not verified)

I doubt that is the purpose of the FPGAs. I was wondering if it may be used to route around defective connections. It is a very wide interface (1024-bit per stack) and adding a second or third connection for redundancy is expensive. Also, the process for detecting defective connections and fusing them off may be expensive and messy. You probably have a very limited number of alternate connections. With an FPGA, you could probably get away with a much smaller number of active connections since you would have a lot of flexibility in routing around defects. You also could route around bad connections by just programming the FPGA, or even have the FPGA detect bad connections and re-route as needed.

September 3, 2015 | 04:21 PM - Posted by Anonymous (not verified)

The FPGA is not the bottom memory control logic chip. The FPGA on the HBM stack is there for compute, not to serve as a memory controller/router; that is still the job of the bottom logic/controller chip on the HBM stack! This is listed in the patent application, and is part of AMD's Workstation/HPC/exascale research that mentions placing some compute resources in with the memory. It is quite possible that some of AMD's HPC/Workstation IP will find its way into AMD's consumer gaming, and Laptop/PC systems.

AMD's server/HPC SKUs will be utilizing the FPGAs for specialized analytical workloads, and there is no reason why one of AMD's future high end gaming APUs could not make use of the same technology integrated into the HBM stacks for video acceleration, or even adding graphics API feature sets into the FPGA's logic. With most graphics APIs now receiving much more attention, expect there to be rapid improvements in Vulkan, DX**, and the other graphics APIs, and having an FPGA to keep things more current, or to enable last minute feature inclusions to gaming APUs/GPUs will make including some FPGA ability on the HBM stack a no-brainer on future gaming systems.

Complete gaming systems on an interposer are going to revolutionize the gaming market.

September 5, 2015 | 08:09 PM - Posted by Anonymous (not verified)

It is an interesting idea, but expecting FPGAs to make it into a consumer level GPU is a bit premature.

September 6, 2015 | 01:59 PM - Posted by Anonymous (not verified)

NOT really, as so much of the technology that makes it into the Consumer product lines of the APU/CPU/GPU makers comes from the HPC/workstation markets! Those FPGA/s will come in handy for any last minute Graphics API feature inclusions on the FPGA's programmable logic! That and any new video/sound codec updates that improve performance.

So the HBM stack gets a FPGA die added and the ability to do more pre/post processing on data stored in the HBM's memory stacks. The FPGA resources available for the GPU, and CPU to offload work to could bring gaming up to a whole new level, and not only gaming but any other compute uses also!

September 2, 2015 | 11:30 AM - Posted by Anonymous (not verified)

If Nvidia did support ASYNC we all know their PR department would be in full force. The big news is they are completely silent on the matter. That is very telling.

September 2, 2015 | 01:04 PM - Posted by Anonymous (not verified)

Don't forget trying to publicly shame Oxide by trying to intimidate them.

September 2, 2015 | 06:58 PM - Posted by razor512

They are likely busy planning for damage control, as they know that DX12 games will likely heavily use a function that their cards don't truly support.

September 2, 2015 | 11:33 AM - Posted by Master Chen (not verified)

I think it's simply because Maxwell doesn't have ACEs. Kepler had some, Maxwell was completely stripped of them. GCN, on the other hand, had ACEs from the very beginning. That's why. Someone correct me on this if I'm wrong, because I'm going from memory here, and my memory on this might be wrong.

September 2, 2015 | 12:39 PM - Posted by Anonymous (not verified)

To make the Maxwell architecture competitive and more energy efficient, nvidia had to effectively remove Double Precision and ACE units to dedicate everything to single precision.

Maxwell can do one thing very well, but it is just a one trick pony.

September 2, 2015 | 01:56 PM - Posted by Dr_Orgo

Nvidia and AMD both have a limited transistor budget for their GPUs. Maxwell + Nvidia's DX11 drivers use that silicon budget more efficiently for DX11 games. GCN + AMD's DX11 drivers have "wasted" part of their silicon budget as games can't fully utilize some features. DX12 with an optimally designed graphics engine unlocks the previously "wasted" parts of GCN. Based on the AotS benchmarks both Maxwell and GCN are about equal in DX12.

It does seem like AMD bet the farm on DX12, although it can be viewed as neglecting DX11. If all this means that Nvidia and AMD will be equal in performance in DX12 games then that's a win for consumers.

September 2, 2015 | 04:00 PM - Posted by Anonymous (not verified)

I'm more interested in Vulkan, and getting those results on Steam OS! Vulkan has the very same features that DX12 got from Mantle, and gaming will be even better without all that Windows 10 telemetry going on in the background, stealing more cycles and forcing more latency on those that have downloaded and installed SPYWARE 10.

And AMD is not limited to a single GPU die, once interposers are made to support 2, or more, GPU dies on an interposer. There will be interesting gaming systems/APUs on an interposer from AMD, derived from those workstation/HPC/supercomputer APUs that are in development! I fully expect to see AMD producing an 8 Zen core/powerful Greenland graphics gaming APU that will have much more HBM, and possibly FPGA assist for gaming and other workloads in the future!

Those interposers can be etched with tens of thousands of traces to connect multiple GPUs with plenty of CPU cores and HBM, so look forward to even more GPU power and dual or more GPU dies all sharing the interposer and plenty of next generation ACEs doing even more of their own processing for graphics, and physics for gaming and other workloads.

September 2, 2015 | 05:25 PM - Posted by Anonymous (not verified)

There are still limitations on the interposer size. I believe the current implementation is around 800 square mm. Going larger than the reticle probably makes the wafers significantly more expensive to process. If you have a giant 600 square mm gpu, plus 4 or more HBM stacks, then you don't really have space for a second GPU. This isn't necessarily bad though. If yields at 14 nm FinFET are not that good, then it may be best to make much smaller, modular gpu die and use several of them on an interposer. Yields of four 150 square mm die would be significantly better than one 600 square mm die.
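The yield claim above can be put into rough numbers with a simple Poisson defect model, where the fraction of defect-free die is exp(-area × defect density). This is only a sketch: the defect density `D0` below is an assumed, illustrative figure and not a real 14 nm number, and `poisson_yield` is a hypothetical helper name.

```python
import math

def poisson_yield(area_mm2, defects_per_cm2):
    """Fraction of dies with zero defects under a simple Poisson model."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)

# Assumed defect density in defects per cm^2 (illustrative only).
D0 = 0.5

big = poisson_yield(600, D0)    # one monolithic 600 mm^2 die
small = poisson_yield(150, D0)  # one modular 150 mm^2 die

print(f"600 mm^2 die yield: {big:.1%}")
print(f"150 mm^2 die yield: {small:.1%}")

# Chance that four untested small dies are all good: under this model it
# equals the big-die yield, so the gain only appears if bad small dies are
# tested and discarded before they go onto the interposer.
print(f"four untested 150 mm^2 dies all good: {small ** 4:.1%}")
```

With this assumed defect density, roughly 5% of the 600 square mm die come out clean versus roughly 47% of the 150 square mm die, so far more of each wafer ends up as usable silicon. The model ignores wafer edge losses, partially salvageable die, and the yield of the interposer assembly step itself.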

This may allow some other interesting things if the design budget allows. If they were making a modular gpu then you may be able to do something like design a separate die with differing amounts of single precision units and double precision units. For a consumer product, you could use something like 3 mostly SP die and one die with more DP hardware. This would allow you to make an HPC product by just using the DP heavy die for all of them. This would allow them to cover a very wide product range, from very small devices up to giant HPC designs with a small number of die.

It may also be interesting to make a separate cache die also. HBM is fast, but it is still DRAM which does not come anywhere close to SRAM cache latency. Caches take up a lot of die area and are sensitive to defects. Making a special cache die could increase yields significantly and allow much larger caches. There are a lot of other interesting things you could do, like pull the media processor (hardware video encode/decode and such) or really any GPU units off onto a separate die. This would allow you to update some components without needing to re-spin the rest of the die.

The silicon interposer tech is really revolutionary in that it will allow you to split functionality off onto separate die while still maintaining on-die like communication speeds. IMO, it is the most interesting tech to come out recently.

September 2, 2015 | 05:57 PM - Posted by ppi (not verified)

The sad part (from AMD point of view) is that it is irrelevant. Maxwell was allowed to rule for a full year, and it will take at least another year before there is critical mass of good DX12 games, where asynchronous shaders may matter.

But by then, nVidia can cover up with Pascal (though if Pascal does not support this stuff, it is probably too late), and all the GCN forward-thinking will be wasted.

Heck, looking at AotS benchmark, it may well end up that for new games nVidia cards will use DX11 codepath, while AMD will be on DX12.

That being said, I almost bought a GTX 960, but AotS plus the HardOCP Witcher 3 evaluation made me hold off and wait for more info, as the Radeon 380 may be more future-proof.

September 3, 2015 | 04:43 PM - Posted by Anonymous (not verified)

Nobody is going to want a GPU without ACE-type units and functionality, as the game engine makers will be using all of the asynchronous compute ability of AMD's GPUs to run even more of the game logic, alongside the ACE units' use for graphics. Expect AMD's use of GPUs for HSA-style computing of all kinds to improve greatly, and the CPU to become a less and less necessary part of the gaming or compute equation going forward.

Even Qualcomm's latest is making use of heterogeneous compute on its SOC's GPUs and DSPs for its mobile products. The era of HSA compute is coming to all devices and AMD's Arctic Islands will have even more compete ability. Nvidia better get with the program or get left behind! People use GPUs for more than just gaming, and AMD's GPUs have always been better with compute, and now even the gaming engine makers are tapping into heterogeneous compute.

That asynchronous compute ability is here to Stay!

September 3, 2015 | 04:47 PM - Posted by Anonymous (not verified)

Edit: compete ability
To: compute ability

Although it will be hard to compete without compute(ACE) ability on the GPU going forward!

September 7, 2015 | 02:37 AM - Posted by Anonymous (not verified)

"To make the Maxwell architecture competitive and more energy efficient, nvidia had to effectively remove Double Precision and ACE units to dedicate everything to single precision.

"Maxwell can do one thing very well, but it is just a one trick pony."

Yeah, it's called winning. AMD put in support for Async Compute back with 7970. What did that give them for 4 years? Nothing.

When the 900-series came out, people looked at benchmarks. Which was faster? The 900-series was faster, including OC.

It was then, it is now. And it will be in 2016. DX12 is much more than just Async Compute.

You have to win in the here and now.
Also, AMD put in Async Compute not because of some grand plan. They put it in because it was needed for consoles. That's it.

If they didn't have the console wins, it is highly doubtful it would have been in as early as it was.

Your comment reeks of bitterness. You gotta win in benchmarks relevant to people in the here and now. Not hypotheticals which will play out 3-4 years in the future. That's why AMD has lost market share these past 3 years.

September 2, 2015 | 03:55 PM - Posted by funandjam

Your memory isn't wrong. ACEs really have been in GCN from the beginning, and the reason is consoles.

Asynchronous Compute (AC) made a difference with console hardware, which as we all know is essentially just a desktop APU. The problem was: how does AMD get that extra performance on the desktop side? MS owned not only Xbox but also the dominant DirectX API, so they had no incentive to speed up development of DirectX for desktop, since it would compete against their cash-cow console. Enter Mantle at about the same time the consoles came out, and here we are with DX12.

September 2, 2015 | 06:52 PM - Posted by Master Chen (not verified)

So...basically...a Trojan horse? Wow,

September 2, 2015 | 11:53 AM - Posted by Anonymous (not verified)

Hey Scott, the generic computing science term for Hyper-threading(TM) is simultaneous multithreading (SMT). And Intel did not invent SMT(1).

(1) Wikipedia:

“While multithreading CPUs have been around since the 1950s, simultaneous multithreading was first researched by IBM in 1968 as part of the ACS-360 project.[1] The first major commercial microprocessor developed with SMT was the Alpha 21464 (EV8). This microprocessor was developed by DEC in coordination with Dean Tullsen of the University of California, San Diego, and Susan Eggers and Henry Levy of the University of Washington. The microprocessor was never released, since the Alpha line of microprocessors was discontinued shortly before HP acquired Compaq which had in turn acquired DEC. Dean Tullsen's work was also used to develop the Hyper-threading (Hyper-threading technology or HTT) versions of the Intel Pentium 4 microprocessors, such as the "Northwood" and "Prescott".”

Let's use computing science terminology not marketing trade names for technology definitions!

September 2, 2015 | 04:40 PM - Posted by Scott Michaud

Fair point, but Kollock, the developer I was referring to, used the term "Hyper Threading" in their post. "It allows jobs to be cycled into the GPU during dormant phases. It can vaguely be thought of as the GPU equivalent of hyper threading."
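Kollock's description of cycling jobs into the GPU "during dormant phases" can be illustrated with a toy model. This is only a sketch of the scheduling idea, not how any real GPU, driver, or DirectX 12 queue works: time is cut into equal slots, every compute job takes exactly one slot, and `frame_time` and its parameters are hypothetical names.

```python
def frame_time(graphics_slots, compute_jobs, asynchronous):
    """Return the number of time slots one frame needs in a toy model.

    graphics_slots: list of bools, True where graphics work keeps the
        shader units busy, False where they sit idle (for example while
        the frame is bound by the ROPs).
    compute_jobs: number of independent one-slot compute jobs.
    asynchronous: whether compute jobs may fill the idle slots.
    """
    if not asynchronous:
        # Compute only starts after all graphics work has finished.
        return len(graphics_slots) + compute_jobs
    idle = graphics_slots.count(False)
    # Jobs that do not fit into idle slots still run after the frame.
    leftover = max(0, compute_jobs - idle)
    return len(graphics_slots) + leftover

# Eight slots of graphics work, four of which leave the shaders idle.
slots = [True, False, True, True, False, False, True, False]

print(frame_time(slots, 3, asynchronous=False))
print(frame_time(slots, 3, asynchronous=True))
```

In this toy case the three compute jobs fit entirely into idle slots, so they add nothing to the frame time, which is the same intuition as SMT filling unused execution resources on a CPU core.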

September 3, 2015 | 05:00 PM - Posted by Anonymous (not verified)

Arctic Islands will see even more ACE/other functionality added. AMD will be getting back into the Server/HPC market with Zen and Greenland graphics, and the HSA style systems will continue to evolve from mobile to the exascale systems. The GPUs, DSPs, and any other types of compute will be used to their fullest going forward!

SMT is the term used in the textbooks, and Hyper-Threading(TM) is just a marketing term for SMT. Expect that maybe even AMD's K12 may have a custom micro-architecture running the ARMv8-A ISA that supports SMT.

September 2, 2015 | 09:09 PM - Posted by Anonymous (not verified)

yo pc master race... the bigger story is that another console port released (Mad Max) and it actually wasn't broken... Rejoice

ok back to your geek chat about what is better AMD or Nvidia for console ports

September 4, 2015 | 04:31 PM - Posted by funandjam

The Tech Report just released this video, and about 1 hour 12 minutes into it, Scott Wasson and David Kanter discuss Asynchronous Compute:

Looks like VR is going to have issues too for GeForce cards.

September 8, 2015 | 02:21 PM - Posted by funandjam

And even more information regarding Asynchronous Compute and what it means and how it works with Nvidia and AMD cards, very informative:
