What's Asynchronous Compute? 3DMark Time Spy Controversy


Yes, We're Writing About a Forum Post

Update - July 19th @ 7:15pm EDT: Well that was fast. Futuremark published their statement today. I haven't read it through yet, but there's no reason to wait to link it until I do.

Update 2 - July 20th @ 6:50pm EDT: We interviewed Jani Joki, Futuremark's Director of Engineering, on our YouTube page. The interview is embedded just below this update.

Original post below

The comments of a previous post notified us of a forum thread whose author claims that 3DMark's implementation of asynchronous compute is designed to show NVIDIA in the best possible light. At the end of the linked post, they note that asynchronous compute is a general blanket term, and that we should better understand what is actually going on.

So, before we address the controversy, let's actually explain what asynchronous compute is. The main problem is that it actually is a broad term. Asynchronous compute could describe any optimization that allows tasks to execute when it is most convenient, rather than just blindly doing them in a row.

I will use JavaScript as a metaphor. In this language, you can assign tasks to be executed asynchronously by passing functions as parameters. This allows events to execute code when it is convenient. JavaScript, however, is still only single-threaded (without Web Workers and newer technologies). It cannot run callbacks from multiple events simultaneously, even if you have an available core on your CPU. What it does, however, is allow the browser to manage its time better. Many events can be delayed until the browser renders the page, until it finishes other high-priority tasks, or until the asynchronous code has everything it needs, like assets that are loaded from the internet.

This is asynchronous computing.
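
As a minimal sketch of that behavior (written in TypeScript; the `runDemo` helper is hypothetical and not from any library), the snippet below queues a zero-delay timer and a promise callback, then keeps the single main thread busy for a few milliseconds. Both callbacks have to wait until the synchronous code has finished, even though they were requested first.

```typescript
// Hypothetical demo helper: records the order in which code runs on the main thread.
function runDemo(): Promise<string[]> {
  return new Promise((resolve) => {
    const order: string[] = [];
    order.push("start");

    // Ask the runtime to call us back as soon as possible (0 ms delay).
    setTimeout(() => {
      order.push("timer callback");
      resolve(order);
    }, 0);

    // A promise callback is also deferred until the current code has finished.
    Promise.resolve().then(() => {
      order.push("promise callback");
    });

    // Keep the thread busy for ~5 ms; no callback can interrupt this loop.
    const busyUntil = Date.now() + 5;
    while (Date.now() < busyUntil) {
      // spin
    }
    order.push("synchronous work done");
  });
}

runDemo().then((order) => console.log(order.join(" -> ")));
// start -> synchronous work done -> promise callback -> timer callback
```

Nothing here runs in parallel. The runtime only chooses a convenient moment to run each callback, which is the narrow sense of "asynchronous" that the metaphor relies on.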

However, if JavaScript were designed differently, it would have been possible to run callbacks on any available thread, not just the main thread when available. Again, JavaScript is not designed in this way, but this is where I pull the analogy back into AMD's Asynchronous Compute Engines. In an ideal situation, a graphics driver will be able to see all the functionality that a task will require, and shove that task into a GPU that is already at work, provided the specific resources that this task requires are not fully utilized by the existing work.

Read on to see how this is being implemented, and what the controversy is.

A simple example of this is performing memory transfers from the Direct Memory Access (DMA) queues while a shader or compute kernel is running. This is a trivial example, because I believe every Vulkan- or DirectX 12-supporting GPU can do it, even the mobile ones. NVIDIA, for instance, added this feature with CUDA 1.1 and the Tesla-based GeForce 9000 cards. It's discussed alongside other forms of asynchronous compute in DX12 and Vulkan programming talks, though.
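
To see why that overlap helps, here is a toy timeline model in TypeScript. It is only a sketch: the function names and timings are made up for illustration, it assumes the copy queue can run ahead freely (unlimited staging memory), and it does not call any real GPU API.

```typescript
// Toy timeline: a copy (DMA) queue and a compute queue working through the same chunks.
// All names and timings are illustrative assumptions, not a real GPU API.

/** Total time when each chunk is copied and then processed, strictly one step at a time. */
function serialTime(chunks: number, copyMs: number, computeMs: number): number {
  return chunks * (copyMs + computeMs);
}

/** Total time when the copy queue keeps transferring while the compute queue works. */
function overlappedTime(chunks: number, copyMs: number, computeMs: number): number {
  let copyDone = 0;    // when the copy queue finishes its latest transfer
  let computeDone = 0; // when the compute queue finishes its latest kernel
  for (let i = 0; i < chunks; i++) {
    copyDone += copyMs; // transfers run back to back on the copy queue
    // A kernel starts once its data has arrived and the previous kernel is done.
    computeDone = Math.max(computeDone, copyDone) + computeMs;
  }
  return computeDone;
}

console.log(serialTime(4, 2, 3));     // 20
console.log(overlappedTime(4, 2, 3)); // 14
```

In this model, only the first transfer and the last kernel are left without something running alongside them. The rest of the cheaper stage is hidden behind the more expensive one.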

What AMD has been pushing, however, is the ability to cram compute and graphics workloads together. When a task uses the graphics ASICs of a GPU, along with maybe a little bit of the shader capacity, the graphics driver could increase overall performance by cramming a compute task into the rest of the shader cores. This has the potential to be very useful. When I talked with a console engineer at Epic Games last year, he gave me a rough, before-bed-at-midnight-on-a-weekday estimate that ~10-25% of the Xbox One's GPU is idling. This doesn't mean that asynchronous compute will give a 10-25% increase in performance on that console, just that, again as a ballpark, there's that much performance left on the table.
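
A back-of-the-envelope model makes the gap between "idle capacity" and "performance gain" concrete. It rests on two simplifying assumptions that are not from the engineer's estimate: frame time is inversely proportional to the share of the GPU doing useful work, and overlapping work fills only some share of the idle capacity. The function name is hypothetical.

```typescript
// Back-of-the-envelope model; the formula is an illustrative assumption, not a measurement.

/**
 * idleFraction: share of the GPU sitting idle (0.10 to 0.25 in the estimate above).
 * reclaimedShare: how much of that idle share overlapping work actually fills (0 to 1).
 * Returns the frame-rate multiplier, assuming frame time scales with 1 / busy share.
 */
function speedupFromReclaimedIdle(idleFraction: number, reclaimedShare: number): number {
  if (idleFraction < 0 || idleFraction >= 1) {
    throw new RangeError("idleFraction must be in [0, 1)");
  }
  if (reclaimedShare < 0 || reclaimedShare > 1) {
    throw new RangeError("reclaimedShare must be in [0, 1]");
  }
  const busyBefore = 1 - idleFraction;
  const busyAfter = busyBefore + idleFraction * reclaimedShare;
  return busyAfter / busyBefore;
}

for (const reclaimed of [0, 0.5, 1]) {
  const gain = (speedupFromReclaimedIdle(0.2, reclaimed) - 1) * 100;
  console.log(`20% idle, ${reclaimed * 100}% of it filled: about ${gain.toFixed(1)}% faster`);
}
```

With 20% of the GPU idle, this model gives no gain if nothing can be overlapped, 12.5% if half of the gap is filled, and 25% if all of it is. The idle figure is a ceiling on what can be recovered, not a prediction of what will be.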

I've been asking around to see how this figure will scale, be it with clock rate, shader count, or whatever else. No one I've asked seems to know. It might be an increasing benefit going forward... or not. Today? All we have to go on are a few benchmarks and test cases.

The 3DMark Time Spy Issue

The accusation made on the forum post is that 3DMark's usage of asynchronous compute more closely fits NVIDIA's architecture than it does AMD's. Under DOOM and Ashes of the Singularity, the AMD Fury X performs better than the GTX 1070. Under 3DMark Time Spy, however, it performs worse than the GTX 1070. They also claim that Maxwell does not take a performance hit where it should, if it was running code designed for AMD's use cases.

First, it is interesting that AMD's Fury X doesn't perform as well as the GTX 1070 in Time Spy. There could be many reasons for it. Futuremark could have not optimized for AMD as well as they should have, AMD could be in the process of updating their drivers, or NVIDIA could be in the process of updating their drivers for the other two games. We don't know. That said, if 3DMark could be more optimized for AMD, then they should obviously do it. I would be interested to see whether AMD brought up the issue with 3DMark pre-launch, and what their take is on the performance issue.

As for Maxwell not receiving a performance hit? I find that completely reasonable. A game developer will tend to avoid a performance-reducing feature on certain GPUs. It is not 3DMark's responsibility to intentionally enable a code path that would produce identical results, just with a performance impact. To be clear, the post didn't suggest that they should, but I want to underscore how benchmarks are made. All vendors submit their requests during the designated period, then the benchmark is worked on until it is finalized.

At the moment, 3DMark seems to run counter to the other two examples that we have of asynchronous compute, leading to AMD having lower performance than expected, relative to NVIDIA. I would be curious to see what both graphics vendors, especially AMD as mentioned above, have to say about this issue.

As for which interpretation is better? Who knows. It seems like AMD's ability to increase the load on a GPU will be useful going forward, especially as GPUs get more complex, because it doesn't seem like the logic required for asynchronous compute would scale too much in complexity with them.

For today's GPUs? We'll need to keep watching and see how software evolves. Bulldozer was a clever architecture, too. Software didn't evolve in the way that AMD expected, making the redundancies they eliminated not as redundant as they expected. Unlike Bulldozer, asynchronous compute is being adopted, both on the PC and on the consoles. Again, we'll need to see statements from AMD, NVIDIA, and Futuremark before we can predict how current hardware will perform in future software, though.

Update @ 7:15pm: As stated at the top of the post, Futuremark released a statement right around the time I was publishing.

July 19, 2016 | 06:59 PM - Posted by John H (not verified)

"Yes, We're Writing About a Forum Post"
+1 for smile.
Keep up the good fight Scott! :)

July 20, 2016 | 02:59 AM - Posted by JohnGR

The whole story about the GTX 970, which ended up showing that it had different specs from those advertised, started from a forum post and a simple CUDA program.

July 23, 2016 | 08:25 PM - Posted by Scott Michaud

Yup. Also, before I got a job at PC Perspective, I would post on forums, too. Many good comments come out of them. That said, if you cannot find much information outside of forums, then it's a good idea to preface it. It could be wrong or too new.

July 19, 2016 | 07:25 PM - Posted by Anonymous (not verified)

Want to see where the software goes for PC? Look no further than Xbox one for your cues...

July 19, 2016 | 07:32 PM - Posted by Anonymous (not verified)

Basically Nvidia pipelines are too long to effectively handle parallel async because the latency is too high so 3dmark made a tailored version of async that only benefits nvidia. Am I getting this right?

July 19, 2016 | 08:08 PM - Posted by arbiter

No, it's just AMD fans looking for something to cry foul about. Not everything works perfectly on their cards, so they cry foul and claim bias.

July 19, 2016 | 09:06 PM - Posted by Anonymous (not verified)

lol.... right.

July 20, 2016 | 11:04 AM - Posted by Anonymous (not verified)

there...there... now. Let them cry! Just go back to your dream world. :)

July 19, 2016 | 07:40 PM - Posted by Anonymous (not verified)

You guys in the tech press keep on making the same mistakes and fail to explain Async Compute properly. In DX12/Vulkan, it's referred to as Multi-Engine.

An easy to understand video:

July 19, 2016 | 08:30 PM - Posted by Scott Michaud

That entire video actually aligns with what I said, except the last slide that claims pre-emption is not asynchronous compute.

It's definitely not what AMD defines it as, and you could make an argument that it could be misleading to define pre-emption as asynchronous compute, but definitions can't inherently be wrong. You just need to be careful when explaining why you define something in the way you do. This is basically the point of the first 2/3rds of the post.

July 19, 2016 | 08:35 PM - Posted by Anonymous (not verified)

Why do the 970 and Fury X have 2 device contexts while the 1080 has 4 when async is supposedly turned on?

On the async off screenshots both 970 and Fury X have 1 while 1080 has 2.


July 19, 2016 | 11:07 PM - Posted by Anonymous (not verified)

Asynchronous compute in hardware is achieved through preemption at the hardware level: context switching between processor threads on SMT-enabled (hardware-based) processor cores. A processor thread that stalls waiting on a dependency is preempted by the hardware scheduler, and the other processor thread is context switched in after the stalled thread is context switched out. The processor's execution pipelines are managed by the hardware to allow more than one processor thread to operate on a single logical core.

Then there is the software kind of preemption where the OS needs to preempt one software task/software thread/context and perform a higher priority task. All the modern OSs are preemptive multi-tasking OSs. Even on the application level there can be multiple software threads spawned by the parent task, and the individual software threads can have software/hardware Interrupt events delegated to the spawned threads. Windows uses an event driven OS/Application model as do other OSs. All the hardware drivers on a modern PC operate in an event driven mode by hardware and software interrupts.

The big argument is not about defining asynchronous compute itself (hardware kind/software kind); it's about having that asynchronous compute (hardware kind) fully implemented in the GPU's hardware, down to the processor-core level of hardware management of the processors on a GPU. GPUs are big networks of parallel processors that are grouped into units that, in AMD's case, have their core execution resources/pipelines managed by hardware-based scheduler/dispatch units and are also managed as a group by the hardware ACE units and hardware schedulers. This requires more hardware to manage, but it makes for better utilization of core execution resources and lower-latency response to asynchronous events. On Polaris there is even instruction pre-fetch to further enable more efficient usage of the CUs' execution resources and to try to stay ahead of things.

What is lacking in most GPU reviews is the same level of per-core evaluation of a GPU's cores, and of the differences between the GPU makers, that exists for the relatively few cores the CPU makers have and the differences between the CPU makers' cores. Both AMD's and Intel's x86 cores are analyzed down to the smallest details, but for GPUs, reviewers are not going as deep with the GPU makers' new cores as they continuously do for the CPU makers' cores. Maybe there should be more specialized single-GPU-core benchmarks to sniff out any single-core deficiencies between the GPU makers' cores, and between single units of many cores like CUs and SMPs. Hell, on CPU cores they count instruction decoders per core, reorder buffer sizes per core, ALUs, FP units, INT units, etc.

Most of the time, if a GPU's core/s get a deep-dive analysis, it's the top-tier accelerator GPU/core that is analyzed, not the derived consumer version that may or may not have the exact same feature sets enabled. One thing is certain this time around: there is insufficient benchmarking software, and the entire gaming ecosystem is just beginning to switch over to the revolutionary changes that have just happened with the new graphics APIs. And the GPU makers are just now using some newer/smaller chip fabrication nodes.

For sure in time there will be plenty of graduate and post-graduate academic research papers on the new GPU micro-architectures for both AMD's and Nvidia's new GPU accelerator products and AMD will be reentering the GPU accelerator market with its Vega line of accelerators, and its new Zen/Greenland/Vega APU's on an interposer module so there will be plenty of white papers and other research material to put the asynchronous compute argument to the test for both the GPU makers.

July 20, 2016 | 05:31 AM - Posted by arbiter

"It's definitely not what AMD defines it as, and you could make an argument that it could be misleading"

What are the chances that what they define it as changes based on what works for THEIR hardware alone? I bet it wouldn't be the same if it worked well anywhere else.

July 20, 2016 | 06:55 AM - Posted by JohnGR

I like how you twist reality.

Let's make it more clear for the others. You would prefer to break your keyboard before writing something not positive for Nvidia.

Nvidia's optimizations: GameWorks, PhysX.

Results: Better performance in latest Nvidia series cards.
Much worse performance on competing cards. Questionable performance on older Nvidia series cards.

Example. Project Cars. Full of PhysX code. Dreadful performance on AMD cards. Good performance on Maxwell cards. Questionable on older Nvidia cards.

AMD's optimizations: Better performance on GCN cards. No effect on performance on competing Nvidia cards.

Example. Doom. Excellent performance on AMD cards. Same excellent performance on Nvidia cards. NO performance loss on Nvidia cards, compared to OpenGL.

One more thing to take into consideration: how do companies use their exclusive techs or hardware advantages?

Nvidia: As tools to create a closed ecosystem keeping competition out. PhysX and GameWorks are closed and proprietary. PhysX and CUDA are DISABLED in case an AMD card is in the system. Nvidia forces its own customers to NOT use a combination of cards.

AMD: As tools to promote performance. Open, not closed. Vulkan - that is Mantle - runs on Nvidia cards and it is as optimized as Nvidia's DX11. Before Vulkan, Mantle wasn't disabled by the driver if an Nvidia card was present in the system. AMD doesn't punish customers that are not absolutely loyal to the company.

One more example.
GSync. More expensive. Only compatible with Nvidia.
FreeSync. Open. Anyone can use it. Much cheaper.

July 20, 2016 | 06:17 PM - Posted by arbiter

Twisting reality isn't really a twist when it's the truth. Welcome to Real Life.

July 21, 2016 | 07:33 AM - Posted by JohnGR

One more empty post that says in fact nothing.

July 21, 2016 | 01:26 PM - Posted by Stefem (not verified)

John, please, stop spreading FUD; this post is full of bullshit.

Just to name a few: you refer to GameWorks as "optimizations". Really? I thought they were added effects and simulations...

You mention unspecified "AMD's optimizations" (I think you are talking about asynchronous compute) as if they would not negatively impact non-GCN architectures, which is not true. The benchmarks speak for themselves, and I'm sure I can find a post where you say Maxwell or even Pascal suffers performance degradation once asynchronous compute is enabled.

Project Cars: full of PhysX? None of the PhysX simulation being done is run on the GPU, or can you prove otherwise?

You argued that PhysX and GameWorks are closed, while the source code is actually available on GitHub.
And no, CUDA is not disabled if there is an AMD GPU in the system; only PhysX acceleration was (not the whole engine as you said, which would be a no-brainer), and I'm not sure if it still is.

If AMD is all that open and friendly, why didn't they design Mantle inside the Khronos Group? And why did its vice president say in a tweet that with Mantle and the consoles they tried to play a gambit against NVIDIA?

And you accuse others of twisting?

July 21, 2016 | 04:56 PM - Posted by JohnGR

About GameWorks. Is it or not libraries for effects and simulations as you are saying? So if they are, do you believe they are just a pile of code that isn't optimized for a specific architecture? You are full of BS if you say yes.

About AMD optimizations. Name ONE where Nvidia owners will have to suffer lower performance or lower visual quality. You will find NONE. Nvidia cards can run a game with or without async. They can run a game at DX12 or DX11 without any change in performance or visual quality. On the other hand, for example with PhysX, you had only one choice: either lose visual quality by choosing lower settings for physics effects, or see ridiculous performance drops by choosing a high setting. Before you say anything, until the GeForce 320 drivers you could enable PhysX alongside AMD GPUs with a patch. Nvidia LOCKS PhysX.

About Project Cars. Just use google.

Nvidia opened some GameWorks libraries only recently. They don't say if they will open any newer versions of GW libraries.

PhysX is locked on Nvidia GPUs. Installing an AMD GPU, or a USB monitor, will disable PhysX. A USB monitor's drivers are treated by Nvidia's locking system as a competitor's GPU in the system. PhysX could be selling millions of Nvidia's graphics cards as PhysX co-processors. Nvidia instead chose to use PhysX as a way to create a closed ecosystem where its main competitor, AMD, will be locked out. On the other hand, Nvidia had the opportunity to optimize for TressFX in less than a week. When TressFX came out, Nvidia GPUs were having problems, but thanks to AMD's open nature, Nvidia was able to produce a driver in less than a week, a driver that brought Nvidia GPUs to the same level as AMD GPUs. Can you spot the difference?

As for Mantle, it was given to Khronos group. What the hell do you think Vulkan is? And Nvidia cards running Vulkan see NO performance or quality drops.

Try not to have objections to my avatar. Try to see my text without looking at my avatar at the same time. You look at my avatar and, instead of trying to understand what I am writing, it seems that you are trying to find a reason to say "you are wrong" to that avatar.

July 21, 2016 | 01:59 PM - Posted by Stefem (not verified)

A video made by whom? Does the author have a pertinent technical background? You can find videos claiming that men have never been on the moon, that a race of alien reptiles is governing us, and even some that mistook a game's ad campaign for proof of the existence of angels...

July 19, 2016 | 07:46 PM - Posted by Anonymous (not verified)

Mahigan is a well known AMD shill troll asskisser & a liar, amazing that AMD morons take his lies as gospel truth.

July 19, 2016 | 08:14 PM - Posted by Anonymous (not verified)

Even FM has said it's not a complete DX12.

If you have non-compliant hardware, there is no other choice than to revert to the lowest common denominator, which in this case happens to be Nvidia. You can implement the CPU advantages of DX12 but not the GPU ones, since you're trying to be fair to both vendors.

July 19, 2016 | 11:46 PM - Posted by Anonymous (not verified)

That would be known as gimping the games/benchmarks down to placate Nvidia and holding gaming/hardware innovation hostage to Nvidia's business model. This is a prime example of why there needs to be enough unhindered competition in the consumer GPU and CPU market.

True fairness would require that any benchmarking software be above reproach and if there are any indications of favoritism regarding any benchmarking maker's product then things need to be looked at from a regulatory perspective, including any violations of rules/regulations/laws already on the books FTC/other agencies. Benchmarking software should properly test all of a GPUs hardware and make no deference to accommodation for any GPU maker's lack of innovation or lack of innovative features.

Hell, I do hope that SoftBank takes the ARM Mali/Bifrost GPU micro-architecture and makes some laptop grade discrete mobile GPU offerings, and that Imagination Technology can find a backer(Apple/other) that could do the same for its PowerVR/PowerVR Wizard(Ray Tracing hardware Units) GPU designs.

Nvidia is just one big monopolistic product segmenting interest that needs even more competitors. AMD is pouring all of its profits into R&D and innovation for its line of GPUs, including years of HBM R&D investments! And AMD still has more concern for retaining other features in its GPUs that make them good at other uses besides only gaming usage. Even Nvidia will get a net benefit from HBM/HBM2, which is the result of the R&D efforts of AMD and AMD's HBM memory partner. VR gaming and AMD's investment in asynchronous compute enabled GPU hardware are going to benefit the entire GPU marketplace; just look at what the mobile GPU makers are doing along with AMD at the HSA Foundation, and with the Khronos Group for Vulkan's development.

July 19, 2016 | 09:11 PM - Posted by Anonymous (not verified)

Oh look, someone whining that someone else in the internet is a shill and liar without any evidence whatsoever.

Surely you're not doing the same thing as they are.

Everyone will just magically believe you now.

July 20, 2016 | 12:25 AM - Posted by Anonymous (not verified)

"Don’t toggle between compute and graphics on the same command queue more than absolutely necessary
This is still a heavyweight switch to make"

That tells us about Nvidia's implementation of asynchronous compute.

July 22, 2016 | 10:34 PM - Posted by Anonymous (not verified)

This comment is definitely not a character attack to cover up a complete lack of ability to disprove a claim.

This comment is definitely not an ad hominem fallacy.

This comment is definitely not a pile of hypocritical garbage written by someone too cowardly to stand by his accusations.

Anonymous (not verified) is definitely not an Nvidia shill troll asskisser & a liar.


July 19, 2016 | 08:54 PM - Posted by Anonymous Nvidia User (not verified)

Wow. I guess 3dmark really laid the smack down to AMD trolls. And with next day service too. LOL Gotta love the graphs showing compute packets in the queues.

July 19, 2016 | 09:09 PM - Posted by Anonymous (not verified)

Sure, that's why Time Spy uses 21% hardware compute while Doom is at 40% and AoTS at 90%.

Petty little kid.

July 20, 2016 | 07:52 AM - Posted by Anonymous Nvidia User (not verified)

If AOTS is really 90% async, then there would be negative performance on even AMD cards. AMD's OpenGL drivers are so horrible that they get a huge bump using Vulkan for Doom. The increase from async is still around 10%.

July 21, 2016 | 02:37 PM - Posted by Stefem (not verified)

And still AMD has more advantage in Doom than in AoTS, something in your logic is flawed...

Also, what are those percentages about? 90% of what, rendering time? Hard to believe...

July 24, 2016 | 06:50 PM - Posted by Anonymous Nvidia User (not verified)

Rendering time. Hmm. Doesn't favor AMD in the slightest with their ungodly number of shader cores compared to an Nvidia card.

What advantage in Doom? Last I checked the top spots were 1080 and 1070. Did that change? Doom is new. Definitely more popular than AOTS (saleable) so worthy of AMD optimizing for it. No surprises here.

Power consumption of AMD video cards increases dramatically under AOTS. Don't know if it's a DX12 phenomenon or limited to AOTS. It's utilizing the card to its fullest, right along with its max TDP. The 390X went up 122 watts under DX12 with async compared to DX11. You can have your 10% "free" async performance for 122 extra watts. Wattage went up 58% from DX11.

July 20, 2016 | 11:22 PM - Posted by Ty (not verified)

They did nothing but blow smoke and make excuses for why they chose not to make a best effort to optimize performance for different vendor gpus. Neutrality in rendering does not work in dx12, cards have different capabilities.

If they had a benchmark that used some sort of effect that got a speed bump with conservative rasterization, it would be perfectly legitimate to add that effect in and let the Nvidia cards' support for the feature boost performance while the AMD card had to use slower methods to get the same effect.

Same goes for rendering a scene, if an amd card can handle a more complex method and mix of work concurrently, then the game or benchmark should not hold that back just because nvidia can't complete the task in the same way. WTF is the point of holding back? We want to see what the cards can do, what EACH can do, not the lowest common denominator.

July 19, 2016 | 09:04 PM - Posted by Anonymous Nvidia User (not verified)

Scott, you seem to forget that Time Spy is a benchmark designed to test the graphics capability of video cards in DX12, not to measure async compute. The two examples we have are AMD-biased: AOTS, and Vulkan, which does not properly support Pascal and async for it. Fury X performance in games that are tailored to it by being shader-heavy does not represent all DX12 games.

Geometry performance matters as well. Nvidia cards have that in loads. What they don't have is a bunch of inefficient shader cores lying around waiting to be given work.

July 20, 2016 | 04:12 AM - Posted by Anonymous (not verified)

Then why do we need this test altogether, if its main target is async? Which is combining 2 different calculating powers to utilize the GPU to its max.
And considering the results in games, you can't help but compare them to the results in 3DMark. And something stinks, oh it stinks. The cards showing real performance with async are way behind.
It's either that AMD still doesn't have a driver or NV has too much $$$.

July 19, 2016 | 09:05 PM - Posted by StephanS

It's interesting to see the WIDE difference between games and architectures/drivers.

Doom is currently 32% faster on an RX 480 than a GTX 1060 at 1440p.
But Battlefield 4 is 30% faster on a GTX 1060 than an RX 480.

The difference is MASSIVE.

And we also see that with many other games, where in one the RX 480 crushes the 1060 by 20%, and in another the 1060 crushes the RX 480.

But here is something of BIG importance... Thermal & power limits.

The reference RX 480 is hitting a wall with its dinky heatsink,
combined with its stock 'overvolting' preset.

It's not an architecture problem.

When you get faster benchmark results when you lower your clock speed, you know that you are NOT measuring core GPU architecture performance, but that you are comparing card cooling and other factors.

To truly compare ASYNC on/off performance, we also need to look at sustained clock speed, because it's possible that with async ON, the core clock is dropping....

There is more to this analysis. Look at power/heat/boost clock.

July 20, 2016 | 05:30 AM - Posted by arbiter

"Doom is currently 32% faster on an RX 480 than a GTX 1060 at 1440p. But Battlefield 4 is 30% faster on a GTX 1060 than an RX 480.
The difference is MASSIVE."

Another massive thing would be a person being an idiot and buying a GTX 1060 to play at 1440p to start with. It's a 1080p card at most.

July 20, 2016 | 06:39 AM - Posted by John H (not verified)

These cards are nearly GTX 980 performance .. a card which ran well at 1440p. Rx 480 maxes out doom at 1440p (60fps).

July 19, 2016 | 09:41 PM - Posted by Anonymous (not verified)

I don't see why people even bother with these benchmarks. They are always going to be skewed. How a game performs is going to be dependent on how much work the game or engine developers put in for each different architecture. Nvidia has a large installed base that will get targeted, but with completely unusable asynchronous compute abilities in previous generation Nvidia devices, I don't see these getting much developer optimization. The 1070/1080 architectures will see some optimization, but developers may not take the time to optimize it as much as what 3DMark has done for this benchmark unless it is an Nvidia sponsored game. It comes down to look at what games you want to play, and see how they perform. For future games, AMD parts seem to age much better than Nvidia parts, although Nvidia has often had the lead at release time. This may change though, since games may be highly optimized at launch for AMD parts due to work on the console versions once we get engines developed from the ground up for DX12 parts.

With AMD parts, you are almost guaranteed to get very good optimization since developers need to do that optimization for the Xbox One and PS4 versions anyway. I am hoping AMD can get some Zen/Polaris APUs for laptops out in a reasonable amount of time since I need a new machine. These should perform very well for games; they will have almost the same architecture as the consoles. Also, once we get 14 nm AMD graphics against 14 nm Intel graphics, the AMD parts should leave Intel graphics even farther behind.

July 20, 2016 | 11:29 PM - Posted by Ty (not verified)

It looks like more engines are starting to add more advanced support for the feature sets AMD is able to take advantage of.

Nitrous/frostbite 3 (will be interesting to see battlefield 1 dx12)/idtech 6/glacier 2 (hitman)/deus ex mankind divided (dawn engine - modified glacier 2)

Some of the bigger holdouts seem to be the witcher series, hopefully they get a page one rewrite for their engine to be more modern for cyberpunk 2077, UE4 (nvidia focused) Though I find it interesting that many of the cross platform games are not using unreal and releasing their own engines, probably in part because unreal does not seem to give a damn about modifying the engine to make better use of gcn on consoles. This will hurt their adoption, because in the irony of ironies, console focused engines actually help amd going forward.

Let's not even start on ubisoft with AC unity, they REALLY need to ditch that engine and just use glacier 2, same with any future batman game.

July 19, 2016 | 09:52 PM - Posted by Anonymous (not verified)

If AMD made cars, they would also build special roads for these cars and then proclaim: If you drive our cars on our roads, you will be faster than the average car on an average road.

Whereas NVIDIA is like: Buy our cars and you will be fast on any road.


July 20, 2016 | 12:17 AM - Posted by Anonymous (not verified)

I think the more accurate analogy would be, Nvidia builds special roads for their cars that make Nvidia's cars go faster, but AMD's cars are banned from driving on them, and any AMD car that tries gets its tires shredded.

Then someone figures out a way to let an AMD car and an Nvidia car drive together on the same road, so Nvidia remotely shuts off the engine of any Nvidia car driving next to an AMD car.

Then Nvidia takes over 80% of the gas stations in the country and starts offering a special gas that makes Nvidia cars run faster and makes AMD's cars run like ass - oh, and that's the only gas you can buy.

Then AMD builds their own roads that make their cars run a whole hell of a lot faster, and Nvidia cars can drive on them too and go faster, only the Nvidia drivers all bitch and moan and cry because AMD cars get a much bigger boost in speed, and they all dismiss the AMD roads offhand, claiming that the old roads are so much better.

Ya know, if accurate analogies are important to you or something.

July 20, 2016 | 12:20 AM - Posted by Anonymous (not verified)

Oh, also, don't forget when Nvidia managed to rig some of the roads so that they would really hobble AMD cars, but barely slow down Nvidia cars, if at all. And when AMD figured out how to prevent it, all the Nvidia drivers cried and said they were cheating.

July 20, 2016 | 02:50 AM - Posted by Anonymous (not verified)

Say something against NVIDIA and nobody bats an eye.
Say something against AMD and everybody loses their minds.

Let me put it this way:
Nvidia has always been better at getting the best out of their hardware by strongly optimizing their software for users and developers, and by making it as user-friendly as possible from the beginning.
Now they are collecting the fruits of that.
Whereas AMD never did reach the same level of software-optimization; instead they leave the full work to be done by the game developers.
But I'm quite sure Nvidia will adopt hardware asynchronous computing in the next generation GPU, if they think it's necessary.
It also would be nice if they supported Freesync, as it's an open standard. I don't think their proprietary implementation has a chance to survive in the future.
And of course by then there will be plenty of games with DirectX 12 Support, which will change everything.
At the moment, for most(!) users it's just a nice-to-have, as they don't even have any idea what it is.

July 20, 2016 | 12:11 PM - Posted by Anonymous (not verified)

I'll bet you are quick to defend M$ and Comcast also! Nvidia is the big GPU interest and people see how Nvidia overcharges for their hardware and gimps the SLI on the GTX 1060! And to get at GP100's finer instruction granularity in software requires CUDA, but who knows if any of GP100's improvements are in GP104 or GP106. Nvidia is sure not going to be able to gimp things for any GP100 based HPC/Server SKUs and Nvidia better have some OpenCL compatibility.

No one likes a monopoly interest that further segments its consumer product offerings so Nvidia's bad karma has come back in the form of Vulkan and DX12 giving the games developers the ability to manage the GPU’s hardware resources in a close to the metal fashion. Nvidia is the control freak of the GPU world, and the great product segmentation specialist and that’s what has caused many to express their dislike of Nvidia in many online forums. It will be very good for the entire GPU market in general, including Nvidia’s users, if AMD takes more market share as that will force Nvidia to invest more in R&D and get off of its ego trip and get that asynchronous compute fully implemented into its GPU’s hardware, and that includes Nvidia’s consumer SKUs.

July 20, 2016 | 06:20 PM - Posted by arbiter

Truth is the truth no matter what direction you shine the light on it. AMD does seem to get a pass from people over and over, whereas when Nvidia does something wrong people want heads to roll.

July 21, 2016 | 06:44 PM - Posted by Anonymous (not verified)

Go work public relations for Comcast, Intel, and M$, And the green goblin gimpers at Nvidia! you love your big monopolies! Nvidia is a GPU monopoly that works with its willing game partners to use software/middleware to lock out any fair competition!

No one liked Ma Bell or the Standard Oil Trust back in the day either, and folks see how Nvidia segments its product lines to milk excess profits from the many fools like yourself. Those dual RX 480s and some Vulkan will Doom Nvidia's attempts at cornering the GPU market!

July 21, 2016 | 02:48 PM - Posted by Stefem (not verified)

"Gimp", the most beloved word by AMD fans...

I don't think anyone can claim NVIDIA doesn't spend on R&D or invests too little in it, especially looking at their expense reports.

Just a question, who is lagging behind in product performance? NVIDIA? Maybe Intel? Or maybe AMD?

July 19, 2016 | 10:00 PM - Posted by Anonymous (not verified)

What there needs to be is a benchmark that can measure underutilized execution resources on a GPU, if there is a way to accurately measure any GPU execution resources that sit idle while there is still work backed up in the queues.
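To make that metric concrete, here is a minimal Python sketch of "idle while work is queued". It assumes a hypothetical trace where each sample records how many execution units were busy and how many tasks were waiting; no real profiler exposes exactly this format, and the names `Sample` and `idle_while_queued` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    busy_units: int    # execution units doing work during this sample
    queued_tasks: int  # tasks still waiting in the queue during this sample


def idle_while_queued(samples, total_units):
    """Fraction of unit-samples that sat idle while work was still queued."""
    if not samples:
        return 0.0
    wasted = 0
    for s in samples:
        # Idle units only count as waste when there was work they could have taken.
        if s.queued_tasks > 0:
            wasted += total_units - s.busy_units
    return wasted / (total_units * len(samples))


# Hypothetical four-sample trace on a device with 4 execution units.
trace = [Sample(4, 2), Sample(2, 3), Sample(4, 0), Sample(1, 0)]
print(idle_while_queued(trace, total_units=4))  # 0.125
```

Only the second sample counts as waste here: two units idle with three tasks queued. The last sample has idle units too, but nothing was waiting, so it is not scored against the scheduler.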

There are two types of asynchronous compute, the hardware kind and the software kind. Just look at Intel's version of SMT, HyperThreading(TM), and see the hardware version of asynchronous compute in action on Intel's CPU cores, and see why Intel's CPU cores get the extra IPC boost and extra per-core utilization of execution resources that would go to waste had there not been SMT hardware to dispatch, schedule, and preempt two or more hardware processor threads sharing the same CPU core's execution ports/pipelines.

There is no way in hell that Intel's, or any other SMT based CPU core before Intel's adopting of SMT, could manage a CPU's execution ports/pipelines in software, as software is not fast enough to react to the changing states inside a CPU's or any other processor's execution pipelines.

GPUs are no different from CPUs in this respect for hardware-based, SMT-like asynchronous compute scheduling of multiple processor threads on a single core unit (instruction decoding, instruction scheduling, dispatching of hardware processor threads, etc.). So no software scheduling of a processor core's execution resources among shared hardware processor threads is ever going to be able to keep the core's execution pipelines utilized quickly enough to avoid inserting wasteful pipeline bubbles (NOPs) when one processor thread stalls and another needs to be quickly started up to make use of the execution pipelines. These execution pipelines change state faster than even a single opcode instruction takes to be fetched, decoded, and scheduled/dispatched on a processor's execution pipelines (FP, INT, etc.).

Sure, software can manage to a degree the scheduling of deterministic workloads on a processor's core, but no software solution can take up the slack if a non-deterministic asynchronous event occurs that requires an immediate stopping of work on the current processor thread's workload and the context switching and scheduling/dispatching of another thread's execution on the same processor core. A lot of the hardware-based asynchronous compute on a processor core happens below the single-instruction time interval, and that requires specialized in-hardware asynchronous compute units/engines. Any single-core processor without any SMT ability in hardware is not going to be able to make as efficient a utilization of the processor's execution ports/pipelines, and there will be plenty more NOPs/pipeline bubbles if the single processor thread stalls, or if that single processor thread is preempted and needs to be context switched out so that another higher-priority thread can be context switched in and worked on.
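The bubble argument can be illustrated with a toy model. The Python sketch below assumes a single issue slot per cycle and represents each thread as a list of stall lengths (cycles the thread must wait after issuing that instruction). It is a deliberately simplified illustration, not a model of any real CPU or GPU, and the function name `run` is hypothetical.

```python
def run(threads):
    """Simulate one issue slot shared by the given hardware threads.

    threads: list of lists; each inner list holds the stall (in cycles)
    that follows each instruction of that thread.
    Returns (total_cycles, bubbles), where a bubble is a cycle in which
    no thread was ready to issue.
    """
    pc = [0] * len(threads)        # next instruction index per thread
    ready_at = [0] * len(threads)  # first cycle each thread may issue again
    cycle = 0
    bubbles = 0
    while any(pc[t] < len(threads[t]) for t in range(len(threads))):
        for t in range(len(threads)):
            if pc[t] < len(threads[t]) and ready_at[t] <= cycle:
                stall = threads[t][pc[t]]
                pc[t] += 1
                ready_at[t] = cycle + 1 + stall
                break
        else:
            bubbles += 1  # nobody could issue this cycle
        cycle += 1
    return cycle, bubbles


a = [0, 3, 0, 3]  # every second instruction stalls for 3 cycles
b = [0, 3, 0, 3]

print(run([a + b]))  # one hardware thread, work run back to back: (17, 9)
print(run([a, b]))   # two hardware threads sharing the slot: (9, 1)
```

With a second thread available, most stall cycles are filled by the other thread's instructions. The sketch says nothing about how quickly the selection has to happen, which is the part the comment above argues must live in hardware.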

So what is needed is benchmarking software that can count the pipeline bubbles, and that is a very tall order to achieve short of having access to the unpublished instructions that a processor maker uses to test the cores on its products. Processors have been built all along with undocumented instructions that allow the processor maker to single-step through not only a single instruction, but also to look at the pipeline stages, etc. There are ways to infer some things with specially crafted assembly code based on a processor's compiler optimization manual to measure some efficiencies, but short of having some proprietary information it is very hard to get any deeper into the testing-mode instructions that all processor makers build into their processors.

Nvidia is very good at managing in software its GPUs execution, but there is always some more idle execution resources as the cost for managing hardware core execution resources with software, especially when there are lots of non deterministic events happening and no readily available cores to schedule the work to, things get queued up fast. There is also the inherent underutilization of core execution resources because software is never going to be fast enough to manage multiple processor threads on a single core and keep the pipeline bubbles to a minimum. Those processor pipelines need to be managed by hardware based asynchronous compute units.

July 19, 2016 | 10:26 PM - Posted by alucard (not verified)

It's a bit odd that the article is using Javascript as analogy - I guess Scott is a web developer?

But anyway, the whole "conspiracy" is completely overblown. Futuremark never claimed that Time Spy benchmark is an async compute benchmark. It's just a dx12 benchmark that uses all dx12 features which happens to include async compute.
If there are few or no compute tasks to parallelize, or if it's bottlenecked by the graphics workload (which it typically is, on synthetic benchmarks), then obviously the benefits gained from going async are going to be smaller.
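A back-of-the-envelope way to see this, as a Python sketch: assume the frame costs graphics time plus compute time when run back to back, and that perfect overlap brings it down to whichever of the two is longer. That perfect-overlap assumption is an upper bound chosen for illustration, and the millisecond figures are made up.

```python
def async_gain(graphics_ms, compute_ms):
    """Best-case fraction of frame time saved by overlapping compute with graphics."""
    serial = graphics_ms + compute_ms
    if serial == 0:
        return 0.0
    overlapped = max(graphics_ms, compute_ms)  # assumes perfect overlap
    return (serial - overlapped) / serial


print(async_gain(10.0, 1.0))  # little compute to hide: about 9% at best
print(async_gain(10.0, 5.0))  # more compute to hide: about 33% at best
```

The less compute work there is relative to graphics, the smaller the ceiling on what async can recover, regardless of whose hardware runs it.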

July 19, 2016 | 11:54 PM - Posted by Anonymous (not verified)

Yes, JavaScript (or any other script) is about as far away as one can get from the real metal on any processor (CPU, GPU, other).

It's too bad that Anand Lal Shimpi is no longer writing for AnandTech. That would be a great review for a publication that isn't behind a paywall. Just wait until AMD gets back into the HPC/server accelerator market, then there will be some really good benchmarks to test things really thoroughly.

July 19, 2016 | 10:55 PM - Posted by Anonymous (not verified)

hey scott, this is pretty weak analysis. can you interpret gpuview and explain it in laymans terms?

July 20, 2016 | 03:05 AM - Posted by JohnGR

Thanks for your time and the article Scott Michaud.

July 20, 2016 | 04:56 AM - Posted by Anonymous (not verified)

Another important distinction to make is between Asynchronous Compute and Asynchronous Shading. Asynchronous Shading is one way to IMPLEMENT Asynchronous Compute, but it is not the ONLY way. You can quite happily schedule calls at the driver level to pack them efficiently for execution, just as you can implement that scheduling in hardware. Unless you're hitting a CPU bottleneck (and when has THAT happened recently?) both GPUs will be executing equally well packed instruction calls. Doing the rescheduling in software has the advantage that you can apply rescheduling to workloads that are not explicitly designated, and that you can change scheduling criteria if required (e.g. some engine does something weird with its dispatch) without a hardware revision. The downside is all the coding effort to get the software scheduling to work, and the potential for CPU overhead.
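As a rough picture of what "packing calls" in software could look like, here is a Python sketch using first-fit-decreasing bin packing. It assumes each task needs a fixed number of execution units and that tasks have no dependencies on each other, which real driver scheduling cannot assume; the task names and unit counts are hypothetical, and this is not how any actual driver is known to work.

```python
def pack(tasks, capacity):
    """Group independent tasks into batches that fit within `capacity` units.

    tasks: dict mapping task name -> execution units required.
    Returns a list of batches, each a list of task names.
    """
    batches = []  # each entry is [remaining_capacity, [task names]]
    # Place the largest tasks first, each into the first batch with room.
    for name, units in sorted(tasks.items(), key=lambda kv: -kv[1]):
        if units > capacity:
            raise ValueError(f"{name} needs more units than the device has")
        for batch in batches:
            if batch[0] >= units:
                batch[0] -= units
                batch[1].append(name)
                break
        else:
            batches.append([capacity - units, [name]])
    return [names for _, names in batches]


# Hypothetical workload on a device with 10 execution units.
work = {"shadow": 6, "lighting": 5, "particles": 3, "post": 4, "physics": 2}
print(pack(work, capacity=10))
# [['shadow', 'post'], ['lighting', 'particles', 'physics']]
```

Five calls issued one at a time become two fully occupied batches. The CPU time spent computing that grouping is the overhead mentioned above.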

July 20, 2016 | 12:40 PM - Posted by Anonymous (not verified)

Tell Intel to schedule its CPU processor threads in driver software, and see how Intel's version of SMT, HyperThreading(TM), would NOT be able to keep its CPU core’s instruction execution pipelines properly and efficiently utilized. And a shader core with a fully in hardware based version of asynchronous compute on the shader is no different from a CPU in this respect.

There is no way that Intel’s HyperThreading(TM) could be adequately managed by software, when on Intel’s CPU cores the pipeline states change faster than any single software instruction(let alone the many instructions in most driver code to manage the most simple functionality) could manage any quickly changing hardware asynchronous events like a pipeline stall, and the processor thread context switch that has to be managed as quickly as possible or the execution pipelines would sit idle executing NOPs(No Op instructions). Those execution pipeline bubbles(NOPs) would be very numerous if Intel did not manage its CPU core’s asynchronous compute fully in its CPU’s hardware.

July 21, 2016 | 03:10 AM - Posted by Anonymous (not verified)

"Tell Intel to schedule its CPU processor threads in driver software, and see how Intel's version of SMT, HyperThreading(TM), would NOT be able to keep its CPU core’s instruction execution pipelines properly and efficiently utilized. "

Uh, Hyperthreading does indeed rely on software scheduling of jobs. Without it, it does nothing of worth (and if your OS handles things badly, even makes performance worse as seen with the Bulldozer scheduling issue). Instruction Level Parallelism is done in hardware, but it is done in hardware on GPUs of every architecture too.

Hyperthreading exposes a single physical core as two logical cores. But in order to use those two logical cores, you need to feed it two logical threads, and the scheduling for that is done in software, not in hardware. If you feed a hyperthreading core a single thread, it will be underutilised, because it can only do so much at the instruction level to parallelise workloads without ending up with large parts of the core sitting and spinning waiting for the rest of the thread to catch up.

The entire POINT of Hyperthreading is to allow another SOFTWARE SCHEDULED thread to be pointed at that core to utilise those parts of the core that are UNDERFED with Instruction Level Parallelism alone!

July 21, 2016 | 08:52 AM - Posted by Anonymous (not verified)

"Without it, it does nothing of worth (and if your OS handles things badly, even makes performance worse as seen with the Bulldozer scheduling issue)."

To clarify, I mean that without software scheduling accounting for it you can have issues with actually achieving theoretical efficiencies. With Bulldozer, the issue was not SMT but the heterogeneous distribution of FPUs, where assigning two threads to two cores sharing an FPU was less performant than assigning each of those threads to two cores not sharing an FPU.
For SMT, when NOT in a saturated condition it would be more performant to assign threads to separate physical cores, and only start assigning to logical cores sharing physical cores once this is saturated. Scheduling for power efficiency would be the opposite condition.
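The two placement policies described above can be sketched in a few lines of Python. This assumes a simple topology of identical physical cores with two logical CPUs each; the function `place` and its policy names are hypothetical and do not correspond to any real OS scheduler API.

```python
def place(n_threads, n_cores, smt_ways=2, policy="performance"):
    """Return a list of (physical_core, logical_way) slots for n_threads."""
    slots = [(core, way) for core in range(n_cores) for way in range(smt_ways)]
    if policy == "performance":
        # Use way 0 of every physical core before doubling up on any core.
        slots.sort(key=lambda s: (s[1], s[0]))
    elif policy == "power":
        # Fill both logical CPUs of a core before waking the next core.
        slots.sort(key=lambda s: (s[0], s[1]))
    else:
        raise ValueError(f"unknown policy: {policy}")
    if n_threads > len(slots):
        raise ValueError("more threads than logical CPUs")
    return slots[:n_threads]


print(place(3, n_cores=2, policy="performance"))  # [(0, 0), (1, 0), (0, 1)]
print(place(3, n_cores=2, policy="power"))        # [(0, 0), (0, 1), (1, 0)]
```

With three threads on two cores, the performance policy spreads out first and only then shares a core, while the power policy shares core 0 immediately and touches core 1 last.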

July 21, 2016 | 08:22 PM - Posted by Anonymous (not verified)

Your hair splitting won’t work. SMT/HyperThreading(TM) is fully on the hardware level on Intel's CPUs; just look at the diagrams that Intel provides for its version of SMT. The hardware scheduling on Intel's version of SMT is done with specialized schedulers/dispatchers built fully into the core's hardware, and the processor threads are fully managed by the hardware. You are confusing the OS/software kind of threads with the hardware kind of threads, which are not "software" based and operate at the sub-instruction level on Intel's and others’ CPU cores. Software threads can be infinite in number to a reasonable degree, but processor threads in the hardware are limited by the hardware on Intel's consumer SKUs to 2 logical processor threads per physical core.

Intel’s SMT/HyperThreading(TM) operates on the decoded instructions below the native assembly level of the x86 (32/64 bit) ISA as it is implemented on Intel’s microprocessor cores. You cannot wrap your mind around the difference between the software thread and processor thread abstract concepts in computing. You need to be reading some CPU deep dive primers into the hardware concepts surrounding SMT (Simultaneous MultiThreading) as it is done in hardware.

The OS only tasks the processor with a software thread (a single stream of native code instructions per logical core), but the OS does not manage that stream of instructions on the core. That management of the software threads after they are tasked by the OS is the job of the hardware scheduler/dispatcher in the processor’s core/s, and the OS is not aware of any of that level of work happening on the CPU’s core/s. In fact, on Intel’s SMT enabled SKUs the OS does not even know that the two logical cores are in fact being hardware Simultaneously Multi-Threaded on a single CPU core; the threading being done fully in the CPU’s core is only known to the hardware on that CPU and the OS is none the wiser. You do know that on Intel’s CPU cores that are also SMT enabled, some instructions can be scheduled and executed out of the logical order in which they came into the logical core, and dispatched/scheduled by the hardware units to keep the execution pipelines occupied and utilized at as close to 100% as is possible? The OS/software side never sees this hardware side or plays a role in the management of the out of order execution on Intel’s SMT or OOO enabled cores.

The OS only passes the address in memory of the first instruction, or pushes the address onto the top of the stack, etc. for the CPU to get to work and the CPU core takes it from there. The CPU core does all the fetching, decoding and instruction scheduling/dispatching, instruction reordering for out of order execution, and branching/branch predicting, and speculative execution. And that work includes the hardware asynchronous processor thread management to work the two logical core processor threads on a single processor core.

Furthermore the OS itself, and the drivers, and the applications, and APIs are all made up of native code that is running on the CPU/cores, so how would any software made up of single assembly language instructions be able to manage any hardware that operates on instructions at the sub assembly language level, on execution units and execution pipelines that change states faster than even the system clock? You do know that on Intel’s/others' CPU cores there are specialized hardware units that operate at a clock multiplied rate to the main core clock in order to manage the execution pipelines, and other scheduler/dispatcher units that need to manage the execution pipelines' state changes and stay ahead of the instruction streams to keep the CPU core’s execution pipelines efficiently utilized. The same rules apply for a GPU’s many more cores that need to manage their asynchronous compute fully in the GPU’s hardware on GPUs with the hardware that enables such work (ACE units/hardware schedulers and hardware asynchronous shaders, etc.)

The OS passes the work to the CPU/processor but the OS runs on the CPU/processor, so the OS cannot run without the help of the processor; the processor runs the OS's code, and the processor even runs the code before the OS itself is even up and running on the CPU’s/processor’s cores. You need to bootstrap your mind to load up a proper understanding of just what the difference is between processor "threads" and software "threads" on any type of processor with hardware asynchronous compute abilities.

July 22, 2016 | 07:11 AM - Posted by Anonymous (not verified)

"You need to be reading some CPU deep dive primers into the hardware concepts surrounding SMT(Simultaneous MultiThreading) as it is done in hardware. "

Ironic, as you appear to have missed a fundamental difference between SMT and GPU threading: a SMT CPU is processing two threads using a single core. A GPU is feeding many cores from a single thread. In addition, the way jobs are partitioned for CPUs and GPUs is radically different.

As I stated before, both Intel's CPUs and GPUs, AMD's CPUs and GPUs, and Nvidia's GPUs ALL implement Instruction Level Parallelism in hardware (because that is the only way to feasibly do so). It is job-level parallelism that is the issue with Asynchronous Compute, not instruction level.

July 20, 2016 | 05:59 AM - Posted by Jann5s

The linked futuremark article is extremely interesting, a definite recommendation! Thanks Scott for addressing this topic. The discussion is far from finished, but we need solid reporting from guys like you to arrive at the truth.

July 20, 2016 | 06:48 AM - Posted by Master (not verified)

Looking forward to round 2 when both teams have new drivers out.

PS: Fucking teams everywhere.
- Incomplete coffee addict.

July 20, 2016 | 10:14 AM - Posted by Anonymous (not verified)

Just call it what it is, an nvidia only benchmark.

July 20, 2016 | 11:32 AM - Posted by Anonymous (not verified)

You know what's going to be amazing at AC? GTX 2080 and RX590. LOL

July 20, 2016 | 11:41 AM - Posted by RushLimbaughisyourdaddy (not verified)

Futuremark's response looks like a typical political spin for damage control.

July 20, 2016 | 04:17 PM - Posted by Allyn Malventano

You read an in-depth technical discussion about how/what/why they do, along with confirmation that all GPU vendors (AMD included) are involved in the process, as 'political spin'?

July 20, 2016 | 07:16 PM - Posted by RushLimbaughisyourdaddy (not verified)

I have no problem with someone or a company who chooses to do something their own way (in this case how AC is implemented), but the timing of their statement and the wording of it reeks of them trying not to be seen as playing favorites. That IS political spin for damage control.

July 20, 2016 | 11:43 AM - Posted by No one (not verified)

About Async, one picture and one quote says it all...

Futuremark quote: "The implementation is the same regardless of the underlying hardware."

DirectX12 requires that the developer does the job of choosing and coding which kind of implementation is to be used ( as can be clearly seen on the above picture ). And Futuremark can't dispute this, its part of DX12's core functions ( google it ).

There is only 1 implementation ( see quote above ) and Nvidia only supports context switching/preempt. And also, by their own admission, Pascal does support the implementation they have in TimeSpy.

We can conclude, per their own admissions, that TimeSpy is thus coded for the only method which can run on both Pascal and AMD's hardware without hardware specific paths. That method being preempt, with the 2 other faster methods having no code written for it and thus not being used on AMD hardware.

Quod erat demonstrandum.

Now that we got their bit of sophistry deciphered, you are free to conclude whatever you want.

July 20, 2016 | 04:20 PM - Posted by Allyn Malventano

They were going for how typical game devs would code their games. It's a reasonable assumption that most games will implement those functions that apply to all GPU vendors. If AMD had huge market share then there might be an argument otherwise, but that's not the case.

July 20, 2016 | 08:44 PM - Posted by leszy (not verified)

Not true anymore. Games for XBO are now coded with DX12 for GCN. Only NVidia needs its own code path for porting to PC. There is no sense in disabling a ready GCN code path.
Even 20% of the PC gaming market means a few million people. Futuremark must be filthy rich:)

July 20, 2016 | 11:38 PM - Posted by Ty (not verified)

That's the problem, more devs are starting to use engines that tap into the way amd handles concurrent graphics/compute work.

the idtech guys got those ancient console gcn chips to run doom on a console at 60fps most of the time. They did that by not ignoring the untapped power of gcn. Now it may be true that game devs will be as sloppy and/or as rushed by the bean counters as the AC unity devs or batman arkham devs, but the more devs that do what idtech did, the better games will run on all hardware. That means optimizing for gcn. If they fail to do this, you will see more pretty games with garbage performance like AC unity.

You have seen a better path, and now the time spy guys choose not to follow it. Pathetic.

July 20, 2016 | 06:44 PM - Posted by Anonymous Nvidia User (not verified)

Same reason 3dmark doesn't include physx as standard in their benchmark either. Because Nvidia has hardware and AMD is stuck with software version. Because AMD has special async shaders that give them more performance you feel they should be utilized but not anything that helps Nvidia. Hypocrites.

They always go with the least common denominator. It's a concession that has to be made to test both cards. A benchmark has to have balance, otherwise it isn't very useful for testing what it tests.

In this case GRAPHIC capability of a card under directx 12 with async compute support as defined by Microsoft and not AMD's implementation of it, async compute with graphics.

I seriously don't know what is hard to grasp about it.

July 21, 2016 | 08:23 PM - Posted by No one (not verified)

You clearly have no idea of what you're talking about. GCN's implementation is TEXTBOOK what Microsoft asked for DX12. Nvidia on the other hand only partially supports what was asked, 1 out of the 3 methods prescribed.

As for PhysX, that's a false analogy. PhysX is proprietary and can't be implemented in hardware by AMD, whereas the 3 Async methods as prescribed by Microsoft in DX12 are open and anyone can make hardware for them.

Whether you like it or not, Futuremark's explanation that it has chosen to cater to the lowest common denominator because Nvidia didn't support the other 2 methods and the 2 vendors were adamant that no vendor specific paths be implemented is bullshit.

They had another option, simply implement unbiasedly the full 3 methods as per microsoft's DX12's specifications and let the vendors support what they could, giving preempt boost only to Nvidia's hardware and full Async for AMD's hardware that can support it on the same neutral code path.

And this would've been the most neutral decision, we don't care, we go with the DX12 specifications, you sort it out on your hardware and support it or not, your problem.

So why didn't they go with this most unbiased option instead of the one biased in Nvidia's favor by denying the massive boost to AMD's hardware? There is only 1 logical answer to this, only 1 truth, but you can't handle the truth, so you babble logical fallacy after logical fallacy.

July 24, 2016 | 07:02 PM - Posted by Anonymous Nvidia User (not verified)

You want to talk biased. Timespy is highly AMD hardware biased.

The compute shader invocations went up from 1.5 million to 70 million, roughly a 46.7x jump (about a 4567% increase). And 8.1 million to 70 million, roughly 8.6x (about a 764% increase).

The tessellation on the other hand went up from 500k to 800k for a 60% (1.6x) increase. And 240k to 2.4 million for a 900% (10x) increase.
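For anyone checking ratios like these: a multiplier and a percent increase are not the same number, since an Nx jump is an (N - 1) x 100% increase. A small Python sketch, taking the invocation and tessellation counts quoted above at face value (they are not independently verified here), and with `growth` being a helper name made up for this example:

```python
def growth(old, new):
    """Return (multiplier, percent_increase) going from old to new."""
    multiplier = new / old
    percent_increase = (new - old) / old * 100
    return multiplier, percent_increase


# Figures as quoted in the comment above.
for label, old, new in [
    ("compute shader invocations", 1.5e6, 70e6),
    ("compute shader invocations", 8.1e6, 70e6),
    ("tessellation", 500e3, 800e3),
    ("tessellation", 240e3, 2.4e6),
]:
    mult, pct = growth(old, new)
    print(f"{label}: {mult:.1f}x, +{pct:.0f}%")
```

So 1.5 million to 70 million is about 46.7x, which is a 4567% increase rather than 4667%, and 500k to 800k is 1.6x rather than 0.6x.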

Whose graphic hardware is heavy on compute? Well before Pascal anyways.

Maybe we should accuse them of biasing toward AMD. A little piddling 6.1% average worth of async help for Pascal on Nvidia's side isn't going to do much against such gimping.

Oh hold on AMD still ended up with more improvement of 12% average for Fury cards and 8.5% for Polaris Rx 480. WTF are AMD fanboys on. You still get double benefit with older Fury cards and you're still not happy. LMAO

Is this a fallacy then?

July 20, 2016 | 12:00 PM - Posted by Twvalk (not verified)

This is a lowest common denominator approach to asynchronous compute by 3dmark. Completely understandable as they include it as a dx12 feature, not an intentionally in-depth test.

Obviously amd will not get the most benefit from something that is not built to test the full capacity of their "metal" deep hardware optimization.
Nvidia would get a better result as well if this was built to use the software side of asynchronous compute more like dx11.

The treating of all asynchronous compute equally is the only way to test fairly. This is after all not a test of the BEST possible result for an optimized platform, but the Most Likely best result over all platforms when generally optimized without the use of software.

July 20, 2016 | 04:14 PM - Posted by Allyn Malventano

That's the big takeaway here. Sure there will be some titles where devs put more work into one side or the other, but the Futuremark guys were going for what they viewed as the 'typical' effort put in by the average developer. Kinda makes sense, because they have to actually put that work in themselves in order to create the test in the first place, so their effort is going to mirror that of a typical game dev.

July 20, 2016 | 07:04 PM - Posted by Anonymous Nvidia User (not verified)

I agree. It's the only way to make sure that the benchmark is largely unfair to both sides.

Besides as Oxide and AMD fanboys point out for Nvidia, AMD had and did offer their input while said program was being developed. Doesn't feel too good from the other side does it. Because most of the new directx 12 games coming out thus far are Gaming Evolved and favoring AMD of course, they feel everything should go their way.

If balanced isn't good enough and Nvidia has at least 70% of the discrete market, should the benchmark not be skewed toward Nvidia? That means it should utilize gobs of tessellation and geometry instead of shaders. AMD fanboys would be begging for the old "biased" benchmark back. LOL

July 21, 2016 | 01:19 AM - Posted by Ty (not verified)

Every single cross platform game that runs on consoles will have to be optimized for gcn, every single one if they care at all about increased performance. This will be the same for the updated versions of consoles next year. It is in every single cross platform dev's interest to build and modify their engines to better take advantage of amd gpus. Even if amd only had 10% of the discrete gpu market this would still be the case. So stop trotting out that 70-76% marketshare number like it's supposed to mean the bulk of devs should not even bother.

July 21, 2016 | 07:25 PM - Posted by Anonymous (not verified)

Why do I get the feeling that if Futuremark had chosen to implement Async compute in a way that really benefited AMD's architecture and was not at all useful to Nvidia's architecture, you'd be right there on the front lines crying and screaming about biased benchmarks?

July 21, 2016 | 05:25 PM - Posted by pdjblum

Another reader recently pointed out that you are an Nvidia apologist. Your comments here certainly support that allegation. I would think you are supposed to stay neutral given your position and inherent conflict of interest, but you cannot seem to help yourself.

July 25, 2016 | 03:53 AM - Posted by Anonymous (not verified)

Yes, let's wait to make a benchmark that actually tests DX12 until Nvidia has sorted out its DX12 issues.
Let's wait a couple of years before showing the public that Polaris is the most efficient one, until it is no longer competing.
So in two years we will have a benchmark that shows that Polaris was better than Pascal. But at that time no one will care.

The benchmark is tailor made for Pascal. One cannot escape this fact.

July 21, 2016 | 08:26 PM - Posted by No one (not verified)

Whether you like it or not, Futuremark's explanation that it has chosen to cater to the lowest common denominator because Nvidia didn't support the other 2 methods and the 2 vendors were adamant that no vendor specific paths be implemented is bullshit.

They had another option, simply implement unbiasedly the full 3 methods as per microsoft's DX12's specifications and let the vendors support what they could, giving preempt boost only to Nvidia's hardware and full Async for AMD's hardware that can support it on the same neutral code path.

And this would've been the most neutral decision, we don't care, we go with the DX12 specifications, you sort it out on your hardware and support it or not, your problem.

So why didn't they go with this most unbiased option instead of the one biased in Nvidia's favor by denying the massive boost to AMD's hardware? There is only 1 logical answer to this, only 1 truth, but you can't handle the truth.

July 20, 2016 | 04:41 PM - Posted by Anonymous (not verified)

That's "SLI" but it's really explicit multi-adapter that is in the API for games developers to use. Just wow, a processor under the control of the OS/API, finally. What a joke the microprocessor OSs/APIs have been all these decades to not have been doing this from the start. Just try and gimp that, Nvidia!

"GTX 1060 “SLI” Benchmark – Outperforms GTX 1080 with Explicit Multi-GPU"

July 20, 2016 | 04:46 PM - Posted by Anonymous (not verified)

every thread on this topic is a cesspool of armchair engineers spewing uninformed speculations and spreading misinformation

July 20, 2016 | 06:46 PM - Posted by Anonymous (not verified)

110% agree!

July 20, 2016 | 06:48 PM - Posted by Anonymous Nvidia User (not verified)

How so? Are you a developer from Futuremark? Then you are just as qualified to give your opinion as the next guy. Care to enlighten us with the correct information o wise one.

July 22, 2016 | 10:38 PM - Posted by Anonymous (not verified)

"Care to enlighten us with the correct information o wise one"

Coming from YOU, that's fucking hilarious.

July 24, 2016 | 07:08 PM - Posted by Anonymous Nvidia User (not verified)

Thanks, still waiting on that answer. You seem like you must know it. Right, thought as much. BS.

July 25, 2016 | 01:13 PM - Posted by Anonymous (not verified)

Well, princess, since you're teeing it up for me, I might as well humiliate you again. You must be some sort of masochist, as you constantly get proved wrong and slapped down again and again, and yet you keep coming back for more.

So let me start by saying that your question - the one you're still waiting on an answer for - doesn't even have an answer because it's the wrong question. Nothing that the Anonymous OP wrote made a direct claim of any sort that could be proved by a Futuremark developer or anyone with inside knowledge of FM's processes.

The Anonymous OP's claim was, and I quote, "every thread on this topic is a cesspool of armchair engineers spewing uninformed speculations and spreading misinformation"

Are YOU a Futuremark developer? Do you have inside information? No? Yeah, I didn't think so. Do you know what that makes you? An armchair engineer spewing uninformed speculation and spreading misinformation.

OP had it right on the money.

Now, since you want to play this "still waiting for an answer" game, I'm still waiting on you to prove that Freesync doesn't do anything over 90 Hz. (And remember, you are claiming that FREESYNC doesn't do anything over 90 Hz - you are not claiming that ONE FREESYNC MONITOR has a Freesync range that ends at 90Hz.)

C'mon, now. Live up to your own standards that you expect everyone else to live up to.

July 21, 2016 | 12:57 AM - Posted by Anonymous (not verified)

The same could be said for every forum on most internet websites. What is lacking is the proper amount of online information to properly examine the issue. The GPU makers themselves are not very forthcoming in providing the proper amount of technical details about their GPU core’s execution resources in one easy to access location. Apparently there is more information available for a CPU core’s actual core execution resources from most of the CPU makers, but for GPUs and their cores the same information is hard to come by without a lot of intensive searching.

Hopefully this year's Hot Chips Symposium will offer up a little more information on AMD’s Zen and Polaris micro-architectures, and the same for Nvidia's Pascal and new Tegra offerings. So what do you expect when the GPU makers are not in the habit of providing good quality, non-dumbed-down technical information on their GPUs? The games/gaming engine developers get a lot more information and access to technical manuals, but that requires signing an NDA, so even the games/gaming engine developers have to be careful.

The games/gaming engine developers are commenting on both AMD’s and Nvidia’s newest GPU micro-architectures but there is a limit to what they can publicly say. It’s going to take a while before the academic institutions start to produce the graduate and doctoral papers that will provide a more in-depth look at the newest Nvidia and AMD GPU offerings, and the Vega/Greenland and Volta GPU micro-architectures will get a better going over by the server/HPC focused websites in 2017 when AMD’s supposed to reenter the server market with both Zen/Vega or Zen/Greenland and Zen CPU/APU server/HPC options, while Nvidia will have its Volta GPU accelerators in the market also.

Pair that lack of readily available, very technical details from the GPU makers with the terrible misuse and misdefinition of proper computing sciences terminology, which is oftentimes obfuscated behind marketing names(TM), and things become very confusing. Additionally, the lack of proper online computing dictionaries of the industry-wide accepted computing sciences terms does not help in increasing any understanding of terminology such as Asynchronous Compute, and other highly technical software and hardware usages of the term.

With a whole generation of script kiddies out there working at a software abstraction level (script) that is the farthest level of abstraction from the real metal native mode assembly language, what can anyone expect when it is the ones with the least exposure to computing science that are the ones writing for and running most of the websites devoted to GPUs/CPUs on the consumer side of things? Add to that the marketing/MBA-run technology companies that were once started by engineers and became successful, only to be taken public and taken over by the marketing/MBA types; well, that’s what really runs things into the ground for the PC industry that is now in a decline, with even the tablet/phone device markets reaching a stagnant maturity.
Expect there to be mostly wild speculation about any hardware CPU/GPU/FPGA and others going forward with the entire industry under the control of mostly the marketing/MBA types; look at Tim Cook of Apple, an industrial engineering (BS)/bean counter and an MBA.

Good Luck getting anything but wild speculation unless you are prepared to go to a very expensive pay-walled publication. And the Benchmarks can all be gamed to spin things one way or the other.

July 21, 2016 | 04:50 AM - Posted by Anonymous (not verified)

Ok so in other words the standard for the benchmark is set to utilize Pascal fully, but after that it stops. No additional DX12 performance on AMD cards even though they are obviously better in this regard.

Same old song for AMD. They always get screwed in benchmarks, even if it is ever so slightly.
It never stops.

July 21, 2016 | 06:21 AM - Posted by Anonymous (not verified)

The standard for the benchmark is set to run generic code, not code written specifically for certain cards.
Nvidia just happens to be better at that than AMD.

If AMD handled generic code better as well, they'd see smaller improvements from running AMD optimized code.

July 21, 2016 | 12:05 PM - Posted by ltguy005 (not verified)

From watching the video, it seems to me I want a company to replace Futuremark/3DMark and make new benchmarks that show us the best games can look if you fully utilize all the capability of the drivers (DX12) and the hardware. I'm not interested in benchmarks that only show me features that game developers will find useful based on market share of the GPU makers, and only features that game developers will find worth their while to implement. Benchmarks like 3DMark should push the envelope of visual detail and new driver and hardware features, not be a glimpse into how predicted games will perform. If I want to know how the card will perform in games, I will run in-game benchmarks. Benchmarks like 3DMark need to show me which card is truly better when all software and hardware features and optimizations are used to their fullest potential.

July 21, 2016 | 12:19 PM - Posted by Anonymous (not verified)

This is why I don't trust benchmarks nowadays.

I used to think those benchmarks stressed my hardware to the limit of its capabilities to see what's best or not, so I could use them to judge a purchase decision. It shouldn't matter which hardware I'm testing.

Benchmarks nowadays should be called BENCHFAVORS!

July 21, 2016 | 07:54 PM - Posted by StephanS

Do any of you believe the report that Nvidia is "cheating" to get better benchmarks by fudging texture upload?

It seems Nvidia cards don't use the full-res texture (did id or Nvidia do this??) and instead stream mipmaps over time.

In contrast, AMD cards seem to always deliver full-res textures.

I think it's not cheating if it's a game/driver option. But if it's done as a hack to get better frame times, we are comparing apples and oranges when doing performance benchmarks.

July 21, 2016 | 08:54 PM - Posted by Anonymous (not verified)

Yes and that's how Nvidia gets the FPS and gets the fools to buy into the whole line. But all games use mipmaps to a degree. Maybe there needs to be a benchmark that can measure frame by frame pixel quality and not only frame delivery variance and rate.

So maybe save all of the frames and run some analytical frame quality algorithms to see just which GPU is producing the best quality. There is plenty of software that can measure pixels at the sub-pixel level for certain levels of quality, even if the image has been blurred.

There will be more games moving some of the non-graphics gaming compute workloads onto the GPU to accelerate more than just graphics, so when those benchmarks arrive things will become clearer. Right now the VR games are in the early stage of optimization, but as time passes there will be the necessary benchmarking software to measure not only graphics frame efficiencies but also compute workload efficiencies on the GPU, to better rate the GPU for overall ability to take compute pressure off of the CPU’s cores. The VR gaming need for low latency will dictate that more non-graphics gaming compute ability be necessary on the GPU’s cores going forward for 4K and VR gaming.
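As a rough illustration of the "save all of the frames and compare quality" idea above, here is a minimal sketch that scores one captured frame against a reference frame. The choice of PSNR as the metric is an assumption (the comment does not name an algorithm), and the frames are assumed to be small grayscale images stored as nested lists, which is a simplification of what a real capture pipeline would produce.

```python
import math


def psnr(reference, candidate, max_value=255):
    """Peak signal-to-noise ratio (in dB) between two grayscale frames.

    Frames are nested lists of pixel values (rows of columns); this layout
    is a hypothetical stand-in for real captured frame data.
    """
    if len(reference) != len(candidate):
        raise ValueError("frames must have the same number of rows")

    squared_error = 0
    pixel_count = 0
    for ref_row, cand_row in zip(reference, candidate):
        if len(ref_row) != len(cand_row):
            raise ValueError("frames must have the same row lengths")
        for ref_px, cand_px in zip(ref_row, cand_row):
            squared_error += (ref_px - cand_px) ** 2
            pixel_count += 1

    if pixel_count == 0:
        raise ValueError("frames must contain at least one pixel")

    mse = squared_error / pixel_count
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)
```

A higher score means the candidate frame is closer to the reference. A frame rendered with lower-resolution mipmaps would be expected to score lower against a full-resolution reference, though a per-pixel metric like this says nothing about which differences a viewer would actually notice.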

July 22, 2016 | 03:41 PM - Posted by Ha-Nocri (not verified)

Did PCPer have any article about Pascal latency issues? Like they jumped on the Polaris motherboard power draw?

July 22, 2016 | 08:23 PM - Posted by nobodyspecial (not verified)

I don't see the problem. AMD gets ~2x more perf improvement from turning on Async and clearly there is no effect on older NV tech...LOL. It would almost seem as though the poster has it backwards, and it more aligns to AMD's async ways. I mean shouldn't the benchmark be showing NV 12% async gain and AMD 5.8% for this to be true? It's comic that people think all dx12 games work faster on AMD (umm, tomb raider anyone?), and Vulkan will too...LOL.
1060 smashes 480 in dx12. Page 18 shows the same in 1440P (and is barely playable anyway here). Not all games will favor AMD in DX12 people. You can't judge an entire api or TWO api's in this case on <5 games, especially with AMD abandoning DX11 and only concentrating on dx12 since before win10 even hit due to lack of funding and consoles stealing R&D (people time with 30% less workforce in last few years, and MONEY with losses for ages).

Doom isn't even currently running the latest library with Pascal cards (forces 1.0.8 instead of either or even later version amd is using). I doubt Id is being jerks here; it's just a mishap or whatever they aimed at first, or what they were using to develop with here first. If NV has read hardocp review (or the german site showing how bad their new cards are running), I'm sure they're working on it. In fact Bethesda's Vulkan faq admits they have an issue and are working on async for pascal for doom. It's comic people immediately jump to "those bastards are cheating". Also don't forget that in some cases, games just run faster on one tech or the other. IE project cars, WOW, Anno 2205, AC Syndicate NV dominant. The reverse also happens, say COD black ops3 AMD (looking at 1060 vs. 480, and I could find a few more AMD or NV >15% wins probably - just an example from one site).

So basically the complainer is saying they should design for AMD use cases who owns 20% share, instead of NV who owns 80%? Ideally they don't PURPOSEFULLY design for either side, but what you're asking for would ONLY happen if AMD writes a check to make that happen on purpose. Wake me when Async matters anyway at 5-12% gains (I can OC to get that with a setting that affects ALL games). Meaning let me know when 99% (ok 50%?) of the games are made for dx12/vulkan instead of DX11. :( Don't forget this benchmark being discussed isn't even a GAME. LOL. Simple solution: Stop benchmarking crap stuff that isn't real to begin with! :) Can I play Time Spy? ROFL. NOPE.

IMHO the games used for benchmarking should reflect SALES. IE, if the game is selling under 100k, nobody cares anyway. WOW on the other hand selling millions repeatedly with every new expansion should always be included. When 7-10mil subscriptions are playing a game it's clearly worth benchmarking correct? If a game has such poor sales you could easily claim it's not worth the time to benchmark it. I mean who cares if nobody bought it? I also don't agree with turning off features that users of a product would NOT turn off. IE, hairworks. Who goes home with an NV card and turns it off in witcher 3? LOL. Maybe you turn it down to 4x-8x instead of 64 so it isn't so lopsided for testing, but turning it off is senseless when it is what the devs wanted to show and a feature add-in you get from vid cards (AMD cards and <=980 don't take a large hit until >8x tessellation so why not test ON?). Would you turn it off in COD Ghosts, or Farcry 4 when playing if your card is fine with a certain level of it on? NO.
COD fur looks excellent. I would possibly buy to get this if it keeps showing up in more stuff. IT works for AMD anyway, you just turn down the tessellation a bit same as you would for older NV cards.
Which wolf would you rather see in your game?
dynamic fur.
Hairworks changes far cry 4 quite a bit with so many animals running around. I want to see 160-315K strands of hair flopping around on animals. 1-2 days per animal shows a lot of devs will start using this type of tool.
7% difference on 980 turning on fur. When it's 130fps, I'll take that hit. Note the radeon 290x took the same hit. So this "hairworks is evil" crap is BS. Only if you turn levels up so high (witcher 3 x64 tessellation for example) that hairworks is really just showing off new gpus without much gain in the look in the game.
"What we are experiencing with fur seems to be very consistent across Maxwell, Kepler, and Hawaii GPUs.", said HARDOCP.

Ok then...TURN IT ON. Demonizing it is dumb. I'd say the same for any feature from either side until it gets ridiculous like witcher 3 at 64x levels.

July 23, 2016 | 03:08 AM - Posted by Anonymous (not verified)

Second paragraph: "1060 smashes 480 in DX12"

In the article you cited, at 1080p, 480 beats 1060 in 3 out of the 4 DX12 games tested. At 1440p, 480 beats 1060 in 3 out of the 4 DX12 games tested. The only game the 1060 wins in both instances is RotTR, which has historically always run better on Nvidia.

Somehow, to you, that means, "1060 smashes 480 in DX12"?


Third paragraph - you are either unaware that in OpenGL, Nvidia cards use 4.5 while AMD cards are forced back to 4.3, or you're perfectly aware and chose to omit that fact from your argument.

Fourth paragraph - by your logic, Futuremark should have severely limited the amount of tessellation used in FireStrike. After all, if they're going to limit the use of asynchronous compute methods because Nvidia can't handle them, then they should have limited the use of tessellation libraries because AMD couldn't handle them.

After that, the rest of your post is a word soup mishmash of hypocritical arguments that bounce between, "If your card can't handle them, turn them off," and, "Reviewers shouldn't turn those off! Who cares if it skews the comparisons, it skews them in my team's favor!"

p.s. Just in case your first instinct is to throw up a "fanboy" straw man, I betcha can't guess what graphics card (or cards) I use.

July 24, 2016 | 07:21 PM - Posted by Anonymous Nvidia User (not verified)

You're back. Probably a pair of "phantom" 980's. The tessellation in 3DMark Fire Strike was much lower than compute shading.

In test 1 it was 500k tessellation versus 1.5 million compute (3x more).

Test 2 must be it wrong. 240k to 8.1 million. Oops almost 34x as much.

July 25, 2016 | 01:20 PM - Posted by Anonymous (not verified)

Aww, look, the sad little Nvidia fanboy can't stand the idea of being slapped around by another Nvidia user, and so has to make-believe that I'm lying about my graphics cards. Classic sign of someone who knows perfectly well that he's full of shit but is completely unable to admit it to himself or anyone else.

Also a classic ad hominem fallacy. But you're all about arguments made up of fallacies.

By the way, nothing in your response addresses the one statement in my comment that you chose to respond to (while ignoring all of the rest because you knew perfectly well that you would lose.)

July 24, 2016 | 07:50 PM - Posted by Anonymous Nvidia User (not verified)

Totally agree with what you're saying. Physx should be used in benches as well. Can you find one? Not today. Physx is seen as worthless by AMD fanboys, but both their consoles support it. So even the software version was seen as worthwhile.

My question is this: if both consoles support it, why is physx not included as well? Microsoft supports Physx so by extension it's a supported feature of dx12. AMD cards can utilize it but take a hit much like Nvidia's trying to run AMD's version of async. Except you don't get anything useful out of trying to run something you're not able to.

It's a crime that it's not included in a benchmark because gosh darn it games support and want it. Not every game wants or needs async which is supposedly a bitch to properly code for along with dx12 too.

Maybe Futuremark should make a separate benchmark for each card. That way it is a fair version for everyone. You can rate your card against other Radeons, and Nvidia would be only against other Nvidia cards. This is the only meaningful way to compare cards. It's hard to make a truly unbiased test when the two things being compared are too different architecturally.
