Retesting the 2990WX
Does gaming performance get a big uplift with the latest NVIDIA driver?
Earlier today, NVIDIA released version 399.24 of their GeForce drivers for Windows, citing Game Ready support for some newly released games including Shadow of the Tomb Raider, The Call of Duty: Black Ops 4 Blackout Beta, and Assetto Corsa Competizione early access.
While this in and of itself is a normal event, we shortly started to get some tips from readers about an interesting bug fix found in NVIDIA's release notes for this specific driver revision.
Specifically addressing performance differences between 16-core/32-thread processors and 32-core/64-thread processors, this patched issue immediately recalled our experience benchmarking the AMD Ryzen Threadripper 2990WX back in August, where we saw some games running at frame rates around 50% slower than on the 16-core Threadripper 2950X.
This particular patch note led us to update our Ryzen Threadripper 2990WX test platform to this latest NVIDIA driver release and see whether there were any noticeable changes in performance.
The full testbed configuration is listed below:
Test System Setup
CPU | AMD Ryzen Threadripper 2990WX
Motherboard | ASUS ROG Zenith Extreme – BIOS 1304
Memory | 16GB Corsair Vengeance DDR4-3200 (operating at DDR4-2933)
Storage | Corsair Neutron XTi 480 SSD
Sound Card | On-board
Graphics Card | NVIDIA GeForce GTX 1080 Ti 11GB
Graphics Drivers | NVIDIA 398.26 and 399.24
Power Supply | Corsair RM1000x
Operating System | Windows 10 Pro x64 RS4 (17134.165)
Included at the end of this article are the full results from our entire suite of game benchmarks on our CPU testbed, but first, let's take a look at some of the games that previously showed particularly bad issues on the 2990WX.
The interesting data points for this testing are the 2990WX scores on 398.26 (the driver revision we tested across every CPU), the results from the 1/4 core compatibility mode, and the Ryzen Threadripper 2950X. From the wording of the patch notes, we would expect gaming performance between the 16-core 2950X and the 32-core 2990WX to be very similar.
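For reference, the percentage gaps quoted throughout this article are simple relative differences between average frame rates. A minimal sketch (the FPS numbers in the example are hypothetical, not from our benchmark data):

```python
def percent_gap(baseline_fps: float, test_fps: float) -> float:
    """How much slower test_fps is than baseline_fps, in percent."""
    return (baseline_fps - test_fps) / baseline_fps * 100.0

# Hypothetical example: a 2950X averaging 100 FPS vs. a 2990WX averaging
# 80 FPS is a 20% gap; a driver update lifting the 2990WX to 93 FPS
# shrinks the gap to roughly 7%.
print(percent_gap(100.0, 80.0))            # -> 20.0
print(round(percent_gap(100.0, 93.0), 1))  # -> 7.0
```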
Grand Theft Auto V
GTA V was previously one of the worst offenders in our original 2990WX testing, with the frame rate almost halving compared to the 2950X.
However, with the newest GeForce driver update, we see this gap shrinking to around a 20% difference.
Assassin's Creed: Origins
While Assassin's Creed: Origins remained playable on the 2990WX with a frame rate of 58 FPS, there was still a significant performance gap of 43% between it and the 2950X.
With the new driver, this performance gap shrinks to 13% and brings the 2990WX in its default 32-core mode to the same performance levels that we saw in the 1/4 core legacy compatibility mode initially.
Total War: Warhammer II (DX11)
Total War: Warhammer II was far and away the worst affected by this performance issue, going from frame rates in the 70s-90s on every other processor we tested down to average frame rates in the teens.
While we see a strong performance uptick of 25-35% with the new driver, the resultant frame rates still don't provide a playable experience.
F1 2017
Updating to the latest NVIDIA driver provides a massive 78% improvement for the Threadripper 2990WX, and closes the gap to just 7% between the 16-core and 32-core processors.
Middle-Earth: Shadow of War
Previously, Shadow of War showed high average frame rate scores but still carried an almost 20% deficit when comparing the 2990WX to the 2950X.
With 399.24, this performance gap disappears entirely, providing a 30% performance increase when pairing a GTX 1080 Ti and the Threadripper 2990WX.
Far Cry 5
Far Cry 5 is an interesting title because, during our initial review of the 2990WX, we found that the game was incompatible with the 32-core processor and crashed consistently. However, be it due to a game patch or driver changes, it ran fine on our 2990WX-equipped system with both of these driver versions.
Despite the newly added compatibility, the 2990WX still sees half the frame rate in 32-core mode versus 1/4 core compatibility mode.
Overall, we are quite impressed with the performance increase this new GeForce driver provides over previous drivers when it comes to the 32-core Ryzen Threadripper 2990WX. Almost every title we tested saw some sort of frame rate increase, up to 78% for the most improved title, F1 2017.
However, there are still some caveats to recommending the Ryzen Threadripper 2990WX to content creators who are also interested in gaming. We saw some heavy stuttering while testing a few titles, namely Assassin's Creed: Origins and F1 2017, that was not present while testing the same platform in 1/4 core mode. We've passed this data on to NVIDIA and hope they will continue to refine how their driver handles these extremely high core count processors.
In general, the architectural decisions, specifically in the memory department, that make the 2990WX a fantastic workstation processor still inherently hold it back in gaming compared to other processors, and even to the Threadripper 2950X. For users who are going to run games on this processor anyway, it's nice to see that this new driver has eliminated the need to reboot into Legacy Compatibility mode for the vast majority of games.
To reiterate what AMD has been saying about the 2990WX from the beginning – this part isn't INTENDED for gamers. That's fine, we totally get it. But we like to provide the data that the audience is looking for and gaming results will at least be part of that.
Still, we are heartened to see such an improvement in game performance for the 2990WX, and these driver-level changes from NVIDIA should apply to other processors in the future like AMD's upcoming 24-core 2nd generation Threadripper processor and Intel's announced 28-core desktop offering.
Ryan's Note: This is an interesting shift in performance that lends itself to a deeper discussion at some point in the future. First, if this was indeed just a bug in the NVIDIA driver itself, the timing of it could not have been worse for AMD, as the window of time it existed very closely matched the release of the 2nd Generation Threadripper reviews. Second, this does make us alter our view of the Threadripper 2990WX. As Ken noted, the processor still has complications for gaming caused by its architecture, but the picture is improved dramatically with these updated results.
It also reveals a shortcoming of our gaming testing, and the gaming testing of most of the online community – we only tested with a single graphics card. We use a GeForce GTX 1080 Ti for our CPU reviews because NVIDIA has been known to have better performance and better driver stability, but this situation has us a bit concerned. Best case is we double our workload and test a Radeon graphics card too. But then that can waterfall to different architectures (why not Turing AND Pascal, or Vega AND Polaris?), different driver versions, etc. It's a tough battle for us to face.
For AMD itself, this is a learning opportunity. The company said nothing to us to make us second guess our results, or push us to do Vega-based testing that would have shown the 2990WX in a more positive light. If the engineers had been doing in-line testing for some longer period of time during product development, they would have seen this sudden drop in 2990WX gaming performance with a specific NVIDIA driver drop, and known to address it with NVIDIA or the media. Instead, it seems that everyone involved was in the dark. (Though AMD was positioning the WX line as a workstation-only part, it surely was doing some kind of gaming testing.) Maybe NVIDIA knew about the bug – but rather than inform the community, it decided to let the reviews of the AMD Threadripper processors go out unfettered.
The whole situation has really been a mess.
This was something the Adored guy was talking about for a while. There is something wrong with Nvidia drivers and higher core counts. This deserves a much deeper dive.
Nvidia doesn’t have problems with the i9-7980XE.
It’s the NUMA gimmick’s fault.
Jerry and George are sitting in Jerry’s apartment kibitzing. Then suddenly there is a loud knock on the door!
Jerry gets up, opens the door, and sees who it is!
HELLO NUMA!
.
.
.
But that NUMA “gimmick” does allow AMD to get 80+ percent die/wafer yields on those scalable Zen/Zeppelin dies and offer way MOAR CORES at a much more affordable price than Intel. In fact, even Intel’s high core count CPUs have come down in price in the face of that NUMA-based AMD competition. And single Zen/Zeppelin die Ryzen 1/Ryzen 2 gaming SKUs are available at a lower price than Intel charges, for those who mostly just want to game.
AMD sells cheap because they’re aiming for market share, not because of some miraculous yields. It’s clear from just looking at their margins that their cost to produce isn’t anything spectacular compared to Intel.
You clearly don’t understand why AMD is cheaper at all.
Intel makes a chip; if the chip can’t run their minimum spec, they trash it as a bad chip. So if 30% of the wafer is bad, and 20% is technically okay but doesn’t run min spec, they trash all 50%.
AMD makes a chip; if the chip can’t run their minimum spec, they disable parts and sell it as a lower tier chip. So 30% of the wafer is bad, but the 20% that is technically okay gets its broken parts disabled and is sold as a slower chip.
Ryzen is two 4-core CCXs that make up the 8-core/16-thread part. Each version step down from that, the 6-core, the 4-core, is ALL that same 8-core/16-thread chip. AMD literally made ONE CPU, and they use as much of the wafer as they can by disabling cores and threads, which results in a cheaper end user price. Intel does not do this; if a core is bad, the whole chip is scrapped, which results in higher prices for CPUs. The bigger the chip, the less chance it has of making it through said testing process. With the bigger chips, Intel sees worse yields, like over 50% being bad chips from defects. Instead of disabling parts and selling them as lower tier, Intel just trashes them, and thus the disparity in price. Same goes for NVIDIA: they make a chip, and if it doesn’t work it’s trashed. They won’t disable cores and sell them (even though it seems like it from how their tiers work), which is sad in my mind.
This is incorrect. Intel DOES in fact do the same thing as AMD. The quad-core 8th gen i3s are the same die as the 6-core i7s. Same with the Xeons; fewer cores doesn’t necessarily mean “fewer cores,” if you know what I mean. Intel didn’t design a different chip for each product. From 12 to 18 cores, those Xeons are all the same chip.
They have more chips per generation than AMD currently has (the Ryzen 2 line is all one chip; Intel has about three chips in the 8th gen with different specs), but they don’t discard mostly fine CPUs.
Both of these things can be true: AMD has fantastic yields, and AMD wants market share and so is willing to sell at a low price.
Margins for either Intel or AMD aren’t as simple as you’re making them out to be. Both, for instance, do more than produce CPUs, Intel in particular.
Intel is also able to command a huge price premium in the server market right now and that alone does quite a lot to bolster their margins too.
AMD has been gaining some server market share but even AMD has said they don’t expect to have over 5% of the x86 server market by year end.
No, it is not a problem of NUMA.
The user who initially reported the NVIDIA driver problem in the 3DCenter forums (where it was picked up by golem.de and then spread to other sites) noticed it on dual-socket Xeon systems first. Once you restricted the number of cores/threads available to the OS to 31/62, the problem would go away.
Yup. This is correct.
It’s a high core/thread count issue, not a NUMA issue.
NUMA does have issues all its own, of course, but for gaming it won’t be as big of a detriment as some think.
That a quick, simple driver fix has resulted in such large performance improvements is the big tip-off that this was just a bug.
The alternative of re-doing the whole driver to somehow support NUMA would be much, much more time intensive and would also show some degree of scaling with core counts. But that isn’t happening here at all.
Well, the GPU drivers are directly in the path to the receiving end of the draw calls, so the GPU drivers had better be scheduled in a NUMA-aware fashion by the OS. And maybe the OS/drivers should be running on at least a full Zen/Zeppelin “SOC die” and not out on a Zen/Zeppelin “compute die” that’s an extra Infinity Fabric hop away from the memory controllers on TR2’s full Zen/Zeppelin SOC dies. Really, the OS should have its drivers/services running on at least one of TR2’s full Zen/Zeppelin SOC dies and the game software running on the other full SOC die, with each TR2 full SOC die having its own complement of dual memory channels hanging off the CAKE attached to the local Zen/Zeppelin die’s SDF plane(1).
[Soapbox ON]
That AdoredTV guy does some nice work, but he and the rest of the online press need to start using the proper AMD, and general computing sciences, naming/nomenclature, or nobody will be able to properly discuss these highly technological subjects. So: Zen/Zeppelin die, NUMA, compute die (the new AMD TR2-related Zen/Zeppelin die term), SOC die (has both the memory controllers and its own on-die southbridge/northbridge enabled)! And processor core affinity and other related technical terminology needs to be learned and always used. That, and what Intel describes as the core and the un-core, and other valuable technical terminology is needed also.
The microprocessor and related computing industry is still light years behind on proper documentation and on defining a proper glossary of related technology terms/terminology compared to the old mainframe computer folks of the past. Those old IBM/Burroughs/Sperry mainframes and their respective OS/primer/reference manuals were, and still are, the high point in tech writing for cogent manuals, still to this day unmatched. All of this Microsoft/others online knowledge base “content” is mostly useless drivel compared to the old mainframe industry and its quality of hardware, software, and OS documentation.
It sure is fun to watch the folks over at r/AMD and r/Nvidia and r/whatever-technology-related spin their wheels splitting hairs over things when a simple common computing sciences terminology dictionary would help move things beyond that total state of confusion so a proper technology-related discourse could even begin to occur.
[Soapbox OFF]
It’s not necessarily NVIDIA’s drivers; it’s more that NVIDIA’s drivers are being OS-scheduled to run on the cores of a non-full-SOC Zen/Zeppelin die (compute only) instead of the proper full-SOC-enabled Zen/Zeppelin dies (with access to lower latency near memory) available on TR2’s unusual new design. More tweaking will still need to be done owing to that very unusual TR2 NUMA (full SOC / non-full SOC) Zen/Zeppelin processor die topology.
(1)
“Infinity Fabric (IF) – AMD”
https://en.wikichip.org/wiki/amd/infinity_fabric
Actually, the NerdTech guy explained all of this before the AdoredTV guy. And more too, about how NVIDIA’s drivers work.
Vid here: https://www.youtube.com/watch?v=nIoZB-cnjc0
He caught & explained the early issue with Ryzen’s launch vs NV drivers.
Basically, NVIDIA’s drivers are really good, but they’re full of hacks. Sometimes when they meet a new CPU architecture, those hacks hurt it.
Saying they’re full of hacks is probably being a little harsh, as NVIDIA chose to remove hardware scheduling, among other things, from their GPUs.
Theoretically it’s a good idea: if you can do something in software without it affecting the final result, then it makes sense to use underutilized hardware to do so (in this case the CPU cores), and if that lets you remove hardware, all the better (lower power and costs).
We’ve been doing the same thing in reverse since the dawn of computers: do something in software until the costs (time-wise) make doing it in hardware more efficient. Encryption? Done in software until dedicated hardware was added to CPUs. Ray tracing? The same. Even floating point operations were done in software at one point in time.
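As a toy illustration of that software-until-hardware progression, here is multiplication done purely with shifts and adds, the way CPUs without hardware multipliers once handled it:

```python
def soft_mul(a: int, b: int) -> int:
    """Multiply two non-negative integers using only shifts and adds."""
    result = 0
    while b:
        if b & 1:          # lowest bit of b set: add the shifted a
            result += a
        a <<= 1            # each higher bit of b is worth twice as much of a
        b >>= 1
    return result

print(soft_mul(13, 11))  # -> 143
```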
CPUs are after all general purpose processors.
Everyone’s GPU drivers are full of hacks, though. It’s not being harsh to say that at all; it’s just the truth.
That is why they have to constantly update their drivers for new releases and often do multiple releases to fix all the bugs for a given game, often months after it’s first released. That is also why new drivers often end up breaking older games.
Some of that is on the game developers too, of course. But no one, not even NVIDIA, does drivers “right” in the truest sense of the word.
It’s all kludges and “GET IT OUT NOW DAMMIT” half-assed fixes.
True, and I wasn’t trying to say they’re all perfect; I was trying to say that moving the scheduler from hardware to software wasn’t a hack, it was a design choice.
Yes, they’re full of ‘hacks’ in terms of fixing or optimizing poorly implemented game code, but the way Anonymous121 put it was that the ‘hacks’ caused problems on some new hardware, when IMO it was one of the drawbacks of using a software scheduler making a public appearance.
Oh OK, missed that.
Yeah, a software scheduler isn’t a hack at all if it’s implemented properly. Whether or not it’s better than a hardware scheduler is really a matter of implementation for the most part.
Thanks for posting this and getting some numbers from the new drivers. It is good to see NVIDIA at least putting in some effort to extract more performance from AMD’s high end big-gun CPU. If I had this CPU I would most likely just turn off half the cores and go with a 16/32 setup when I was going to do some gaming, and then turn on the whole chip for everything else.
If there was a way to do it in Windows without needing to reboot, that would be a good thing and give people one less reason to complain. As it turns out I will never be able to get one of these CPUs, so it is kind of a moot point for me at least. The most I will ever have is an Intel 8/8 core CPU setup or an AMD Ryzen 10/20 or 12/24 when the Zen 2 3000 series drops in spring 2019.
Yep, I am saying AMD will most likely release at least a 10/20 Ryzen 3700X or a 12/24 3800X. I say this because all you have to do is look at their EPYC road map and the new higher core counts to know they are planning to go from 4-core modules to 6-core modules for Zen 2. So basically 6+6=12, or if they drop 2 cores and go with 5+5=10 for the desktop; there would still be a six-core setup, but using only 5 cores from each CCX would give them a lot more room when defects happen in each CPU.
My thoughts are that since Intel is about to release the i9 to the desktop, AMD will follow suit with the new Ryzens.
Ryzen 9 3900x 12/24
Ryzen 7 3800x 10/20
Ryzen 7 3700x 8/16
Of course there would be the non-X versions as well, and of course the Ryzen 5s and 3s in the product stack. If I am right, do I win a cookie? lol
Rather than redesign their entire CCXs, I think they will just add another CCX to their chips. 3*4 = 12 cores.
Time will tell, but that’s what my money’s on.
They could absolutely go that route, but an 8-core CCX would help NUMA performance dramatically, so I’d be hoping they go that way instead of just keeping the 4-core CCX.
The rumor was that they specifically have a 12-core die variant and a 16-core die variant; separate dies, not salvaged for the 12-core part. This could obviously be done easily by making chips with 2, 3, or 4 CCXs while maintaining the 4-core complex. We don’t really know yet. They need to keep pushing for software optimization for the NUMA-like architecture they are using, so I tend to think they will stay with 4 cores per CCX. That is 8 threads that can share data in a fine-grained manner without penalty. It is a massive scalability issue to try to provide low latency access to all last level caches from all cores on the chip. Previously, Intel actually had a similar design (at a high level) for their chips with a large number of cores; the largest core count chips used up to 3 different ring busses. Their current mesh network design does provide relatively low latency, but they achieved that partially by shrinking the L3 significantly and increasing the L2 significantly. Even with those optimizations, it still burns a huge amount of power sending data long distances across the chip at high speed. AMD runs the Infinity Fabric at memory clock partially because of power concerns. With their architecture, software optimization doesn’t just get you better performance; it may get you significantly better power consumption by keeping more communication local. The amount of power used in interconnect is huge, so sending data long distances across the chip is going to be a huge limitation to scaling to large numbers of cores. On AMD’s chips, optimizations can take advantage of the clustered architecture to achieve better performance and/or power consumption. You can’t really do that with Intel’s mesh network; there is no preferred set of cores, so there is no way to optimize.
While 8-core CCXs would be interesting, they don’t want to do things that would cause software developers to drop optimizations for the basic architecture. With 8-core clusters, optimizations for games might focus on just using the 8 cores of a single cluster rather than optimizations that allow scaling to multiple clusters. This is why I have been thinking that they will stick with the 4-core CCXs. This isn’t really a normal NUMA issue; on a single die, access to memory is the same for all cores. It is just an issue with sharing memory between threads, which is generally how thread-to-thread communication is done. Two different threads accessing the same memory cause transfers back and forth between CCX caches and/or extra memory accesses. This communication is done at memory clock, mostly to save power, so it obviously is slower. Without the cross-CCX accesses, the data would likely just be served from the local CCX cache hierarchy.
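The cost of cross-CCX sharing described above can be sketched as a simple weighted-average latency model. The latency figures below are illustrative assumptions, not measured values:

```python
def effective_latency_ns(local_ns: float, cross_ccx_ns: float,
                         cross_fraction: float) -> float:
    """Average access latency when cross_fraction of shared-data accesses
    must cross CCX boundaries (all figures illustrative)."""
    return local_ns * (1.0 - cross_fraction) + cross_ccx_ns * cross_fraction

# Hypothetical: 10 ns within a CCX, 100 ns for a cross-CCX transfer.
print(effective_latency_ns(10.0, 100.0, 0.0))           # -> 10.0 (threads fully clustered)
print(round(effective_latency_ns(10.0, 100.0, 0.3), 1)) # -> 37.0 (30% of sharing crosses CCXs)
```

Even a modest cross-CCX fraction multiplies the average latency, which is why CCX-aware thread placement pays off on this architecture.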
It would be weird to still have 4-core CCXs. If they disabled one core per CCX on a 3 x 4-core die, you would end up with a 9-core chip. For such a device, it would be 12, 9, or 6 core parts from deactivating zero, one, or two cores per CCX.
“The company said nothing to us to make us second guess our results, or push us to do Vega-based testing that would have shown the 2990WX in a more positive light. ”
What would have happened if AMD had come out blaming NVIDIA for the poor performance of their CPU in gaming tests? I think I posted something in one of the reviews saying that you are really testing the NVIDIA driver, not what the CPU is really capable of. Some of this could go away with game engines built from the ground up for DX12 or Vulkan. DX11 has a major bottleneck in that all draw calls must be combined into a single thread for submission to the GPU; DX12 doesn't have that limitation. NVIDIA does have a multithreaded DX11 driver, but it is still subject to the single-thread submission bottleneck. This will prevent it from scaling up to larger numbers of cores, and it is a particularly bad workload for a CPU with multiple core clusters, since there will be a lot of thread-to-thread communication to combine the draw calls for submission. In DX12, multiple threads can submit work to the GPU(s) directly. I would expect NVIDIA's DX11 driver to have scaling issues on Intel parts also, since the single-thread bottleneck is still there. NVIDIA still isn't in any hurry to move to DX12, since wide adoption of DX12 will help AMD. It is the same situation as Intel with many-core CPUs: we could have had 8-core mainstream parts at 20 nm, but there was no reason for them to release such a thing. They could hold the whole market back while making massive profits selling little 4-core CPUs for high prices. It is in NVIDIA's best interest to hold the technology back because moving forward would help the competition more than them. Same thing with CPU cores; Excavator parts don't look that bad (they were still behind in process tech) with multithreaded software. Software still isn't threaded very well because the baseline is still 2 to 4 cores even now. We could have had the baseline at 4 to 8 cores years ago. Intel holding back is part of why I have a 6-core cell phone and a 4-core desktop system.
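The single-thread submission bottleneck the comment describes is a textbook Amdahl's law situation: once draw-call submission is serialized, adding cores stops helping quickly. A sketch, with the 80% parallel fraction chosen purely for illustration:

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: overall speedup when only parallel_fraction of the
    work scales across cores and the rest runs on a single thread."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# Hypothetical: 80% of frame preparation parallelizes, but the final
# DX11-style draw-call submission (the other 20%) is single-threaded.
for n in (4, 16, 32, 64):
    print(n, round(amdahl_speedup(0.8, n), 2))
```

Under these assumptions, going from 32 to 64 threads barely moves the needle (roughly 4.4x vs. 4.7x), which is consistent with DX11 titles failing to scale on very high core count CPUs.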
Can you guys retest with Core 0 unchecked (Task Manager affinity setting) on the games that show lower performance with all 64 threads?
Level1Techs found that turning core 0 off boosted performance.
https://forum.level1techs.com/t/testing-the-second-gen-threadripper-cross-platform-a-little-more-closely/132174
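Unchecking CPU 0 in Task Manager corresponds to an affinity bitmask with bit 0 cleared. A sketch of how that mask is built (the resulting hex value is, for example, what Windows' `start /affinity` accepts):

```python
def affinity_mask_excluding(excluded: set, logical_cpus: int) -> int:
    """Build a CPU affinity bitmask (one bit per logical CPU) with the
    excluded CPUs' bits cleared -- the same mask Task Manager constructs
    when you uncheck CPUs in the affinity dialog."""
    mask = 0
    for cpu in range(logical_cpus):
        if cpu not in excluded:
            mask |= 1 << cpu
    return mask

# All 64 logical CPUs of a 2990WX with CPU 0 unchecked:
print(hex(affinity_mask_excluding({0}, 64)))  # -> 0xfffffffffffffffe
```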