
Ryzen Memory Latency's Impact on Weak 1080p Gaming

Author: Ryan Shrout
Subject: Processors
Manufacturer: Various

Timings, timings, timings

My next step was to loosen the memory timings on the Intel Core i7-7700K system in order to raise its memory latency, then measure performance before and after.

I took the same Corsair DDR4 memory to 20-20-20-36-3T and re-ran our SiSoft Sandra and IMLC tests.


The increases aren’t massive, but I found that full random results were 13.8% slower, in-page results were 15.8% slower, and sequential results were 7.5% slower. IMLC reported memory latency that was 13.9% higher.

Next I ran the same tests shown in our vTune measurements above several times, averaged the results, and normalized them to the slower memory configuration in order to see what impact the added memory latency had on each workload.
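To make the averaging and normalization step concrete, here is a minimal sketch of the calculation. The frame-rate numbers below are hypothetical placeholders, not our measured data:

```python
# Sketch of the averaging/normalization step described above.
# Run numbers are hypothetical, for illustration only.

def normalize_to_baseline(runs_fast, runs_slow):
    """Average repeated runs and express the fast-memory result
    relative to the slow-memory baseline (1.0 = baseline)."""
    avg_fast = sum(runs_fast) / len(runs_fast)
    avg_slow = sum(runs_slow) / len(runs_slow)
    return avg_fast / avg_slow

# Three hypothetical runs (avg FPS) per memory configuration:
ratio = normalize_to_baseline([110, 112, 111], [100, 99, 101])
print(f"fast-timing config is {ratio:.3f}x the slow baseline")
```

Averaging several runs first smooths out run-to-run variance before the ratio is taken, which matters when the deltas being measured are only a few percent.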


The results are incredibly compelling. In all of our general applications, excepting WinRAR again, we see less than a 3% advantage despite the ~13-15% lower memory latency. WinRAR was the exception in our vTune data for memory latency sensitivity, and that shows up here as a 13% improvement going from the slower timings to the faster ones, in line with the synthetic latency improvements. All of the other non-gaming applications have memory access patterns that are well cached or easily prefetched.

The gaming tests show a more variable set of results, but they clearly show an advantage for the Core i7-7700K with tighter memory timings. Average frame rates increased by 5-15%, apart from Deus Ex: Human Revolution and Ghost Recon Wildlands, which we will discuss later. One thing we did not see is a correlation between the degree of memory latency dependency suggested by the vTune percentages and the before/after memory timing results above. In truth, gaming workloads are incredibly complex, and it is difficult to attribute performance to any single factor. For example, I would point to the extremely high threading efficiency of Ashes of the Singularity as an example of an application that is capable of hiding memory latency.


Ashes of the Singularity

Grand Theft Auto V cannot make that claim, with thread-level concurrency weighted heavily in the 3-4 range.


Grand Theft Auto V

Two games stand out from the graph above as showing very little impact from the raised memory latency of our testing: Deus Ex and Ghost Recon Wildlands show just a 1-2% change. What makes them, and the larger outliers like Hitman, so different? While I can’t explain exactly why in this case, we can point to another data point that backs up this story’s memory latency assertion.


This graph shows the same data as before, but adds in results from the Ryzen 7 1800X processor. I have also normalized to the faster Intel platform configuration. As you can see, in both Deus Ex: Human Revolution and Ghost Recon Wildlands, the performance delta between the default (faster) Intel 7700K configuration and the Ryzen 7 1800X is small. By comparison, games like GTA V, Hitman and Rise of the Tomb Raider show much wider gaps in 1080p gaming performance.

Far Cry Primal is an interesting data point that shows a massive gap between the Ryzen and Intel processors. It is a glaring example that shows we don’t know everything about these workloads or the impact of the memory system (latency or otherwise).

Ashes of the Singularity: Escalation is the one example in our sample set where the Ryzen 7 1800X result is faster than the artificially-slowed Intel Core i7-7700K (outside of margin-of-error differences like we see in Deus Ex). As this is the application that AMD placed on a pedestal as being “optimized for Ryzen”, it makes sense that its higher thread utilization is in fact able to hide any inherent memory latency disadvantage Ryzen holds compared to Kaby Lake.

If you are wondering why I did not include performance comparisons between the Intel and AMD configurations here, the core and thread count difference made it difficult to draw reliable conclusions about the meaning of the before/after deltas. That discussion dives more into SMT implementation and efficiency.

Another Interesting Data Point in a Complex Discussion

That is a lot of testing, profiling and wordsmithing to come to what conclusion?

By using Intel’s vTune application profiler, I am confident in saying that PC gaming workloads are more sensitive to memory latency than most non-gaming workloads. If that seems a little broad, we can narrow it: gaming is more sensitive than most of the non-gaming workloads that reviewers and analysts use to inspect, measure, and gauge the performance of a processor and platform. That is an important conclusion, even if it might seem obvious in retrospect. Workloads like Handbrake, Cinebench, and Blender are streaming applications, meaning there is very little thread-to-thread variance in their memory read patterns. Games tend to have a threading pattern that lends itself to “core hopping” and a more random access pattern. The net result is more dependency on the memory latency of a platform.
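The streaming-versus-random distinction can be illustrated with a toy microbenchmark. The sketch below walks a sequential chain (prefetch-friendly, like a streaming workload) and then a randomly permuted chain (a pointer chase that defeats the prefetcher, closer to a game’s access pattern). Note this is illustrative only: in Python the interpreter overhead dominates, so it demonstrates the access-pattern difference rather than raw DRAM latency.

```python
import random
import time

def chase(chain, steps):
    """Follow chain[i] -> next index; each load depends on the
    previous one, so memory latency cannot be overlapped."""
    i = 0
    t0 = time.perf_counter()
    for _ in range(steps):
        i = chain[i]
    return time.perf_counter() - t0, i

def make_chains(n):
    """Build a sequential chain and a single random cycle of length n."""
    sequential = [(i + 1) % n for i in range(n)]
    order = list(range(n))
    random.shuffle(order)
    rand = [0] * n
    for a, b in zip(order, order[1:] + order[:1]):
        rand[a] = b  # each element points to the next in the cycle
    return sequential, rand

seq, rnd = make_chains(1 << 20)
t_seq, _ = chase(seq, 1 << 20)
t_rnd, _ = chase(rnd, 1 << 20)
print(f"random/sequential time ratio: {t_rnd / t_seq:.2f}")
```

A C version of the same chase, with the working set sized well beyond the last-level cache, is the standard way tools like IMLC measure the full random latency numbers quoted above.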

Because of this, I think it is fair to claim that some, if not most, of the 1080p gaming performance deficits we have seen with AMD Ryzen processors are a result of this particular memory system intricacy. You can combine memory latency with the thread-to-thread communication issue we discussed previously into one overall system level complication: the Zen memory system behaves differently than anything we have seen prior and it currently suffers in a couple of specific areas because of it.

Based on our previous work and testing, we see some alleviation of this problem by increasing memory frequency on Ryzen. I quickly ran the same synthetic memory tests on the 1800X at 2933 MHz to see what kind of decrease that gets us.


Increasing the memory frequency results in an 11% faster full random test, a 10.2% faster in-page test, and an 11.5% faster sequential test. The IMLC app shows around a 9% advantage. However, that still leaves Ryzen at a 3.2x disadvantage versus the faster timings on the Intel Core i7-7700K (still at 2400 MHz) in the in-page result, and 39% slower in the full random result. Increasing memory speeds on Ryzen definitely helps AMD, but Intel still has an edge for the foreseeable future.

Another avenue that can help remove the performance gap between Ryzen and Intel CPUs is more highly threaded and thread-aware game engines. Ashes of the Singularity is going to be the poster child for this going forward, but I hope that AMD is working with the other major vendors (UE, Unity) to implement similar changes. Threading your game well gives it the ability to tolerate slightly higher memory latency without adversely affecting performance, effectively “hiding” it. In truth, though, this is a significant ask for developers that are already strapped for time and resources. As we have seen NVIDIA do, AMD will likely need to seed engineers on-location at these development houses to keep Ryzen, and the methods necessary to help it perform at its peak, at the forefront of developers’ minds.

Obviously, one way to remove the dependence on CPU memory latency is to raise the resolution and image quality settings of the games in question. Though this does not remove the memory latency sensitivity of the game workload itself, moving the bottleneck more towards the GPU gives the CPU and the game threads more time to wait on memory accesses, effectively “hiding” the latency. How effective resolution increases are at removing the Intel/Ryzen performance gap will vary based on the specific engine and workload, but I have seen instances swinging in both directions in our testing thus far.

Narrowing down the issues on Ryzen also leaves me wondering what other workloads might be impacted. I found one such example in WinRAR, a widely used compression tool that happens to have a strong dependence on memory latency. While the team is still exploring, it is possible that AMD will want to address this concern before the coming Naples platform launch, since enterprise workloads may behave very differently from the consumer tests we focused on today. AMD needs its push into servers to be a success, and any red flags there will matter at least as much as they do in the consumer space.

Even though we can’t claim with 100% certainty that our testing has solved the “Ryzen 1080p dilemma”, I feel confident we have made some significant strides. Memory latency is clearly an important factor in the current state of gaming on Ryzen, even if it is mainly exposed at lower resolutions. How AMD works around it, through future architecture revisions and through game and application development initiatives, will be judged going forward.


May 16, 2017 | 01:15 AM - Posted by khanmein

Interesting..

May 16, 2017 | 02:05 AM - Posted by pdjblum

maybe the architecture's ability to scale so well was seen as being more advantageous than latency

not sure it is "hiding" as opposed to taking advantage of its strengths, which are clearly about the future of software, or so it seems to me

goes back to what we already knew: ryzen gaming performance gets better with thread counts, just like sheets

it is all about trade-offs, and making the right ones looking ahead

what i get from this piece is a very subtle, or not so subtle, attempt to disparage yet another new amd part, but that could be from my strong desire to see amd flourish for the benefit of all concerned

in any case, no matter how you want to frame it, ryzen is an amazing achievement and a great cpu and will only get better

May 16, 2017 | 02:32 AM - Posted by odizzido

yeah if I were to buy a CPU today it would be the 1600X. It's really quite good.

May 16, 2017 | 08:09 AM - Posted by Ryan Shrout

"Hiding latency" is not meant as a criticism to AMD. It is a standard industry term to reflect the idea of using computing and threading to minimize the negative effects of memory interfaces.

May 16, 2017 | 09:53 AM - Posted by kenjo

Most things that makes a CPU complicated is involved in trying to hide latency. The whole point of a cache is to hide latency it's basically the only reason it's there.

May 16, 2017 | 02:30 AM - Posted by odizzido

Interesting article. I am still fairly new to this site, but you guys have put out some solid ryzen coverage.

One thing I would like to request...when vega releases and you do a review of it, could you also test AMD+intel CPUs when comparing vega to whatever cards you choose from nvidia?

The reason why I would like to see this is because I've seen a few cases where Nvidia cards just don't work well on ryzen. Check this out for example, and look at the difference between the 1060 and the 480

http://www.anandtech.com/show/11244/the-amd-ryzen-5-1600x-vs-core-i5-rev...

I have also seen this behavior reported in rise of the tomb raider. I would really like to see an article which investigates this to see if it's an actual problem or just a few edge cases.

May 16, 2017 | 08:10 AM - Posted by Ryan Shrout

Yeah, it's something we are considering. Especially in light of the Ryzen launch.

May 16, 2017 | 04:08 AM - Posted by ltkAlpha

Good read! I would've appreciated even some speculation on where the difference in memory latency/performance between platforms comes from. Is it easily remedied by, say, increasing the frequency of the dedicated silicon in the next generation or is it more complicated than that?

May 16, 2017 | 08:11 AM - Posted by Ryan Shrout

Good point.

I would expect some improvements in the fabric to come in the second generation. If possible, something as simple as increasing the clock rate at which it runs (currently half of memory speed) would help.

May 16, 2017 | 09:59 AM - Posted by kenjo

Latency is the hard problem to solve. much much easier to get good bandwidth.

May 16, 2017 | 10:04 AM - Posted by Jtaylor1986

Your line graph could use a little explanation or a caption. I get it now after staring at it for 2 minutes and it is an excellent piece of data but unless you already knew that the processor is a zeppelin die it wouldn't make any sense.

May 16, 2017 | 10:43 AM - Posted by Jtaylor1986

It's starting to become painfully obvious how much outdated game engines are holding back progress in the CPU/GPU industry. It's almost pathetic that a tiny private company (Stardock) can create a core neutral engine that takes great advantage of explicit API's and includes multi-gpu support while big name AAA publishers are still screwing around with overhauled DX9 engines that have been dragged into the DX11 era.

May 16, 2017 | 12:36 PM - Posted by StephanS

Could one test have been done with more data point at 2400 ?
Like see the impact of CAS 10 to 18 in one game sensitive to latency (like hitman)

Also AMD said they reduce latency by 6ns in the last microcode update... did that have ANY impact on performance ?

Side note: I have yet to see other sites do in depth analysis like this. Vtune can indeed tell you a lot.

I even wonder if some of the game developers even know this tool exist...

May 16, 2017 | 01:29 PM - Posted by JohnGR

@Ryan
last page
Ashes of the Singularity is going to be the poster child for this going forward but I hope that Intel is working with the other major vendors (UE, Unity) to implement similar changes.

Intel>>>AMD?

May 17, 2017 | 04:57 PM - Posted by Ryan Shrout

Whoops, yes!

May 17, 2017 | 07:30 AM - Posted by Jann5s

@ Ryan, was there any correlation between average fps and latency dependency between the various games?

To me, latency issues are very dependent on frame rates. Lower frame rates allow for much more latency hiding. In other words, when the benchmarks run at very high fps, then performance should be more sensitive to latency.

This also brings me to the question of reasonable fps. Even if a system can run 200fps in a benchmark, is it still a reasonable benchmark? Most users will not run a game at 200fps, but instead will increase the quality of the game (twitch fps shooters excluded). And thus, a wider architecture with more latency may actually be a better choice, given it can hide its latence in a 60-100fps scenario.

In short, I'm not sure, but I'm not confident with conclusions based on very high fps benchmarks.

May 19, 2017 | 12:48 PM - Posted by Rocky12345

Very nice write up on this Thank You.

I have played a lot with my own system settings and memory timings to try to eke out every bit of FPS I can from my system.
I am running an i7 2700K OC'd to its max and my memory topped out at 2133 MHz, the max the Sandys support. By using the FSB to push the memory a bit I got it to DDR3 2200 MHz. I played with timings for a long time until I got 100% stable with the max bandwidth I could get from this system. AIDA64 memory tests show my latency around 42.6 now and my bandwidth about 32-35 GB/s between the read, write and copy tests.

With my system settings, CPU above 5 GHz and the memory tuned, I have found that games run very smooth; I do not notice stutters except in poorly coded games. So you are 100% right, games do like as low latency as possible on the memory subsystem.

The gains in min & avg FPS are well worth the extra effort of tuning the system. Now if AMD can tune their Ryzen CPUs with firmware/BIOS updates I see them getting some FPS gains as well, but I would not expect this gen of Ryzen to match Intel's Kaby Lakes in the FPS department @ 1080p. Ryzen also has a large clock rate challenge when pitted up against the 7700K CPU.

May 30, 2017 | 11:18 AM - Posted by Dark_wizzie

Very curious about memory sensitivity on CPU bound games.
