Feedback

Ashes of the Singularity Gets Ryzen Performance Update

Subject: Processors
Manufacturer: AMD

Tweaks for days

It seems like it’s been months since AMD launched Ryzen, its first new processor architecture in about a decade, when in fact we are only four weeks removed. One of the few concerns about the Ryzen processors centered on gaming performance, particularly at common resolutions like 1080p. While I was far from the only person to notice this, our gaming tests clearly showed a gap between the Ryzen 7 1800X and the Intel Core i7-7700K and Core i7-6900K processors in Civilization 6, Hitman, and Rise of the Tomb Raider.


A graph from our Ryzen launch coverage...

We had been working with AMD for a couple of weeks ahead of the Ryzen launch and fed back our results, with questions, in the week before launch. On March 2nd, AMD’s CVP of Marketing John Taylor gave us a prepared statement that acknowledged the issue but promised changes would come in the form of game engine updates. These software updates would need to be implemented by the game developers themselves in order to take advantage of the unique and more complex core design of the Zen architecture. AMD had quotes from the developers of Ashes of the Singularity as well as the Total War series to back it up.

And while statements promising change are nice, it really takes some proof to get the often skeptical tech media and tech enthusiasts to believe that change can actually happen. Today AMD is showing its first result.

The result of 400 developer hours of work: the Nitrous Engine powering Ashes of the Singularity received an update today, to version 26118, that rebalances threading across Ryzen 7’s 8 cores and 16 threads. I was able to do some early testing on the new revision, as well as on the previous retail shipping version (25624), to see what kind of improvements the patch brings with it.

Stardock / Oxide CEO Brad Wardell had this to say in a press release:

“I’ve always been vocal about taking advantage of every ounce of performance the PC has to offer. That’s why I’m a strong proponent of DirectX 12 and Vulkan® because of the way these APIs allow us to access multiple CPU cores, and that’s why the AMD Ryzen processor has so much potential,” said Stardock and Oxide CEO Brad Wardell. “As good as AMD Ryzen is right now – and it’s remarkably fast – we’ve already seen that we can tweak games like Ashes of the Singularity to take even more advantage of its impressive core count and processing power. AMD Ryzen brings resources to the table that will change what people will come to expect from a PC gaming experience.”

Our testing setup is in line with our previous CPU performance stories.

Test System Setup

CPU: AMD Ryzen 7 1800X / Intel Core i7-6900K
Motherboard: ASUS Crosshair VI Hero (Ryzen) / ASUS X99-Deluxe II (Broadwell-E)
Memory: 16GB DDR4-2400
Storage: Corsair Force GS 240 SSD
Sound Card: On-board
Graphics Card: NVIDIA GeForce GTX 1080 8GB
Graphics Drivers: NVIDIA 378.49
Power Supply: Corsair HX1000
Operating System: Windows 10 Pro x64

I was using the latest BIOS (1002) for our ASUS Crosshair VI Hero motherboard and upgraded to some GeIL RGB (!!) memory capable of running at 3200 MHz on this board with a single BIOS setting adjustment. All of my tests were run at 1080p in order to return to the pain point that AMD was dealing with on launch day.

Let’s see the results.


These are substantial performance improvements with the new engine code! At both 2400 MHz and 3200 MHz memory speeds, and at both the High and Extreme presets in the game (all running in DX12, for what that’s worth), gaming performance on the Ryzen system is improved. At the High preset (the setting AMD used in its performance data for the press release), we see a 31% jump in performance when running at the higher memory speed and a 22% improvement with the lower speed memory. Even in the more GPU-bottlenecked state of the Extreme preset, the performance improvement for the Ryzen processor with the latest Ashes patch is 17-20%!
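For readers who want to sanity-check these percentages against their own benchmark runs, the math is a simple before/after ratio. The FPS values below are hypothetical placeholders for illustration, not our measured results.

```python
def pct_gain(before_fps: float, after_fps: float) -> float:
    """Percent performance improvement from a before/after FPS pair."""
    return (after_fps - before_fps) / before_fps * 100.0

# Hypothetical numbers: a 31% jump means 100 FPS before the
# patch becomes 131 FPS after it.
print(round(pct_gain(100.0, 131.0)))  # 31
```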


It’s also important to note that Intel performance is unaffected, for better or worse. Whatever work Oxide did to improve the engine for AMD’s Ryzen processors had NO impact on the Core processors, which is interesting to say the least. The cynic in me would have expected any architecture-agnostic code changes to raise Intel’s multi-core performance at least a little.

So what exactly is happening to the engine with v26118? I haven’t had a chance to have an in-depth conversation with anyone at AMD or Oxide yet on the subject, but at a high level, I was told that this is what happens when instructions and sequences are analyzed for a specific architecture. “For basically 5 years,” I was told, Oxide and other developers have dedicated their time to “instruction traces and analysis to maximize Intel performance,” which helps eliminate poor instruction sequencing. After spending some time with Ryzen and the necessary debug tools (and with some AMD engineers), they were able to improve performance on Ryzen without adversely affecting Intel parts.


Core to core latency testing on Ryzen 7 1800X

I am hoping to get more specific detail in the coming days, but it seems very likely that Oxide was able to properly handle the more complex core-to-core communication of Ryzen and its CCX implementation. We demonstrated earlier this month how thread-to-thread communication across core complexes incurs substantial latency penalties, and how a developer that intelligently keeps dependent threads on the same core complex can improve overall performance. I would expect this is at least part of the solution Oxide integrated (and it would also explain why Intel parts are unaffected).
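As a rough sketch of the kind of scheduling fix involved: on Linux, a group of dependent threads can be kept on one core complex by restricting their CPU affinity. This is not Oxide's actual code, and the 0-7 / 8-15 split below is an assumed mapping for illustration only; real topology should be read from /sys/devices/system/cpu/cpuN/topology.

```python
import os

# Assumed topology for illustration: logical CPUs 0-7 on CCX0, 8-15 on CCX1.
# A real tool must derive this from /sys/devices/system/cpu/*/topology.
CPUS_PER_CCX = 8

def ccx_cpus(ccx_index: int, cpus_per_ccx: int = CPUS_PER_CCX) -> set:
    """Logical CPU IDs belonging to one core complex (under the assumed mapping)."""
    start = ccx_index * cpus_per_ccx
    return set(range(start, start + cpus_per_ccx))

def pin_current_thread_to_ccx(ccx_index: int) -> None:
    """Restrict the calling thread to one CCX so its cache traffic stays local."""
    if hasattr(os, "sched_setaffinity"):  # Linux-only API
        os.sched_setaffinity(0, ccx_cpus(ccx_index))

print(sorted(ccx_cpus(0)))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

The idea is simply that two threads sharing data pay the cross-CCX latency penalty only if the scheduler is free to place them on different complexes.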

What is important now is that AMD takes this momentum with Ashes of the Singularity and actually does something with it. Many of you will recognize Ashes as the flagship title for Mantle when AMD made that move to change the programming habits and models for developers, and though Mantle would eventually feed into Vulkan and drive DX12 development, it did not spark the wholesale shift AMD hoped for. Can AMD and its developer relations team continue to make the case that spending the time and money (which is what 400 developer hours equates to) on Ryzen-specific performance enhancements is in everyone's best interest? We'll soon find out.


March 29, 2017 | 10:35 PM - Posted by Anonymous (not verified)

400 hours... So all it took was maybe a team of 3 people less than 4 weeks of work to be the first to better code a complete game engine for up to 30%+ more performance?

I'm impressed.

March 30, 2017 | 03:01 AM - Posted by pdjblum

are you assuming 40 hr work weeks?

most game developers working on big budget titles I imagine work conservatively 100 hr wks

given that, conservatively, four guys could have got it done in a week

ryzen is awesome and will only keep getting better, just as the 480 has

and the value proposition is crazy for both with recent prices for the 480

March 30, 2017 | 03:48 AM - Posted by JohnGR

Even if we count weekends as working days, 100/7 is about 14 hours per day. If we give those poor people one day to relax, that translates to 100/6, or about 17 hours per day. If we assume they get the whole weekend free, then it's 20 hours per day. The worst case should be about 60 hours per week when they have to do something really important, with a typical 30 to 40 hours per week when they aren't up against a deadline.

March 30, 2017 | 03:59 AM - Posted by Anonymous (not verified)

Lol no one works more than 60 hours a week anywhere in Europe, it's not even legal.

The typical work week is 32 to 40 hours here.

What third world country do you live in?

March 30, 2017 | 04:12 AM - Posted by Anonymous (not verified)

That's not true.

I live in the UK, when I go offshore I will work 84 hour weeks for 3 weeks straight. That's 12 hour days for 3 weeks and no break.

The answer to your inevitable question is "because it makes me rich".

March 30, 2017 | 06:41 AM - Posted by Anonymous (not verified)

Enjoy having no life to enjoy your money

March 30, 2017 | 09:00 AM - Posted by Anonymous (not verified)

LOL SALTY MUCH? He works a whole 20 hours more than the avg person, probably makes a hell of a lot more, and has a whole week off

March 30, 2017 | 10:54 AM - Posted by Anonymous (not verified)

And he has a lot more stress too; he'll drop dead before retiring.

April 1, 2017 | 10:54 PM - Posted by Brogen

What makes you think the man hates his job? I technically work 60+ hours a week and feel like I do nothing. I like my job, so the hours don't even matter.

March 30, 2017 | 09:41 AM - Posted by Anonymous (not verified)

Since when is the UK in Europe?

March 30, 2017 | 10:31 AM - Posted by Anonymous (not verified)

about the same time the United "Kingdom" stopped being an actual : "Kingdom" :P

March 30, 2017 | 01:36 PM - Posted by Jeremy Hellstrom

well, Brexit isn't complete yet is it?

March 31, 2017 | 04:40 AM - Posted by JohnGR

I believe(personal opinion) it is, for three reasons.
First, Great Britain is not Greece to take a referendum and translate it to the opposite result. They will try to honor the final result.
Second, GB was or at least looked, always closer to the US than Europe.
Third. Pride. They would never accept to stay in a Europe where Germany commands. The palace will never accept the Queen to be second to some European president in a suit.

April 2, 2017 | 07:54 AM - Posted by Mordac (not verified)

You understand that EU was created to curb the power of Germany? The UK leaving will do the opposite.

April 6, 2017 | 11:16 AM - Posted by Stefem (not verified)

Man, Greeks voted to stay in EU and Europe was considered to be a continent last time I checked...

March 31, 2017 | 06:47 PM - Posted by Robert Grainger (not verified)

You are aware Europe is a continent, and the EU is a political entity. The UK will always be, and has always been in Europe, its part of the same continental shelf, nothing humans can do will ever change simple geography.

March 30, 2017 | 04:19 AM - Posted by Metwinge (not verified)

I work a minimum of 50 hours P/W here in the south east of England but we did have to sign a clause taking us out of the EU working directive or we wouldn't be allowed to work more than 45 hours P/W

March 30, 2017 | 05:23 AM - Posted by Anonymous (not verified)

You should ask some self-employed people about their working time.
Also, it is legal to work more, just for a short time, not for the whole year.

Sometimes I work for special projects with 12 days 10 hours per day, then 2 days off and another 12 days of work.

March 30, 2017 | 11:40 AM - Posted by Anonymous (not verified)

Highly paid and demanding jobs in the EU do make you work more in many cases, but it is indirect.

You have a contract which specifies your working hours - these are the normal approximately 8 hours per day. But if you want to go to the top, your job is your "hobby" so to speak as well - and the rest are "interest hours" - you don't get paid more, but you may get a bonus if you hit your targets - whatever they are.

I work in a demanding environment - when my targets are good and everything is under control, I work less than the contract specifies and that is just fine - but on average I work 220 hours a month - and I'm happy with it - my fixed salary is high enough that even if I divided it by that many hours worked, it's still well above average. And I still enjoy my job.

March 30, 2017 | 07:44 AM - Posted by Mike S. (not verified)

I work as a software developer, and maybe me and everyone else I've ever worked with is a moron, but our code quality rapidly approaches "a toddler could do better" levels if we go past 50 hours in a single week. It doesn't matter how much caffeine they shove into me.

I know the game industry has that kind of reputation, but I think it must be overstated. Or they're giving their developers amphetamines.

March 30, 2017 | 06:35 PM - Posted by Anonymous (not verified)

I can testify to this. Most software development companies DO NOT want you to work much overtime, if any at all. I'd go as far as to say that most software developers don't spend half their time actually developing. Bill Gates (I think) once said "If I can get 100 good lines of code out of each developer per year, I'm very pleased." Good code meaning code that lasts 3 years or more (which is typically 2 development cycles).

April 1, 2017 | 03:53 AM - Posted by Mark_Hughes

Also a developer, I work 37.5 hours a week, So do all the other devs in our company as far as I know. We do sometimes get asked to work over (quite rare) but once the work is done we can reclaim the time by having a shorter week later on.

I can't imagine any good code comes out of hours number 50 and onwards.

March 30, 2017 | 06:30 PM - Posted by Anonymous (not verified)

No one works 100 hours a week. A typical work week (anywhere in the world) is around 36 hours. The USA has a 40 hour work week limit before overtime is required. You can't possibly work 100 hours in a week anyway; it's literally impossible. I'd guess that most developers get to actually develop around 20-25 hours a week; the rest of the time is spent in meetings, emails/communications, and so on.

Additionally, most "teams" are going to be around 8-10 people, with only 4-5 being developers. So, by the math, let's assume 5 developers at 20 hours per week... We're looking at 4 weeks to complete that development. I mean, that's not terrible. And for a game to integrate it at deployment is probably even less intensive on the pocketbook. I'd say I give this a B- grade. It's good, and certainly above average. But 400 hours is nothing to sneeze at. The average developer makes around $35/hour or more, so that can easily be around $25,000 after you factor in things like health insurance and other benefits. And that's before you factor in other roles like BAs, SCRUM Masters, Human Resources, Managers, and all of their salaries... Not to mention CI costs and other operating costs. This easily could be around $60,000-$70,000 for this one project.
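Spelling out the comment's arithmetic: 400 hours at the quoted $35/hour is $14,000 in raw wages, and a loaded-cost multiplier (the 1.8x below is an assumed illustration, not a figure from the comment) is what pushes the total toward $25,000.

```python
HOURS = 400               # developer hours from the article
BASE_RATE = 35.0          # $/hour, the comment's estimate
LOADED_MULTIPLIER = 1.8   # assumed overhead factor for benefits/insurance

base_cost = HOURS * BASE_RATE
loaded_cost = base_cost * LOADED_MULTIPLIER
print(base_cost)    # 14000.0
print(loaded_cost)  # 25200.0
```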

Which to be fair really isn't that bad. I mean in the company I work for, there is a budget of almost $1,000,000 just to upgrade 1 program. And we're looking to spend almost 6 months doing it, with about a 7 person team. (I say about because they're considering hiring another developer.)

March 30, 2017 | 07:51 PM - Posted by Anonymous (not verified)

"Nobody works 100 hours a week, it's impossible"

You've never been near a farm have you? Go tell a farmhand it's impossible and watch as he dies laughing.

April 2, 2017 | 08:41 PM - Posted by Anonymous (not verified)

I worked 16 hours per day for 12 days, then 2 days off.

April 2, 2017 | 08:42 PM - Posted by Anonymous (not verified)

(for the above, that was DAIRY FARM and that was also seasonal)

March 30, 2017 | 10:16 PM - Posted by Anonymous (not verified)

Big name studios usually have limits on workable hours to prevent that.

March 31, 2017 | 07:32 PM - Posted by Anonymous (not verified)

No, we do not work 100 hr weeks.

I clock 40 hour weeks most of the time, with the occasional crunch 50-70 hour week (rare) around important deadlines and ship times.

March 30, 2017 | 11:01 AM - Posted by Deshi (not verified)

More than likely the code changes didn't affect Intel's processors because they are Ryzen-specific optimizations on a different code path. The code probably runs something like:

if [[ $proctype == "Ryzen" ]]; then
    run_ryzen_optimizations
elif [[ $proctype == "Intel" ]]; then
    run_old_code
fi

March 29, 2017 | 10:47 PM - Posted by Corrigan (not verified)

From what I've seen of the AMD Ryzen optimization GDC talk slides, almost all of the changes are minor tweaks to various lines involving cores.

March 29, 2017 | 11:20 PM - Posted by tts (not verified)

Yup.

Look at that chart showing core to core latency. Cross CCX transfers or seeks are eeeexxxxppppeeennnsssiiiive latency-wise.

So much so I can't help but wonder if AMD kind of screwed up there.

Yeah its possible to work around it but man expecting the software guys to do that is risky to say the least for a company that has a very small market share like AMD does right now.

They'd be foolish if they don't address that issue in Zen+.

March 29, 2017 | 11:51 PM - Posted by Anonymous (not verified)

You're thinking too small. This design makes it easier to scale up the core count per socket (also no expensive ring bus that is unable to scale nearly as well). For enterprise, that will be a big deal.

March 30, 2017 | 03:05 AM - Posted by tts (not verified)

Scaling the cores per socket doesn't matter if it doesn't perform well.

Also there is no actual indication that it helps with per socket core scaling. AMD had no issues putting tons of BD cores in 1 socket either. And neither does Intel.

March 30, 2017 | 04:10 AM - Posted by Anonymous (not verified)

Main issue with your logic here is that you are thinking as a gaming consumer.

The chips are surprisingly good, even at gaming, and the latency is barely noticeable in scores of various benchmarks plus encoding.

Ryzen is a winner, just the price tells the story detailed enough.

March 30, 2017 | 04:31 AM - Posted by tts (not verified)

They're consumer oriented chips targeted at gamers though. The server Zen's aren't even out yet.

Yes Ryzen is a good chip with great cost vs performance ratio but that doesn't mean AMD did everything perfect in implementing it nor are they somehow immune from criticism or scrutiny.

Even Intel makes boo boos sometimes and I'll criticize those too.

March 30, 2017 | 06:19 AM - Posted by bobtheblob (not verified)

I wouldn't call the CCX design a 'boo boo'. It's a conscious design choice by AMD. The benefits of this scalability are clear for server systems which is where most of the profit margins are these days. The CCX design is what allows AMD to go easily from 8 core to 16/24/32 cores on the upcoming Naples chips. And for server workloads the CCX latency will make no real difference.

The design is not a mistake; they decided not to fight Intel directly, and instead made a design decision which has clear advantages in scalability but disadvantages in unoptimised gaming.

March 30, 2017 | 07:27 AM - Posted by tts (not verified)

@ bobtheblob

If it's causing performance issues with common software then it's definitely a boo boo, to put it mildly.

It also isn't clear what the CCX does to somehow improve scalability for high core count parts either. Again, Intel and previous AMD Bulldozer variants scale up just fine to high core counts.

AMD had 16 core Piledriver chips back in 2012 for instance. Heat and power were the biggest factors limiting higher core count parts from either Intel or AMD, not inter cache bandwidth or latency.

March 30, 2017 | 09:50 AM - Posted by Anonymous (not verified)

Let me try to reformulate bob's point:
AMD had a low R&D budget, at least compared to Intel. They wanted to take back market share in as many markets as possible.
Their strategy is: make one design they can just take to the fab and (almost) say "2 per chip on this wafer, 8 per chip on that one", with minimal changes. So they came up with CCX, the Lego block of CPU design.
Yes, it has its flaws compared to a chip that is designed from the ground up for that core configuration, but that's a conscious trade-off to be able to use it in every single market, and the performance hit isn't that bad if you consider the internal latency for cores on the same CCX is lower.

Sorry for the wall of text.

March 31, 2017 | 01:35 AM - Posted by Anonymous (not verified)

Intel does the same thing (essentially) with their 14 to 18 core Haswell-EP parts. They have two separate ring busses. This is similar to when the Opteron processors came out. Latency to local memory was low and bandwidth scaled with the number of sockets. Intel still had the processor connected to a chipset with the memory controller on the chipset. It didn't scale well. Software had to be modified to be NUMA aware for the Opteron systems. AMD captured something like 25% of the server market before Intel followed suit and built a distributed memory system that looked almost the same. Software will need to be modified to take advantage of these architectures; you probably won't find too many people gaming with a 14 to 18 core Haswell-EP Xeon though. AMD does have millions of consoles with a dual-cluster architecture though, so there should be a lot of people who know how to optimize for it. The hardware is always ahead of the software.

April 1, 2017 | 08:54 AM - Posted by tts (not verified)

AMD supposedly isn't using a ring bus for the CCX though, and Intel's ring bus doesn't seem to introduce any latency issues between caches unlike AMD's CCX. If anything, looking at the diagrams we have so far for the CCX bus arrangement, it reminds me strongly of how the HT bus was laid out for a quad socket Opteron.

https://www.techpowerup.com/230702/amd-ryzen-die-shot-and-new-architectu...

The original Opteron inter CPU bus was nothing like a on die ring bus for connecting caches or CPU's. It was point to point Hypertransport bus that was inter socket and was also flexible in how it could be implemented exactly.

From what I remember it was possible to do a 8, or more, socket Opteron if you wanted to but the bandwidth would be too low and the latency too high to make it a sensible option for most work loads so 4 sockets was usually the max for Opterons.

Here is what one of the 8 socket boards looked like, not many ever sold outside of the HPC market so they're rare to see today: http://www.amdboard.com/iwill_h8502_8way.html

Yes AMD used ccNUMA back then too of some sort but it was a very different market and the Zen arch is quite different too as is Intel's position and architectures. History isn't going to go repeating itself.

And the Jaguar variant in the PS4 and XB1 doesn't use the CCX from Zen either. It's using something else, working through the L2 caches and not an L3.

http://www.redgamingtech.com/ps4-architecture-naughty-dog-sinfo-analysis...

The latency is also quite high; 190 cycles to access a non-local L2 cache doesn't sound good (main RAM is 220 cycles for comparison according to that article, and caches should be WAY faster than main RAM...) but it's a console and programming around weird hardware quirks is the norm so that is acceptable.

April 2, 2017 | 03:47 AM - Posted by Anonymous (not verified)

The console processors are not the same as Zen, but they still essentially have two separate cache zones, and there are cost to migrating threads across them or accessing data across them. Also, Xbox Scorpio may be a Zen variant.

Intel's L3 cache access all go through the ring bus. The caches have stops on the ring bus and so do the CPUs. The access to the cache slice closest to the CPU is fastest. With an 8 core chip, the worst case is only maybe 5 hops or so since data can go in either direction. The QPI links, the IO controller, and main memory have stops on the ring bus also. The cache is not connected directly to the cores, it still has to hit the ring bus. For the 14 to 18 core parts, they have two separate ring busses, and therefore two separate cache zones, just like Ryzen. You aren't going to see that large of latency differences on Intel chips that only have a single ring bus. You probably wouldn't see much of any latency difference on a 4 core part where the maximum number of hops for an L3 access is only 3 or so. For larger Intel chips (which no one uses for gaming since they are ridiculously expensive), a possible optimization would be to keep threads with shared data as close together as possible with regard to ring bus stops. It may not make much difference though. The ring bus operates at core clock, which is good for performance but probably quite bad for power consumption. Interconnect is consuming a huge amount of power in modern processors. The ring bus has long interconnect that has to be driven at core clock. AMD may have an advantage there with lower clock and high speed connections remaining localized.

Ryzen has the L3 cache tightly coupled to the 4 cores in the same complex. In pcper's testing, the core to core latency within the CCX is about half of Intel's latency on their 8 core part. Going between CCXs is higher than intel's latency though. The "ping" time may not be that representative. I suspect the lower performance with unoptimized code is due to separate L3 caches. Threads often pass data between them by using shared memory. If both threads are in the same CCX, those memory addresses would be cached in that CCX's L3. If the threads are on different CCX's, then it will cause a lot cache coherency traffic. If one thread writes some data and releases the lock, the other thread will have to wait for the dirty cache blocks to be copied from one cache to the other before it can continue. You would hit the same effect on Intel dual ring bus parts. Those 14 to 18 core Intel parts are ridiculously expensive though, so few of them would ever be used to play games.
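The shared-memory handoff cost described above can be sketched with a toy ping-pong between two threads. Note this is an illustration of the round-trip structure only: in CPython the measured time is dominated by GIL and scheduler handoff, not true cache-to-cache latency, which real core-to-core tests measure with pinned native threads and atomic flags.

```python
import threading
import time

def pingpong_roundtrips(n: int = 1000) -> float:
    """Average round-trip handoff time (seconds) between two threads
    passing a token through shared state. A toy model of the cross-cache
    'ping' cost; in CPython it mostly measures GIL/scheduler overhead."""
    ping, pong = threading.Event(), threading.Event()

    def responder():
        for _ in range(n):
            ping.wait()   # wait for the token...
            ping.clear()
            pong.set()    # ...and send it straight back

    t = threading.Thread(target=responder)
    t.start()
    start = time.perf_counter()
    for _ in range(n):
        ping.set()
        pong.wait()
        pong.clear()
    elapsed = time.perf_counter() - start
    t.join()
    return elapsed / n
```

Pin the two threads to cores in different complexes versus the same complex and the gap between the two averages is exactly the penalty the latency chart shows.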

Given the advantages and the relatively small amount of optimization required, I wouldn't be surprised to see Intel move to a similar architecture in the future just like they did after Opteron came out. At a high level, Intel's system architecture for multi-socket systems (using QPI links instead of HyperTransport links) is nearly identical to the original Opteron system architecture. A 4 or 8 socket Intel board would be ridiculously large and also wouldn't scale well except for applications that can scale well on NUMA architectures. More than 2 socket systems were always a very small market due to the ridiculous board size. The current preferred way to build larger than dual socket systems is to use something like dual socket blades that plug into a back plane.


March 30, 2017 | 07:10 AM - Posted by oleyska (not verified)

these are not consumer oriented chips

They are made for servers then retrofitted for consumers; they have ECC memory support, VM support and all the stuff you need a Xeon for.

This is typical AMD: AMD cannot make one perfect CPU for each niche and thus they've only created server chips.

Zen server chips and consumer chips are exactly that, 100% the same.

It's the most versatile CPU ever made, and thus the best Chip Ever made.
It's not the best gaming cpu there is,
nor is it the fastest 8 core ever.

March 30, 2017 | 07:29 AM - Posted by tts (not verified)

@ oleyska

Ryzen is consumer oriented. Ryzen is the chip being tested in the article and what is generally being discussed.

Again there are no Zen server parts out yet.

And AMD let their consumer parts support ECC and VM specific instructions/hardware for years in previous CPU's. They just wouldn't verify it as stable or supported.

March 30, 2017 | 08:00 AM - Posted by Anonymous (not verified)

Dude, it's not a consumer-oriented part. Give me one reason why they would go to the effort to develop an entire CPU architecture just to go hit a shrinking HEDT market? The answer is that they wouldn't. They'd go after the growing datacenter cash cow.

March 30, 2017 | 08:59 AM - Posted by tts (not verified)

Even AMD says Ryzen is a consumer oriented part.

The part is not the architecture and its quite possible to derive different parts from the same architecture.

AMD will go after the datacenter/server market but they're going to use a server part to do it and not Ryzen. The arch. will be the same but many aspects of it will change.

March 30, 2017 | 11:26 AM - Posted by pessimistic_observer (not verified)

I think what everyone is trying to say is the CCX isn't a mistake because Naples will benefit from it, and even though it has an issue on consumer grade Ryzen, that doesn't matter because of how small the PC market is.

March 30, 2017 | 12:44 PM - Posted by tts (not verified)

They keep saying the server parts will benefit from it but there is no real substantial reason to believe that at this point in time. It's all just vague marketing right now since AMD hasn't given all the details about it.

If AMD could do 12 core parts 5 years ago without a CCX-like cache/bus arrangement, and Intel can do 20 core parts now without one, I see no reason to believe one is necessary or a big advantage when doing large core count CPUs.

PC market is small now?! It is shrinking some but millions and millions of CPU's are still sold yearly in PC's. Plenty of money to be made there.

March 31, 2017 | 01:40 AM - Posted by Anonymous (not verified)

Intel does two separate ring busses, and therefore two separate L3 cache zones for their 14 to 18 core haswell-ep Xeons.

http://www.anandtech.com/show/8423/intel-xeon-e5-version-3-up-to-18-hasw...

April 1, 2017 | 08:21 AM - Posted by tts (not verified)

Which isn't the same as AMD's CCX. At least as far as anyone can tell.

I have no issue with split L3's or other cache arrangements so long as its implemented properly. Part of doing a proper implementation is the bus connecting the caches HAS to be low latency. Otherwise you run into odd performance issues like Ryzen is having right now.

Bandwidth matters too of course but latency is more important from what I've seen when it comes to caches.

April 2, 2017 | 03:59 AM - Posted by Anonymous (not verified)

It is almost exactly the same, and if you ran a game on a $7000, 14 to 18 core Xeon, you would see the same performance characteristics. You are completely ignoring the fact that AMD's design has several advantages. Latency is lower than on Intel's parts as long as you do the relatively minor optimizations to take advantage of it properly. AMD's design also allows for lower power consumption through greater locality. Intel has to drive that massive ring bus interconnect at core clock. Interconnect like that uses a huge amount of power. AMD will have a power consumption advantage along with quite a few other advantages from their modular and scalable design.

April 2, 2017 | 07:05 AM - Posted by tts (not verified)

No I wouldn't because Intel's cache design doesn't have this issue.

And most software is simply not going to be optimized for AMD CPU's of any sort unfortunately. No matter how easy it is. You might see a handful of games or perhaps some other custom stuff but otherwise there'll be nothing at all. AMD is too much of a bit player in the CPU business to warrant any attention by most developers.

That is true now and it was true even when their market share was much much higher.

And bus power isn't a huge issue at all as far as anyone can tell about Intel's or AMD's chips. Adding more cores quickly overwhelms any power savings you're going to see too. There is a reason why Naples is supposed to be a 150W+ chip.

And current Ryzen CPU's are every bit the power hogs that high core count Broadwell/Skylake CPU's are under load. Yes, the official TDP's look great but they don't necessarily reflect what the exact peak power draw will be under high power workloads.

https://www.pcper.com/reviews/Processors/AMD-Ryzen-7-1800X-Review-Now-an...

April 2, 2017 | 04:05 AM - Posted by Anonymous (not verified)

And what have you seen when it comes to caches? Did you write a cycle accurate, register transfer level simulator and compare the advantages of the different designs? It is somewhat amusing to see enthusiasts talk about CPUs as if they know anything compared to the CPU designers that actually spend years designing them. "So you know, I have heard that latency is important in caches, you know, and I have seen some stuff about caches." Do you have any idea what you actually sound like to people who know about this stuff on a more than superficial level?

April 2, 2017 | 06:54 AM - Posted by tts (not verified)

I don't have to be able to write my own benchmarks or design CPUs to be correct about a technical issue, just able to read others' benchmarks and observe what high latency can do to performance.

I also haven't said I'm a CPU designer, or even in the field, or some end-all-be-all expert, so keep whacking away at that strawman.

March 30, 2017 | 11:48 AM - Posted by Anonymous (not verified)

Ryzen is a consumer brand of the very same Zeppelin die that is used and scaled in a modular fashion to make the 4-die Naples server SKU (AMD may use Opteron branding or a new server brand name). Zen is the name of the new x86 micro-architecture that will be used across AMD's entire line of x86 ISA based CPUs going forward, with Zen2-based CPUs codenamed "Pinnacle Ridge" scheduled to arrive in 2018. Ryzen is not a micro-architecture any more than Opteron is; Ryzen (consumer) and Opteron (professional) are only brand names.

AMD's consumer parts will support ECC memory, but the consumer AM4 motherboards may not be certified for production work that requires ECC certification. That certification, and ECC support on any AM4 motherboard, will be up to the motherboard maker. I think the 16 core server variants that use 2 Zeppelin dies on an MCM, paired with a single-socket workstation motherboard, will probably be very affordable relative to Intel's Xeon offerings of the same core count.

The Zen/Vega APUs are probably going to be different and will probably not make use of the Zeppelin die design, so that is where AMD can try something different. There will also be interposer-based APUs from AMD that make use of Zeppelin dies, and maybe other Zen core groupings for 4/2 core consumer APU variants. Any single-CCX design for an interposer-based APU will probably use a very wide Infinity Fabric connection etched into the interposer to wire the Zen core die(s) to a larger Vega die and HBM2. Even a single stack of HBM2 provides 256GB/s of effective bandwidth and 4GB or 8GB of memory on any HBM2-based APU.

Any interposer-based APU with even a single 4GB stack of HBM2, with the rest of the system's memory supplied by regular DIMM-based DDR4 (single or dual channel), will be able to use the HBCC/HBC on the integrated Vega GPU to operate mostly out of the HBM2 and hide the slower DDR4 access latency from the integrated graphics. So that's 256GB/s of effective HBM2 bandwidth for the GPU, with any slower DRAM accesses managed so the GPU operates mostly from HBM2.

April 2, 2017 | 08:51 PM - Posted by Anonymous (not verified)

AMD has already discussed in detail the approach they took with Ryzen. They started with x86 vs ARM in general and focused on the commonalities.

Same goes for differentiating server vs consumer: they looked at all the commonalities so they could focus mainly on the DIFFERENCES.

AMD can't just clone an Intel x86 CPU, so there's also that issue. Some software is going to have to be optimized if you want the better performance.

For GAMING it's possible that Microsoft and AMD can collaborate on GAME MODE to benefit games in GENERAL. I'm not sure how simple the CCX thread-jumping issue is, nor how much code recompiling is needed to make a notable difference.

But... going FORWARD the design is well thought out. There are ALWAYS going to be legacy vs future architecture decisions; you can't optimize for BOTH at the same time. Think of how GCN with its ACE architecture works: that was a great design by AMD, but it has taken YEARS to get to the point where games are starting to take advantage of it.

March 31, 2017 | 01:07 AM - Posted by Anonymous (not verified)

Intel does almost the same thing with their 14 to 18 core Xeons:

http://www.anandtech.com/show/8423/intel-xeon-e5-version-3-up-to-18-hasw...

I wouldn't be surprised to see Intel designs copying AMD within a couple of years. Intel claimed to have no plans to get rid of the FSB when AMD came out with Opteron, but a while later Intel came out with almost the same architecture, with on-die memory controllers and point-to-point interprocessor links.

March 30, 2017 | 12:45 AM - Posted by Anonymous (not verified)

With higher clocked memory and the latest AIDA64 benchmark(1). So just manage the cores/threads in the game and avoid any CCX-to-CCX latency by not thread-hopping across CCX units. The scalability of the Zeppelin die is why it was designed that way for the server SKUs.

AMD is probably going to be doing something different for Raven Ridge and the 2 and 4 core APU variants that do not use the Zeppelin die design (2 CCX units). So maybe look there for some differences before Zen+ arrives.

(1)

"Ryzen: Strictly technical"[see post #1259 on page 51.]

https://forums.anandtech.com/threads/ryzen-strictly-technical.2500572/pa...
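Keeping a process's threads on one CCX is just a CPU-affinity mask. A minimal sketch on Linux, assuming (hypothetically) that logical CPUs 0-3 belong to one CCX; the real CPU-to-CCX mapping depends on the chip and SMT enumeration:

```python
import os

def pin_to_ccx(ccx_cpus=frozenset({0, 1, 2, 3})):
    """Restrict this process to one CCX's logical CPUs (Linux-only API)."""
    # Intersect with the CPUs we are actually allowed to run on, so the
    # call still succeeds on machines with fewer cores than the mask names.
    available = os.sched_getaffinity(0)
    target = (ccx_cpus & available) or available
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)
```

A game engine would do the equivalent per worker thread (e.g. via `pthread_setaffinity_np`) so that threads sharing data never bounce across the CCX boundary.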

March 30, 2017 | 03:16 AM - Posted by tts (not verified)

Higher clocked RAM does help, but it's hard to get to 2933 or 3200 with Ryzen right now, and 2666 or 2400 DDR4 is too minor a speed bump to matter much in most software. There is some indication that this is being improved with new BIOSes, so maybe it won't matter too much in the future.

First impressions count for a whole lot, which is something you have to consider. And many people aren't interested in using various tweaks to force their software to conform to the hardware's limitations.

They just want stuff to work, by and large.

Ultimately, as a business, AMD has to make customers happy if it wants to sell its product, and dumb unforced errors like high inter-CCX latency won't make people happy.

My WAG is they were worried about yields and profits: they wanted to make sure they could sell nearly every die as at least a salvage part of some sort, with cores disabled as necessary, and making sure the L3 cache and cores could be easily broken up is a good way to do that.

March 30, 2017 | 10:35 AM - Posted by Anonymous (not verified)

The Zen micro-architecture delivered 12% better IPC than the original 40% improvement over the Excavator micro-architecture that AMD initially stated, so that's 52% greater IPC over Excavator. The Zeppelin die's modular design is great for scaling server/workstation SKUs and not so bad for the Ryzen 7 series consumer parts. AMD will be doing the same thing with its Navi GPU micro-architecture for some very scalable GPU designs and much better die/wafer yields, and for any CPU and server/workstation APU (interposer and HBM2 based, with large Vega dies and Zeppelin CPU dies) SKUs coming to market over the next few years. AMD's modular, very scalable designs will save its customers plenty, as AMD will be able to price its offerings in a much more affordable range for consumers and for the server/HPC/workstation markets.

So look at what a little optimization work for gaming software can do for Ryzen, and expect that optimization work to spread across the entire AMD Zen/x86 software ecosystem: compilers, middleware, and SDKs will have their massive code libraries optimized for the Zen micro-architecture.

You appear to have an agenda and only want to spin negative, without having any substantive, logical arguments other than to express your disdain for AMD's efforts, which have yielded very good results for the new Zen micro-architecture. The tweaking will continue, and both AMD's Zen CPUs and its GPUs will keep improving over time as the gaming and other software ecosystems come online with optimizations targeting AMD's hardware.

"My WAG is they were worried about yields and profits"

No, your very non-substantive negative spin would be the more proper conclusion for anyone reading your posts.

March 30, 2017 | 12:38 PM - Posted by tts (not verified)

That AMD was able to beat its original IPC improvement estimate is good but has no bearing on what is being discussed.

Navi is all vague, soft rumors at this point, and any scalability it has will likely not come from a last-level inter-cache bus, especially not one without good low latency.

That only a little optimization is needed to address the CCX issue doesn't really matter, since most software will likely never be updated to deal with it properly. It's irritating, but it's a common issue with new hardware features or quirks.

I've also said before that Ryzen is a good CPU with a good performance-to-cost ratio, so I'm not sure what you're getting at with "negative spin" either. You know what the acronym "WAG" means, right?

Someone pushing spin or FUD isn't going to use that at all.

March 30, 2017 | 04:08 PM - Posted by Anonymous (not verified)

Keep telling yourself that, but Intel's high margin days are over; let the layoffs begin. I do not care what WAG means, I'm not a Joe Six-Pack slack-jawed yokel. WAG has no meaning in the technology field!

Ryzen is getting the necessary optimizations for games and gaming engines, and Intel is still trying that tired old marketing spin! I'll bet Intel will drag out those tired old cleanroom-suit-wearing dancers, but it will not work this time around.

WAG, ha ha, GTFO with that crap! No one will have Intel inside and a ring through their nose. The mobile market OEMs are very happy to have Intel's grubby little hands off of their supply chains!

March 30, 2017 | 11:28 PM - Posted by tts (not verified)

Dude don't drunk post

March 31, 2017 | 12:18 AM - Posted by Anonymous (not verified)

Go WAG your sports Hag with some toilet swag!

Really, when are those Intel layoffs coming? It looks like Fab 42 may have to go back into mothballs, or one of the other older fab buildings can be shut down. I'll bet those clean-room-suit dancers will be bummed out when they are let go.

Hopefully GF and TSMC can more fully automate things, so there are fewer labor costs to worry about if they bring any new chip plant to the States. ROBOTS RULE!

AMD will fustigate Intel's high margin business model!

March 31, 2017 | 01:35 PM - Posted by 0VERL0RD (not verified)

The best is still to come: Ryzen @ 3600MHz DDR4 https://youtu.be/RZS2XHcQdqA

March 31, 2017 | 12:56 AM - Posted by Anonymous (not verified)

Intel did the same thing essentially with their haswell-ep parts:

http://www.anandtech.com/show/8423/intel-xeon-e5-version-3-up-to-18-hasw...

AMD gets lower latency than the comparable Intel parts, as long as you stay within the CCX for threads that share a lot of data. It is a good trade-off, but it requires some optimization work. It is essentially the Opteron situation all over again. Intel had the chipset-based memory controller, which had higher latency, but the same latency for all threads; Opteron had memory controllers on each CPU and a point-to-point interconnect. Latency was lower and bandwidth was higher, as long as you did proper NUMA optimizations to stay on the same socket. Intel switched to essentially the same architecture pretty quickly. They of course had to invent their own clone of HyperTransport instead of adopting an open industry standard.

The shared ring bus is similar to a chipset-based memory controller: it provides roughly the same latency for all attached cores. It actually scales quite well, but Intel uses two independent ring buses for their 14 to 18 core parts, with up to 8 cores on one and up to 10 on the other. These chips would see performance characteristics similar to Ryzen's if you had threads that share data running on different ring buses with separate L3 caches; the cache coherency traffic and data traffic would explode.

With the distributed, modular design AMD has come up with, they get big advantages in latency and bandwidth. If you double the number of dies, you double the number of memory controllers, and latency stays very low if you stay within the same CCX. It also has massive manufacturing advantages: instead of making massive, low-yield dies, AMD can just pump out huge numbers of the same smaller die. It probably has 64 high speed IO lanes; dies with some defective lanes can just be sold as consumer parts with a smaller number of HSIO lanes. The bandwidth situation is actually quite interesting. The crossbar is supposed to be 32 bytes per clock. The memory is 128 bits, which is 16 bytes, but it is DDR, so that is 32 bytes per crossbar clock. The interprocessor links are 32 lanes wide, which is 4 bytes, so they need to be clocked at a little over 8x the crossbar clock. With DDR4-2666, the actual clock is 1333MHz, so an 8x multiplier would be about 10.6GHz. PCI Express 3.0 runs at 8GHz, and 4.0 is supposed to double that, and since the trace length between processors in an MCM can be a few millimeters, clocking very high is probably possible. They may be able to have remote memory at the same bandwidth as local memory. I don't know how much AMD will charge for a 32 core part, but there is a big difference in manufacturing cost between a monolithic 32 core, 64MB cache die (maybe $10,000 or more considering yields) and four $500 dies placed on one MCM.
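The link-clock arithmetic in that last part can be sanity-checked in a few lines. This just reruns the comment's own figures (32-byte crossbar, 32-lane links, DDR4-2666), which are an enthusiast's estimates, not confirmed specs:

```python
XBAR_BYTES_PER_CLOCK = 32           # stated crossbar width per clock
LINK_BYTES_PER_TRANSFER = 32 // 8   # 32 lanes = 4 bytes per transfer
# Transfers per crossbar clock needed for the link to keep up: 8x
MULTIPLIER = XBAR_BYTES_PER_CLOCK / LINK_BYTES_PER_TRANSFER

def link_rate_gtps(ddr_mt_per_s):
    """Per-lane transfer rate (GT/s) needed to match crossbar bandwidth."""
    mem_clock_ghz = ddr_mt_per_s / 2 / 1000.0  # DDR: real clock is half the rate
    return MULTIPLIER * mem_clock_ghz
```

`link_rate_gtps(2666)` comes out at about 10.66 GT/s, i.e. a little over PCIe 3.0's 8 GT/s signaling rate, which is the comment's point.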

March 31, 2017 | 02:16 AM - Posted by Anonymous (not verified)

The yield on four-core dies is massively higher than on a monolithic 8 core die, offset only minimally by a bit of extra cost in the MCM implementation.

Hence, AMD can price their eight core offerings at half the price of Intel's and still make good margins.

AMD probably didn't want to risk trusting GlobalFoundries to get good yields on 8 core dies out of the gate anyway.

Smart move by AMD on two fronts.

April 2, 2017 | 04:31 AM - Posted by Anonymous (not verified)

Ryzen is based on an 8 core, 16 thread monolithic die. The biggest GPUs we have seen were around 600 square mm, and that was on 28nm, where yields were probably spectacular compared to 14 and 16nm yields. With AMD's very modular, scalable design, they can build a 32 core/64 thread part out of cheap ~200 square mm dies; I don't think anyone wants to make an 800 square mm part on 14nm. The Ryzen base die probably has 64 HSIO links. Dies with defective links can just be sold as consumer parts while fully functional dies are stockpiled for Naples server parts. These should be much more reasonably priced than Intel parts. It is still four dies at $500 each, so it will be very expensive compared to consumer prices, but it could be much cheaper than Intel prices. Some of the Haswell-based Xeons were around $7000 list price, though I think the top-of-the-line Broadwell-based part was only a little over $4000.

March 29, 2017 | 11:52 PM - Posted by Ha-Nocri (not verified)

What about Deus Ex? I saw some benchmarks recently that show Ryzen beating Intel in DX12. Not sure if the game was also updated for Ryzen, like AotS.

March 30, 2017 | 12:07 AM - Posted by Anonymous (not verified)

Showing min framerate results would make this a ton more useful.

March 30, 2017 | 12:20 AM - Posted by Ryan Shrout

Actually, no, minimum is kind of useless. If we had shown 99 or 99.9 percentile, I would agree though.

Just didn't have time.

March 30, 2017 | 08:44 AM - Posted by Anonymous (not verified)

You are wrong in saying showing minimums is useless. Extremely wrong.

March 30, 2017 | 09:43 AM - Posted by cheekynakedoompaloompa (not verified)

Absolute minimums tend to skew actual performance, particularly in canned benchmarks where the first few frames can occur while assets are still being loaded into VRAM; to my recollection GTA V has a bit of this, as do Hitman: Absolution and Tomb Raider (2013), among others. Add in other stutters from random background-program CPU spikes, forced interrupts from other connected hardware, etc., and it's better to just toss the extreme outliers (the 0.09%) and go with 1% and 0.1% as the actual useful minimum. Consider this: if you have a 1 minute benchmark running at 100fps average, should a couple of frames in the 20s sprinkled throughout matter when there are nearly 6000 other frames that are fine? Usually not. It doesn't give you an accurate picture of the game's performance on that hardware.

March 30, 2017 | 01:31 PM - Posted by Allyn Malventano

A 'minimum' metric, where a single frame in thousands determines the value, is definitely useless. Same applies to SSD testing, which is why industry standards use percentiles and not min/max figures.
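The difference between a 'minimum' and a percentile metric is easy to see in code. A rough sketch using the nearest-rank percentile method and hypothetical frame-time data:

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: smallest value with >= pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]

def frame_metrics(frame_times_ms):
    """Summarize a run of per-frame render times (milliseconds)."""
    times = sorted(frame_times_ms)
    return {
        "avg_fps": 1000.0 * len(times) / sum(times),
        "p99_fps": 1000.0 / percentile(times, 99),  # the '1% low' figure
        "min_fps": 1000.0 / times[-1],              # set by the single worst frame
    }
```

With 99 frames at 10 ms and one hitch at 50 ms, `min_fps` collapses to 20 while `p99_fps` stays at 100: a single outlier determines the minimum but barely moves the percentile.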

March 30, 2017 | 07:05 PM - Posted by Anonymous (not verified)

Actually you are. Ignorant little prick.

March 31, 2017 | 12:51 AM - Posted by Anonymous (not verified)

Percentiles are the best for showing results least skewed by outliers.

"Percentile"

https://en.wikipedia.org/wiki/Percentile

March 31, 2017 | 01:11 AM - Posted by Anonymous (not verified)

He said minimum (singular) and you are saying minimums (plural). It is best to use percentiles, and you need to think before you post!

March 30, 2017 | 12:12 AM - Posted by Anonymous (not verified)

It's going to get even more interesting once the AM4 motherboard makers get those faster memory clocks running more stably on their various AM4 SKUs. Firmware tweaks are occurring on a frequent basis: the AMD Generic Encapsulated Software Architecture (AGESA) updates and other tweaking will continue, along with game makers optimizing their code for the Zen micro-architecture used in the Ryzen consumer and server (Naples) SKUs. AMD's SMT appears to be more efficient than Intel's as well, and maybe higher clocked memory and other tweaks will have to be tested with more frequent benchmarking.

It's interesting that both Zen/Ryzen and AMD's GCN SKUs from the two generations before Polaris may both see more improvements as more gaming software is optimized for Vulkan and DX12 on AMD's GPUs, and for Zen/Ryzen as well. New Ryzen steppings will begin to make further improvements in clock speeds, and maybe there will be more information on AMD's Infinity Fabric and the rumored 12 and 16 core dual Zeppelin die variants on an MCM.

I'd really like to see 2 and 3 Zeppelin die variants for some lower cost workstation SKUs with plenty of PCIe lanes. And what about the Infinity Fabric on Zen/Naples offering a faster-than-PCIe, more directly attached CPU-to-discrete-GPU (Radeon/Vega Pro WX) connection on any server and workstation motherboards that use those server/workstation-grade LGA sockets? I'm guessing there will be at least 16 core and 24 core variants in addition to the 32 core top-end Zen/Naples SKU, with the 16 core perhaps being a nice single-socket option for the more affordable end of the workstation market.

March 31, 2017 | 02:11 AM - Posted by Anonymous (not verified)

I wouldn't expect to see a 3-die variant. You are going to have a 2-channel socket (AM4), a quad-channel socket, and an 8-channel socket. If you made a 3-die device, what socket would it go into? If you put it in the 8-channel socket, two of the board's memory channels would be unusable; if you put it in the quad-channel socket, one of the dies would have no memory attached to it. They can hit a lot of the core counts just by using 6 or 8 core dies: two 6-core for 12, two 8-core for 16, four 6-core for 24, and the full four 8-core for 32. I don't know if they would do four 4-core; that would be a strange beast, 16 cores with 8 channels of memory. The 32-core device requires all HSIO lanes to be functional; some of the other variants may not.
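Those combinations are easy to enumerate. A trivial sketch, using the die counts and per-die core options described above (speculation, not an official SKU list):

```python
def package_core_counts(dies, per_die=(4, 6, 8)):
    """Package core counts reachable from `dies` identical salvaged dies."""
    return sorted({dies * cores for cores in per_die})
```

`package_core_counts(2)` yields `[8, 12, 16]` for a two-die MCM and `package_core_counts(4)` yields `[16, 24, 32]` for a Naples-style package, including the "strange beast" four-4-core, 16-core case.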

March 30, 2017 | 12:36 AM - Posted by opl (not verified)

Why didn't you use a 1080 Ti?

March 30, 2017 | 12:47 AM - Posted by zme-ul

how big was the patch?

March 30, 2017 | 12:43 PM - Posted by Anonymous (not verified)

This big:
|<------------------->|

March 30, 2017 | 01:20 AM - Posted by Anonymous (not verified)

I didn't expect that much of a speedup; it went from Ivy Bridge level to Broadwell-E level. Kind of stunned. A twenty percent speedup from a patch is a lot, dude.

And that cross core latency graph is mesmerizing... I feel like I'm about to go into the grid

March 30, 2017 | 01:54 AM - Posted by Anonymous (not verified)

"These software updates would need to be implemented by the game developers themselves in order to take advantage of the unique and more complex core designs of the Zen architecture."

I posted this link here before, but I will post it again:

http://www.anandtech.com/show/8423/intel-xeon-e5-version-3-up-to-18-hasw...

The 14 to 18 core Haswell-EP variants have two separate ring buses; a shared ring bus isn't going to scale to that high a core count very well. The cache architecture is probably a bit different, but it still has two different zones. The article is from 2014, so this isn't new, but the number of people using a 14 to 18 core Xeon for gaming is probably very small, while the number of people using an 8 core Ryzen processor could end up quite large.

I am not that surprised at the performance increase. The AMD architecture offers much lower core-to-core latency than Intel parts of similar core counts, as long as you stay on the same CCX. Going across CCXs would cause a lot of transfers with much higher latency.

March 30, 2017 | 03:30 AM - Posted by Anonymous (not verified)

Wow, with faster RAM Ryzen smokes the Intel parts. I don't see why many reviewers today still test with low frequency RAM when most kits work fine with Ryzen after BIOS updates. My 3000MHz G.Skill "Intel optimized" kit worked on day 1 without problems.

March 30, 2017 | 05:01 AM - Posted by Anonymous (not verified)

You should check with your eye doctor: there is no 6900K test in the High preset, and the 1800X/3200MHz got smoked by the 6900K/2400MHz in the Extreme preset, not to mention the latter can achieve much more once overclocked (3.5GHz -> 4.3GHz all-core) while the 1800X only gains a little (3.7GHz -> 4.0GHz all-core). More importantly, AotS means very little to everyday gaming.

March 30, 2017 | 06:45 AM - Posted by StephanS

The 6900K wins by less than 4% in this game, and costs $1100 at all major retailers (like Newegg).

The reviews I've seen done using 3600MHz RAM show a 4GHz R7 1700 beating a 5GHz i7-7700K in nearly all games tested.

We have seen many day 1 reviews, but they don't paint the picture clearly for gaming (even Cinebench goes a little higher with faster RAM).

Multi-module is the future of CPUs, so I expect software to sort this out if it wants to work on modern architectures. Server-class software already takes this design into account; game engines are just many years behind the curve, but they will get there.

But if Ryzen already beats a 5GHz i7-7700K for 1080p gaming with existing titles, all is good.

March 30, 2017 | 05:07 AM - Posted by Anonymous (not verified)

How on Earth does 77 fps "smoke" 80 fps? The 6900K is still better, just by a less dramatic amount.

March 30, 2017 | 06:37 AM - Posted by StephanS

The 6900K is still $1100 at Newegg and major retail stores.

More than twice the price to gain 3fps, under 4%.

So the 6900k edge in this one game is indeed not dramatic at all.

March 30, 2017 | 07:27 AM - Posted by Anonymous (not verified)

Look at the prices. Of course it cannot pass the 6900K, but the R7 1700 costs less than a third of its price.

If this is not smoking, what is??

March 31, 2017 | 12:25 AM - Posted by Anonymous (not verified)

Well, the Ryzen 7 1700 will not leave a smoking hole in your wallet. So no, the 1700 does not smoke like the 6900K does in that respect.

March 30, 2017 | 03:51 AM - Posted by JohnGR

I wouldn't expect many games to get optimized for Ryzen. But this does show promise for future titles. Also IF Ryzen sells and AMD's financials improve, we could see all, or at least most, future games getting GCN and Ryzen optimizations.

March 30, 2017 | 09:50 AM - Posted by tatakai

game engines will.

March 30, 2017 | 10:56 AM - Posted by Anonymous (not verified)

Gaming engines will be optimized for Zen/Ryzen very quickly, as will the productivity applications. Ditto for the server/workstation/HPC market, where Zen end users will really put resources toward optimizing the entire productivity software ecosystem, for the professional SKUs based on scaled-up Zeppelin dies on an MCM as much as for the consumer Zeppelin-based Ryzen 7, 5, and 3 series.

AMD's Zen CPU and GCN GPU variants are better all-around productivity and gaming deals for users who do more than only game on their systems, and that's what makes AMD's products wanted across a larger market of end users. I'll be waiting for the dual Zeppelin dies on an MCM, with plenty of Infinity Fabric bandwidth from the Zen dies to a discrete GPU (Vega) die, as Vega will also have Infinity Fabric support. So AMD workstation/server motherboards will have better options than only PCIe to interface with AMD's Vega based GPUs.

March 30, 2017 | 10:57 AM - Posted by Mr.Gold (not verified)

Well, also a 10% boost just by using 3200MHz vs 2400MHz RAM.

I believe PCPer did all its Ryzen reviews using 2400MHz RAM?

So if this scales, we might see a big boost in the gaming results without any developer having to code anything.

March 30, 2017 | 11:29 AM - Posted by JohnGR

If you look at the pictures, they tested with both 2400 and 3200MHz RAM, at least in the AOTS case.

March 30, 2017 | 01:29 PM - Posted by Mr.Gold (not verified)

OK, correction: PCPer did all but one benchmark using 2400 RAM?

I'm looking at their Ryzen review and all the associated results.

RAM never used to influence scores that much, but here it seems pairing a $500 CPU with 2400MHz RAM doesn't make sense for RAM-intensive software (like games).

And it even seems to show that 3600MHz RAM puts Ryzen at an advantage over the i7-7700K for gaming. 3600MHz today seems impractical on Ryzen, but it shows that Ryzen can be "unlocked" for gaming with faster RAM.

AMD is said to have another microcode release coming in May that tweaks the memory controller. If they get 3600 A-XMP working, all current game benchmarks can be thrown out the window.

March 31, 2017 | 04:45 AM - Posted by JohnGR

There is that theory (fact?) that the Infinity Fabric clock is tied to RAM speed, so if the Infinity Fabric is the bottleneck, the speedier the RAM, the less of a bottleneck it becomes. Now, if by optimizing for Ryzen you can minimize the need to cross the Infinity Fabric, so you don't get bottlenecked by it, you probably get what this patch for AotS is showing us.

March 30, 2017 | 12:11 PM - Posted by Anonymous (not verified)

The consoles have a similar structure, so any optimization work done for them can be ported over to PC game engines, and it helps for developers to be familiar with what needs to be done. It is ridiculous to continue to push single thread performance; it was in Intel's best interest to do so, to protect their profits. With an 8 core processor, a single thread can only make use of 12.5 percent of the available processing power; with 16 cores it is only 6.25 percent. On an 8 core part, increasing single thread performance by 20% only takes that one thread from 12.5% to 15% of the chip's total potential. Developers just need to do the work.
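The arithmetic above, as a one-liner sketch:

```python
def single_thread_share(cores, st_speedup=0.0):
    """Fraction of a chip's total throughput one thread can touch,
    optionally after a single-thread speedup (e.g. 0.20 for +20%)."""
    return (1.0 + st_speedup) / cores
```

`single_thread_share(8)` is 0.125 and `single_thread_share(16)` is 0.0625; even a 20% single-thread gain on an 8-core, `single_thread_share(8, 0.20)`, only reaches 0.15 of the package's potential.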

March 30, 2017 | 06:40 AM - Posted by Justin150 (not verified)

I will admit that all my systems for the last 10 years have been Intel based, but I do have an old AMD based system in the loft somewhere.

I have no particular allegiance to either AMD or Intel.

Ryzen has 2 key issues for me.

1. At launch it was immature on software (specifically games). This is not surprising; developers will only take an interest once they see some real sales numbers.

2. It seems unduly sensitive to memory speeds. We are promised some BIOS fixes, but this feels like an issue that should have been picked up and dealt with before launch.

Fortunately I am not intending to build a new system until Q3 2017, so I can afford to wait and see what happens. What I am hoping is that we get a series of BIOS and game updates which make Ryzen far more potent. That way Intel will need to respond, maybe even with price cuts!

Either way this bodes well for us consumers

March 30, 2017 | 07:18 AM - Posted by madweazl (not verified)

It isn't unduly sensitive to memory speeds; it is directly affected by memory speed because of the Infinity Fabric, and was designed that way. It is, however, extremely finicky about which memory performs well with it right now (Samsung B-die modules being the easiest for the time being). I don't hold much hope that existing modules will produce big gains, but future modules should see a nice boost in speed.

March 30, 2017 | 10:55 AM - Posted by Mr.Gold (not verified)

From your chart, I see a 10% boost in FPS using 3200MHz RAM vs 2400MHz (85.5fps vs 78.4fps).

PCPer, didn't you use 2400MHz RAM for your entire Ryzen review?

Any chance you will redo the game benchmarks?

It seems Ryzen benefits from higher clocked RAM; one review showed a 4GHz R7 1700 beating the i7-7700K at 5GHz when using 3600MHz RAM.

So it seems that if/when AMD tweaks A-XMP, Ryzen will get a MAJOR boost in gaming, without developers having to write any new code.

Have you tried redoing your original game benchmarks with the 3200 vs 2400 RAM? Did you notice any improvements?

March 30, 2017 | 11:26 AM - Posted by Mr.Gold (not verified)

Interesting: AMD is testing the same thing.

https://community.amd.com/community/gaming/blog/2017/03/30/amd-ryzen-com...

Also, some microcode tweak including :

"We have reduced DRAM latency by approximately 6ns. This can result in higher performance for latency-sensitive applications."

March 30, 2017 | 11:28 AM - Posted by mLocke

Oxide themselves say that Intel processors saw a performance increase too, so your finding otherwise is a bit surprising.

source: youtu.be/6yRMaDmrDxg?t=55

March 30, 2017 | 11:57 AM - Posted by Ashlord (not verified)

Makes me wonder how much impact the CCX latency has on the graphics card drivers. I'd assume Team Green won't care much to optimize for Ryzen until the user base is large enough, but what about Team Red? Maybe there is still room to milk more performance out of Ryzen.

March 30, 2017 | 05:20 PM - Posted by Martin Trautvetter

Have you tried play-testing AotS? PCGamesHardware.de suggests that while there are gains from this patch during actual play, Ryzen is still far from competitive in their save-game scenario:

http://www.pcgameshardware.de/Ryzen-7-1800X-CPU-265804/Specials/AMD-AotS...

I wonder what optimizations would allow Ryzen to be close in the built-in CPU benchmark while delivering only two-thirds of a 7700K's performance in actual gameplay.

March 30, 2017 | 05:38 PM - Posted by Anonymous (not verified)

It seems unlikely for there to be that big of a difference unless the benchmark is more of a synthetic test and not based on actual gameplay. If the benchmark was used to do the optimizations, then I would expect it to see the most benefit; if the benchmark is a good representative of the rest of the game, then the gains should translate everywhere. They may be hitting a corner case, though; which part of the game you choose to test could make a big difference. Also, I don't know how much I would trust numbers from non-benchmark play, as those can have wide variations.

March 31, 2017 | 08:54 AM - Posted by Anonymous (not verified)

Ryan

There is already another Ryzen-updated game, and it's a much more popular one (for gamers), Dota 2:
https://www.phoronix.com/scan.php?page=news_item&px=Dota-2-Ryzen-Optimiz...
https://community.amd.com/community/gaming/blog/2017/03/30/amd-ryzen-com...

March 31, 2017 | 01:35 PM - Posted by Mr.Gold (not verified)

Another interesting piece.

It seems Nvidia's DX12 drivers are not as good as AMD's...
https://youtu.be/0tfTZjugDeg?t=891

By A LOT.

Not as good in the sense that they don't really spread the load over cores, so Nvidia + DX12 still stresses a single thread.

It seems a Ryzen R5 + RX 480 might be the best combo for a <$600 gaming PC.

March 31, 2017 | 10:16 PM - Posted by Allyn Malventano

That is old news really, and it depends on which specific DX12 calls are used by a given game. All of the results in your linked video are specific to ROTR only. Other titles are different.

April 1, 2017 | 05:18 PM - Posted by Anonymous (not verified)

That's why more Ryzen and dual RX 480 benchmarks are needed (until Vega arrives), and afterwards dual RX 480s and dual GTX 1060s under Vulkan's/DX12's API-managed, unlinked multi-GPU adapter functionality.

The whole dual GPU scaling issue is going to become a feature that is managed under the APIs (DX12/Vulkan) rather than CF/SLI, and can be managed directly by the games/gaming engine software.

Using only one maker's GPU products for all testing needs to stop if one is testing any maker's CPU SKUs.

Now that AMD is producing a more competitive CPU part, and better GPU parts, testing many configurations using combinations of all makers' CPU and GPU SKUs will be needed, or the results will never be fully trusted!

I'm sure there will be plenty of Ryzen and dual RX 480 users now that some of the gaming software is starting to produce improvements.

Websites that do not run PLENTY of different configuration benchmarks (single/dual GPU; Vulkan/DX12 linked/unlinked, i.e. non-CF/SLI but graphics-API-managed multi-GPU; with AMD/Intel CPUs) will be thought of as just marketing oriented, without any real substance with regard to the needs of enthusiasts.

April 2, 2017 | 12:17 AM - Posted by StephanS

PCPer, can you do the exact same benchmark but with a Fury X?

It seems some have found Nvidia's drivers sandbag DX12 performance
(single-thread heavy).

Not sure what to make of it all.

April 2, 2017 | 10:34 PM - Posted by Anonymous (not verified)

Oh, those worthless CCX "latency" "measurements" again.