** UPDATE 3/13 5 PM **
AMD has posted a follow-up statement that officially clears up much of the conjecture this article was attempting to clarify. Relevant points from their post that relate to this article as well as many of the requests for additional testing we have seen since its posting (emphasis mine):
"We have investigated reports alleging incorrect thread scheduling on the AMD Ryzen™ processor. Based on our findings, AMD believes that the Windows® 10 thread scheduler is operating properly for “Zen,” and we do not presently believe there is an issue with the scheduler adversely utilizing the logical and physical configurations of the architecture."
"Finally, we have reviewed the limited available evidence concerning performance deltas between Windows® 7 and Windows® 10 on the AMD Ryzen™ CPU. We do not believe there is an issue with scheduling differences between the two versions of Windows. Any differences in performance can be more likely attributed to software architecture differences between these OSes."
So there you have it, straight from the horse's mouth. AMD does not believe the problem lies within the Windows thread scheduler. SMT performance in gaming workloads was also addressed:
"Finally, we have investigated reports of instances where SMT is producing reduced performance in a handful of games. Based on our characterization of game workloads, it is our expectation that gaming applications should generally see a neutral/positive benefit from SMT. We see this neutral/positive behavior in a wide range of titles, including: Arma® 3, Battlefield™ 1, Mafia™ III, Watch Dogs™ 2, Sid Meier’s Civilization® VI, For Honor™, Hitman™, Mirror’s Edge™ Catalyst and The Division™. Independent 3rd-party analyses have corroborated these findings.
For the remaining outliers, AMD again sees multiple opportunities within the codebases of specific applications to improve how this software addresses the “Zen” architecture. We have already identified some simple changes that can improve a game’s understanding of the "Zen" core/cache topology, and we intend to provide a status update to the community when they are ready."
We are still digging into the observed differences of toggling SMT compared with disabling the second CCX, but it is good to see AMD issue a clarifying statement here for all of those out there observing and reporting on SMT-related performance deltas.
** END UPDATE **
Editor's Note: The testing you see here was a response to many days of comments and questions to our team on how and why AMD Ryzen processors are seeing performance gaps in 1080p gaming (and other scenarios) in comparison to Intel Core processors. Several outlets have posted that the culprit is the Windows 10 scheduler and its inability to properly allocate work across the logical vs. physical cores of the Zen architecture. As it turns out, we can prove that isn't the case at all. -Ryan Shrout
Initial reviews of AMD’s Ryzen CPU revealed a few inefficiencies in some situations, particularly in gaming workloads running at common resolutions like 1080p, where the CPU becomes more of a bottleneck when coupled with modern GPUs. Lots of folks have theorized about what could be causing these issues, and the most recent attention appears to have been directed at the Windows 10 scheduler and its supposed inability to properly place threads on the Ryzen cores for the most efficient processing.
I typically have Task Manager open while running storage tests (they are boring to watch otherwise), and I naturally had it open during Ryzen platform storage testing. I’m accustomed to how the IO workers are distributed across reported threads, and in the case of SMT capable CPUs, distributed across cores. There is a clear difference when viewing our custom storage workloads with SMT on vs. off, and it was dead obvious to me that core loading was working as expected while I was testing Ryzen. I went back and pulled the actual thread/core loading data from my testing results to confirm:
The Windows scheduler has a habit of bouncing processes across available processor threads. This naturally happens as other processes share time with a particular core, with the heavier process not necessarily switching back to the same core. As you can see above, the single IO handler thread was spread across the first four cores during its run, but the Windows scheduler was always hitting just one of the two available SMT threads on any single core at one time.
My testing for Ryan’s Ryzen review consisted of only single threaded workloads, but we can make things a bit clearer by loading down half of the CPU while toggling SMT off. We do this by increasing the worker count (4) to be half of the available threads on the Ryzen processor, which is 8 with SMT disabled in the motherboard BIOS.
SMT OFF, 8 cores, 4 workers
With SMT off, the scheduler is clearly not giving priority to any particular core and the work is spread throughout the physical cores in a fairly even fashion.
Now let’s try with SMT turned back on and doubling the number of IO workers to 8 to keep the CPU half loaded:
SMT ON, 16 (logical) cores, 8 workers
With SMT on, we see a very different result. The scheduler is clearly loading only one thread per core. This could only be possible if Windows was aware of the 2-way SMT (two threads per core) configuration of the Ryzen processor. Do note that sometimes the workload will toggle around every few seconds, but the total loading on each physical core will still remain at ~50%. I chose a workload that saturated its thread just enough for Windows to not shift it around as it ran, making the above result even clearer.
Synthetic Testing Procedure
While the storage testing methods above provide a real-world example of the Windows 10 scheduler working as expected, we do have another workload that can help demonstrate core balancing with Intel Core and AMD Ryzen processors. A quick and simple custom-built C++ application can be used to generate generic worker threads and monitor for core collisions and resolutions.
This test app has a very straightforward workflow. Every few seconds it generates a new thread, capping at N/2 threads total, where N is equal to the reported number of logical cores. If the OS scheduler is working as expected, it should load 8 threads across 8 physical cores, though which logical core is used within each physical core will depend on very minute parameters and conditions going on in the OS background.
By monitoring the APIC_ID through the CPUID instruction, the first application thread monitors all threads and detects and reports on collisions - when a thread from our app is running on the same core as another thread from our app. That thread also reports when those collisions have been cleared. In an ideal and expected environment where Windows 10 knows the boundaries of physical and logical cores, you should never see more than one thread of a core loaded at the same time.
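The collision check described above can be illustrated with a minimal sketch. This is not the C++ test app itself: it does not read the APIC ID through CPUID, and instead takes a hypothetical snapshot of which logical core each worker was last seen on. It also assumes that sibling SMT threads are numbered adjacently (LCore 0/1, 2/3, and so on); the function and worker names are made up for illustration.

```python
from collections import defaultdict


def physical_core(logical_id, threads_per_core=2):
    """Map a logical core index to its physical core.

    Assumption: SMT siblings are numbered adjacently (0/1, 2/3, ...).
    """
    return logical_id // threads_per_core


def find_collisions(placement, threads_per_core=2):
    """Return {physical core: [worker names]} for cores hosting more than one worker.

    placement maps a worker name to the logical core it was last seen on.
    """
    by_core = defaultdict(list)
    for worker, logical_id in sorted(placement.items()):
        by_core[physical_core(logical_id, threads_per_core)].append(worker)
    return {core: names for core, names in by_core.items() if len(names) > 1}


# Hypothetical snapshots, not measured data.
spread = {"w0": 0, "w1": 3, "w2": 4, "w3": 7}   # one worker per physical core
stacked = {"w0": 0, "w1": 1, "w2": 4, "w3": 7}  # w0 and w1 share physical core 0

print(find_collisions(spread))   # {}
print(find_collisions(stacked))  # {0: ['w0', 'w1']}
```

An empty result corresponds to the expected behavior, where no physical core has both of its threads loaded by our workers at the same time.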
This screenshot shows our app working on the left and the Windows Task Manager on the right with logical cores labeled. While it may look like all logical cores are being utilized at the same time, in fact they are not. At any given point, only one of LCore 0 or LCore 1 is actively processing a thread. Need proof? Check out the modified view of the Task Manager where I copy the graph of LCore 1/5/9/13 over the graph of LCore 0/4/8/12 with inverted colors to aid viewability.
If you look closely, by overlapping the graphs in this way, you can see that the threads migrate from LCore 0 to LCore 1, LCore 4 to LCore 5, and so on. The graphs intersect and fill in to consume ~100% of the physical core. This pattern is repeated for the other 8 logical cores on the right two columns as well.
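The overlay trick amounts to adding the utilization of the two logical cores that share a physical core. A rough sketch of that arithmetic follows, using made-up sample values rather than data from our screenshots, and again assuming siblings are numbered adjacently:

```python
def per_physical_core_load(logical_loads, threads_per_core=2):
    """Sum the utilization of sibling logical cores into one figure per physical core."""
    if len(logical_loads) % threads_per_core:
        raise ValueError("logical core count must be a multiple of threads_per_core")
    return [
        sum(logical_loads[i:i + threads_per_core])
        for i in range(0, len(logical_loads), threads_per_core)
    ]


# Hypothetical percent-busy samples for LCore 0..3 (illustrative only).
settled = [100, 0, 0, 100]     # each worker sits on one logical core
migrating = [60, 40, 35, 65]   # workers bounce between the two siblings

print(per_physical_core_load(settled))    # [100, 100]
print(per_physical_core_load(migrating))  # [100, 100]
```

In both cases each physical core carries one thread's worth of work, which is what the intersecting graphs show.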
Running the same application on a Core i7-5960X Haswell-E 8-core processor shows a very similar behavior.
Each pair of logical cores shares a single thread and when thread transitions occur away from LCore N, they migrate perfectly to LCore N+1. It does appear that in this scenario the Intel system is showing a more stable threaded distribution than the Ryzen system. While that may in fact incur some performance advantage for the 5960X configuration, the penalty for intra-core thread migration is expected to be very minute.
The fact that Windows 10 is balancing the 8 thread load specifically between matching logical core pairs indicates that the operating system is perfectly aware of the processor topology and is selecting distinct cores first to complete the work.
Information from this custom application, along with the storage performance tool example above, clearly shows that Windows 10 is attempting to balance work on Ryzen between cores in the same manner that we have experienced with Intel and its HyperThreaded processors for many years.
The right angle
While many in the media and enthusiast communities are still trying to fully grasp the importance and impact of the recent AMD Ryzen 7 processor release, I have been trying to complete my review of the 1700X and 1700 processors, in between testing the upcoming GeForce GTX 1080 Ti and preparing for more hardware to show up at the offices very soon. There is still much to learn and understand about the first new architecture from AMD in nearly a decade, including analysis of the memory hierarchy, power consumption, overclocking, gaming performance, etc.
During my Ryzen 7 1700 testing, I went through some overclocking evaluation and thought the results might be worth sharing sooner rather than later. This quick article is just a preview of what we are working on, so don’t expect to find the answers to Ryzen power management here, only a recounting of how I was able to get stellar performance from the lowest priced Ryzen part on the market today.
The system specifications for this overclocking test were identical to our original Ryzen 7 processor review.
|Test System Setup|
|CPU||AMD Ryzen 7 1800X; AMD Ryzen 7 1700X; AMD Ryzen 7 1700; Intel Core i7-7700K; Intel Core i5-7600K; Intel Core i7-6700K; Intel Core i7-6950X; Intel Core i7-6900K; Intel Core i7-6800K|
|Motherboard||ASUS Crosshair VI Hero (Ryzen); ASUS Prime Z270-A (Kaby Lake, Skylake); ASUS X99-Deluxe II (Broadwell-E)|
|Storage||Corsair Force GS 240 SSD|
|Graphics Card||NVIDIA GeForce GTX 1080 8GB|
|Graphics Drivers||NVIDIA 378.49|
|Power Supply||Corsair HX1000|
|Operating System||Windows 10 Pro x64|
Of note is that I am still utilizing the Noctua U12S cooler that AMD provided for our initial testing – all of the overclocking and temperature reporting in this story is air cooled.
First, let’s start with the motherboard. All of this testing was done on the ASUS Crosshair VI Hero with the latest 5704 BIOS installed. As I began to explore the different overclocking capabilities (BCLK adjustment, multipliers, voltage) I came across the ASUS presets. These presets offer pre-defined collections of settings that ASUS feels will offer simple overclocking capabilities. An option for higher BCLK existed, but the one that caught my eye was straightforward: 4.0 GHz.
With the Ryzen 1700 installed, I thought I would give it a shot. Keep in mind that this processor has a base clock of 3.0 GHz, a rated maximum boost clock of 3.7 GHz, and is the only 65-watt TDP variant of the three Ryzen 7 processors released last week. Because of that, I didn’t expect its overclocking capability to match what the 1700X and 1800X could offer. Based on previous processor experience, when a chip is binned at a lower power draw than the rest of a family, it will often have properties that make it poorly suited to running at higher power. Based on my results here, that doesn’t seem to be the case.
By simply enabling that option in the ASUS UEFI and rebooting, our Ryzen 1700 processor was running at 4.0 GHz on all cores! For this piece, I won’t be going into the drudge and debate on what settings ASUS changed to get to this setting or if the voltages are overly aggressive – the point is that it just works out of the box.
Subject: Processors | March 8, 2017 - 02:43 PM | Jeremy Hellstrom
Tagged: Ryzen 1700X, Ryzen 1700, amd
With suggested prices of $330 for the Ryzen 1700 and $400 for the 1700X, a lot of users are curious about the relative performance of these two chips, especially with some sites reporting almost equal performance when they are overclocked. [H]ard|OCP tested both of these chips at the same clock speeds to see what performance differences there are between the two. As it turns out, the only test which resulted in a delta of 1% or more was WinRAR; all other tests showed a minuscule difference between the X and the plain old 1700. They are going to follow these findings up with more tests once they source some CPUs from retail outlets, to see if there are any differences there.
"So there has been a lot of talk about what Ryzen CPU do you buy? The way I think is that you want to buy the least expensive one that will give you the best performance. That is exactly what we expect to find out here today. Is the Ryzen 1700 for $330 as good as the $400 1700X, or even the $500 1800X? "
Here are some more Processor articles from around the web:
- AMD Ryzen 7 1800X @ eTeknix
- AMD Ryzen 7 1700X @ Kitguru
- Athlon X4 860K @ Hardware Secrets
- Intel 7th Generation Core i3 7350K Processor Review @ OCC
Subject: Processors | March 7, 2017 - 09:02 AM | Tim Verry
Tagged: SoC, server, ryzen, opteron, Naples, HPC, amd
Over the summer, AMD introduced its Naples platform which is the server-focused implementation of the Zen microarchitecture in a SoC (System On a Chip) package. The company showed off a prototype dual socket Naples system and bits of information leaked onto the Internet, but for the most part news has been quiet on this front (whereas there were quite a few leaks of Ryzen which is AMD's desktop implementation of Zen).
The wait seems to be finally over, and AMD appears ready to talk more about Naples, which will reportedly launch in the second quarter of this year (Q2'17) with full availability of processors and motherboards from OEMs and channel partners (e.g. system integrators) happening in the second half of 2017. Per AMD, "Naples" processors are SoCs with 32 cores and 64 threads that support 8 memory channels and a (theoretical) maximum of 2TB DDR4-2667. (Using the 16GB DIMMs available today, Naples supports 256GB of DDR4 per socket.) Further, the Naples SoC features 64 PCI-E 3.0 lanes. Rumors also indicated that the SoC included support for sixteen 10GbE interfaces, but AMD has yet to confirm this or the number of SATA/SAS ports offered. AMD did say that Naples has an optimized cache structure for HPC compute and "dedicated security hardware", though it did not go into specifics. (The security hardware may be similar to the ARM TrustZone technology it has used in the past.)
Naples will be offered in single and dual socket designs with dual socket systems offering up 64 cores, 128 threads, 32 DDR4 DIMMs (512 GB using 16 GB modules) on 16 total memory channels with 21.3 GB/s per channel bandwidth (170.7 GB/s per SoC), 128 PCI-E 3.0 lanes, and an AMD Infinity Fabric interconnect between the two processor sockets.
AMD claims that its Naples platform offers up to 45% more cores, 122% more memory bandwidth, and 60% more I/O than its competition. For its internal comparison, AMD chose the Intel Xeon E5-2699A V4, which is the processor with the highest core count that is intended for dual socket systems (there are E7s with more cores but those are in 4P systems). The Intel Xeon E5-2699A V4 is a 14nm 22-core (44-thread) processor clocked at 2.4 GHz base to 3.6 GHz turbo with 55MB cache. It supports four channels of DDR4-2400 for a maximum bandwidth of 76.8 GB/s (19.2 GB/s per channel) as well as 40 PCI-E 3.0 lanes. A dual socket system with two of those Xeons features 44 cores, 88 threads, and a theoretical maximum of 1.54 TB of ECC RAM.
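As a sanity check, the per-socket figures quoted above line up with AMD's percentages. The sketch below is only back-of-the-envelope arithmetic: it assumes a 64-bit (8-byte) DDR4 channel, treats DDR4-2667 as a nominal 2666.67 MT/s, and assumes the "I/O" claim refers to PCI-E lane count, which AMD did not spell out. The helper names are hypothetical.

```python
def channel_bandwidth_gbs(mega_transfers_per_s, bus_bytes=8):
    """Peak bandwidth of one DDR4 channel in GB/s (assumes a 64-bit data bus)."""
    return mega_transfers_per_s * bus_bytes / 1000


def percent_more(ours, theirs):
    """How much larger 'ours' is than 'theirs', in percent."""
    return (ours / theirs - 1) * 100


# Per-socket memory bandwidth: 8 channels of DDR4-2667 vs 4 channels of DDR4-2400.
naples_bw = 8 * channel_bandwidth_gbs(8000 / 3)  # nominal 2666.67 MT/s
xeon_bw = 4 * channel_bandwidth_gbs(2400)

print(round(naples_bw, 1), round(xeon_bw, 1))    # 170.7 76.8
print(round(percent_more(32, 22)))               # 45  (cores)
print(round(percent_more(naples_bw, xeon_bw)))   # 122 (memory bandwidth)
print(round(percent_more(64, 40)))               # 60  (PCI-E lanes, assumed meaning of "I/O")
```

These are theoretical peaks per socket; they say nothing about sustained bandwidth in a real workload.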
AMD's reference platform with two 32 core Naples SoCs and 512 GB DDR4 2400 MHz was purportedly 2.5x faster at the seismic analysis workload than the dual Xeon E5-2699A V4 OEM system with 1866 MHz DDR4. Curiously, when AMD compared a Naples reference platform with 44 cores enabled and running 1866 MHz memory to a similarly configured Intel system the Naples platform was twice as fast. It seems that the increased number of memory channels and memory bandwidth are really helping the Naples platform pull ahead in this workload.
AMD further claims that its Naples platform is more balanced and better suited to cloud computing and scientific and HPC workloads than the competition. Specifically, Forrest Norrod, Senior Vice President and General Manager of AMD's Enterprise, Embedded, and Semi-Custom Business Unit, stated:
“’Naples’ represents a completely new approach to supporting the massive processing requirements of the modern datacenter. This groundbreaking system-on-chip delivers the unique high-performance features required to address highly virtualized environments, massive data sets and new, emerging workloads.”
There is no word on pricing yet, but it should be competitive with Intel's offerings (the E5-2699A V4 is $4,938). AMD will reportedly be talking data center strategy and its upcoming products during the Open Compute Summit later this week, so hopefully there will be more information released at those presentations.
(My opinions follow)
This is one area where AMD needs to come out strong with support from motherboard manufacturers, system integrators, OEM partners, and OS and software validation to succeed. Intel is not likely to take AMD encroaching on its lucrative server market share lightly, and AMD is going to have a long road ahead of it to regain the market share it once had in this area, but it does have a decent architecture on its hands to build off of with Zen and if it can secure partner support Intel is certainly going to have competition here that it has not had to face in a long time. Intel and AMD competing over the data center market is a good thing, and as both companies bring new technology to market it will trickle down into the consumer level hardware. Naples' success in the data center could mean a profitable AMD with R&D money to push Zen as far as it can – so hopefully they can pull it off.
What are your thoughts on the Naples SoC and AMD's push into the server market?
- Zen and the Art of CPU Design
- AMD Zen Architecture Overview: Focus on Ryzen
- Dissecting AMD Zen Architecture - Interview with David Kanter
Subject: Processors | March 4, 2017 - 06:00 AM | Tim Verry
Tagged: xfr, turbo, sensemi, ryzen, overclocking, amd
Following the leaks and official news and reviews of AMD's Ryzen processors there were a few readers asking for clarity on the eXtended Frequency Range (XFR) technology and whether or not it is enabled on all Ryzen CPUs or only the X models. After quite a bit of digging through forums and contradictory articles, I believe I have the facts in hand to answer those questions. In short, XFR is supported on all Ryzen processors (at least all the Ryzen 7 CPUs released so far) including the non-X Ryzen 7 1700; however the X SKUs get a bigger boost from XFR than the non-X model(s).
Specifically, the Ryzen 7 1700X and Ryzen 7 1800X, when paired with a high-end air or water cooler, are able to boost up to an additional 100 MHz over their advertised boost clocks (4.0 GHz on the 1800X), while the Ryzen 7 1700 is limited to an XFR boost of up to 50 MHz, so long as there is thermal headroom. Interestingly, the Extended Frequency Range boosts are done in 25 MHz increments (and likely achieved by adjusting the multiplier by 0.25x).
How does this all work though? Well, with Ryzen AMD introduced a new suite of technologies it calls "SenseMI" which, despite the questionable name (heh), puts a lot of intelligence into the processor and builds on the paths AMD started down with Carrizo and Excavator designs. The five main technologies are Pure Power, Precision Boost, Extended Frequency Range (XFR), Neural Net Prediction, and Smart Prefetch. The first three are important when talking about XFR.
With Ryzen, AMD has embedded a number of sensors throughout the chip that accurately measure temperatures, clock speeds, and voltages (to within 1°C, 1mA, 1mW, and 1mV), and it has connected all the sensors together using its Infinity Fabric. Pure Power lets AMD make localized adaptive adjustments to optimize power usage without negatively affecting performance. Precision Boost is AMD's equivalent to Intel's Turbo Boost and it is built on top of Pure Power's sensor network. Precision Boost enables a Ryzen CPU to dynamically clock up beyond the base clock speed across all cores, or clock even further across two cores. Lightly threaded workloads will benefit from the latter, while workloads using more than two threads will get the all-core boost. There is not a lot of granularity in number of cores vs. allowed boost, but there does not really need to be. In terms of clock speed, Precision Boost is more granular than Intel's Turbo Boost, stepping in 25 MHz increments versus 100 MHz increments up to the maximum allowed Precision Boost clock. As an example, the Ryzen 7 1800X has a base clock of 3.6 GHz, and so long as there is thermal headroom it can adjust the clock speed up in 25 MHz steps to 3.7 GHz across all eight cores or up to as much as 4.0 GHz on two cores.
From there XFR allows the processor to clock beyond the 2 core Precision Boost (XFR only works to increase the boost of the two core turbo not the all core turbo) and as temperatures decrease the allowed XFR increases. While initial reports and slides from AMD suggested XFR would scale with the cooler (air, water, LN2, LHe) with no upper limit aside from temperature and other sensor input, it appears AMD has taken a step back and limited X series Ryzen 7 chips to a maximum XFR boost of 100 MHz over the two core Precision Boost and non-X series Ryzen 7 processors to a maximum XFR boost of 50 MHz over the maximum boosted two core clock speed. The Ryzen 7 1700 will have two extra steps above its two core boost so while the chip has a base clock of 3.0 GHz, Precision Boost can take all eight cores to 3.1GHz or two cores to 3.7 GHz. Further, so long as temperatures are still in check XFR can take those two boosted cores to 3.75 GHz.
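To make the clock ladder concrete, here is a small sketch that plugs in the figures quoted above for the Ryzen 7 1700 and 1800X. The function and key names are hypothetical, and the sketch only enumerates the allowed 25 MHz steps; it does not model how the processor actually picks a step from its sensor data.

```python
STEP_MHZ = 25  # Precision Boost / XFR increment quoted above


def boost_steps(start_mhz, ceiling_mhz, step_mhz=STEP_MHZ):
    """List every clock from start to ceiling (inclusive) in step_mhz increments."""
    return list(range(start_mhz, ceiling_mhz + 1, step_mhz))


def max_clocks(base, all_core, two_core, xfr_bonus):
    """Collect the clock ceilings (MHz) for one CPU, including the XFR bonus."""
    return {
        "base": base,
        "all_core_boost": all_core,
        "two_core_boost": two_core,
        "two_core_xfr": two_core + xfr_bonus,
    }


# Figures as stated in the text, in MHz.
r7_1700 = max_clocks(3000, 3100, 3700, 50)
r7_1800x = max_clocks(3600, 3700, 4000, 100)

print(r7_1700["two_core_xfr"])           # 3750
print(r7_1800x["two_core_xfr"])          # 4100
print(boost_steps(3700, 3750))           # [3700, 3725, 3750]
print(len(boost_steps(3600, 3700)) - 1)  # 4 steps from base to all-core boost on the 1800X
```

The two extra entries above 3700 MHz in the third line are the "two extra steps" the Ryzen 7 1700 gets from XFR.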
XFR will be a setting that you are able to toggle on and off via a motherboard setting, and some motherboards may have the feature turned on by default. Unfortunately, if you choose to manually overclock you will lose XFR functionality (and boost). Further, Precision Boost and XFR are connected and you are not able to turn off one but not the other (you either get both or nothing). Note that if you overclock using AMD's "Ryzen Master" software utility, it will also disable Precision Boost and XFR, but the lower power C-states will stay enabled which may be desirable if you want the power bill and room to cool down when not gaming or creating content.
I would expect that as yields and the binning processes improve for Ryzen, AMD may lift or extend the XFR limits either with a product refresh (not sure if a micro-code update would be possible) or maybe only in the upcoming hexa-core and quad-core Ryzen 5 and Ryzen 3 processors that have fewer cores and more headroom for overclocking. That is merely speculation, however. Ryzen 5 and Ryzen 3 should support XFR on both X and non-X models, but it is too early to know or say what the XFR boost will be.
XFR is neat though not as big of a deal as I originally thought it to be without limits, and as many expected manual overclocking is still going to be the way to go. This is not all bad news though, because it means that the much cheaper Ryzen 7 1700 just got a lot more attractive. You give up a 50 MHz XFR boost that you can't use anyway because you are going to manually overclock and you gamble a bit on getting a decently binned chip that can hit R7 1800X clock speeds, but you save $170 that you can put towards a better motherboard or a better graphics card (or a second one for CrossFire - even on B350).
I am still waiting on our overclocking results as well as Kyle's overclocking results when it comes to the Ryzen 7 1700, but several other sites are reporting success at hitting at least 4.0 GHz. Not many results go beyond 4.0 or 4.1 GHz, which isn't unexpected: these are not the highest binned chips, and with yields still young the bins reflect real silicon differences rather than just product segmentation, though most chips can hit the higher speeds given enough power, voltage, and cooling. For example, Legit Reviews reports that they were able to manually overclock an R7 1700 to 4.0 GHz on all cores at 1.3875 volts. They were able to keep the non-X Ryzen chip stable with those settings on both aftermarket air and AIO water coolers.
AMD's Ryzen Master overclocking software lets you OC and set up CPU and memory profiles from your OS.
More on overclocking: Tom's Hardware has posted that, according to AMD, the safe voltage ceiling for overclocking is 1.35V if you want the CPU to last, but that up to 1.45V CPU voltage is "sustainable". Further, note that it is recommended not to set the SOC Voltage higher than 1.2 volts. Also, much like Intel's platform, it is possible to adjust the base clock (BCLK) but you may run into stability problems with the rest of the system if you push this too far outside expected specifications (PC Gamer claims you can set this up to 140 MHz though, so AM4/Ryzen may be more forgiving in this area than Intel. Edit: The highest figure I've seen so far is 106.4 MHz being stable before the rest of the system gets too far out of spec and becomes unstable. The main benefit to adjusting this is to support overclocked RAM above 3200 MHz, so unless you need that your overclocking efforts are probably better spent adjusting the multiplier. /edit). Finally, when manually overclocking you will be able to turn off SMT and/or turn off cores in 2s (e.g. disable 2 cores or disable 4 cores; you can't disable single cores, only groups of two).
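The trade-off between BCLK and multiplier overclocking comes down to simple arithmetic: the core clock is the base clock times the multiplier, while anything else derived from the base clock moves along with it. The sketch below assumes a 100 MHz reference BCLK and assumes memory speed scales linearly with BCLK; both are simplifications for illustration, and the function names are made up.

```python
def core_clock_mhz(bclk_mhz, multiplier):
    """Core clock as base clock times multiplier."""
    return bclk_mhz * multiplier


def scaled_clock_mhz(nominal_mhz, bclk_mhz, reference_bclk_mhz=100.0):
    """Scale a clock rated at the reference BCLK to a different BCLK.

    Assumption: the clock tracks BCLK linearly.
    """
    return nominal_mhz * bclk_mhz / reference_bclk_mhz


# Two ways to land near 4.0 GHz on the cores.
print(round(core_clock_mhz(100.0, 40.0)))  # 4000 (multiplier only)
print(round(core_clock_mhz(106.4, 37.5)))  # 3990 (raised BCLK, lower multiplier)

# Memory rated for 3200 at a 100 MHz BCLK, run at the 106.4 MHz figure quoted above.
print(round(scaled_clock_mhz(3200, 106.4), 1))  # 3404.8
```

The multiplier route changes only the core clock, which is why it is the safer lever unless you specifically need the memory speed bump.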
Hopefully this helps to clear up the XFR confusion. If you do not need guaranteed clocks with a bonus XFR boost for a stable workstation build, saving money and going with the Ryzen 7 1700 and manually overclocking it to at least attempt to reach R7 1700X or 1800X speeds seems like the way to go for enthusiasts that are considering making the jump to AM4 especially if you enjoy tinkering with things like overclocking. There's nothing wrong with going with the higher priced and binned chips if you want to go that route, but don't do it for XFR in my opinion.
What are your thoughts? Are you planning to overclock your Ryzen CPU or do you think the Precision Boost and XFR is enough?
Subject: Processors | March 2, 2017 - 03:08 PM | Jeremy Hellstrom
Tagged: Ryzen 1700X, Zen, x370, video, ryzen, amd
Having started your journey with Ryan's quick overview of the performance of the 1800X, and while anxiously awaiting our further coverage now that we have both the parts and the time to test them, you might want to take a peek at some other coverage. [H]ard|OCP tested the processor which many may be looking at due to the more affordable pricing, the Ryzen 1700X. Their test system is based on a Gigabyte AX370-Gaming 5 with 16GB of Corsair Vengeance DDR4-3600 which ran at 2933MHz during testing; Kyle reached out to vendors, who assured him that an update making 3GHz reachable will arrive soon. Part of their testing focused on VR performance, so make sure to check out the full article.
"Saying that we have waited for a long time for a "real" CPU out of AMD would be a gross misunderstatement, but today AMD looks to remedy that. We are now offered up a new CPU that carries the branding name of Ryzen. Has AMD risen from the CPU graveyard? You be the judge after looking at the data."
Here are some more Processor articles from around the web:
- AMD's Ryzen 7 1800X, Ryzen 7 1700X, and Ryzen 7 1700 CPUs @ The Tech Report
- AMD’s moment of Zen: Finally, an architecture that can compete @ Ars Technica
- AMD Ryzen 7 1800X CPU Review: The Wait is Over @ Modders-Inc
- The AMD Ryzen 7 1800X Performance Review @ Hardware Canucks
- The AMD Ryzen 7 Performance In 3D Rendering & Video Transcoding @ TechARP
- AMD Ryzen 7 1800X @ Kitguru
- AMD Ryzen 7 1800X @ Guru of 3D
- AMD Ryzen 7 1800X, 1700X, and 1700 Processor Review @ OCC
- AMD Ryzen 7 1800X Linux Benchmarks @ Phoronix
Subject: Processors | March 2, 2017 - 11:29 AM | Ryan Shrout
Tagged: amd, ryzen, gaming, 1080p
By far one of the most interesting and concerning points about today's launch of the AMD Ryzen processor is gaming results. Many other reviewers have seen similar results to what I published in my article this morning: gaming at 1080p, even at "ultra" image quality settings, in many top games shows a deficit in performance compared to Intel Kaby Lake and Broadwell-E processors.
I shared my testing result with AMD over a week ago, trying to get answers and hoping to find some instant fix (a BIOS setting, a bug in my firmware). As it turns out, that wasn't the case. To be clear, our testing was done on the ASUS Crosshair VI Hero motherboard with the 5704 BIOS and any reports you see claiming that the deficits only existed on ASUS products are incorrect.
AMD responded to the issues late last night with the following statement from John Taylor, CVP of Marketing:
“As we presented at Ryzen Tech Day, we are supporting 300+ developer kits with game development studios to optimize current and future game releases for the all-new Ryzen CPU. We are on track for 1000+ developer systems in 2017. For example, Bethesda at GDC yesterday announced its strategic relationship with AMD to optimize for Ryzen CPUs, primarily through Vulkan low-level API optimizations, for a new generation of games, DLC and VR experiences.
Oxide Games also provided a public statement today on the significant performance uplift observed when optimizing for the 8-core, 16-thread Ryzen 7 CPU design – optimizations not yet reflected in Ashes of the Singularity benchmarking. Creative Assembly, developers of the Total War series, made a similar statement today related to upcoming Ryzen optimizations.
CPU benchmarking deficits to the competition in certain games at 1080p resolution can be attributed to the development and optimization of the game uniquely to Intel platforms – until now. Even without optimizations in place, Ryzen delivers high, smooth frame rates on all “CPU-bound” games, as well as overall smooth frame rates and great experiences in GPU-bound gaming and VR. With developers taking advantage of Ryzen architecture and the extra cores and threads, we expect benchmarks to only get better, and enable Ryzen excel at next generation gaming experiences as well.
Game performance will be optimized for Ryzen and continue to improve from at-launch frame rate scores.” John Taylor, AMD
The statement begins with Taylor reiterating the momentum of AMD to support developers both from a GPU and a CPU technology angle. Getting hardware in the hands of programmers is the first and most important step to finding and fixing any problem areas that Ryzen might have, so this is a great move to see taking place. Both Oxide Games and Creative Assembly, developers of Ashes of the Singularity and Total War respectively, have publicly stated their intent to demonstrate improved threading and performance on Ryzen platforms very soon.
Taylor then recognizes the performance concerns at 1080p, attributing those deficits to years of optimizations for Intel processors. It's difficult, if not impossible, to know for sure how much weight this argument has, but it would make some logical sense. Intel CPUs have been the automatic, de facto standard for gaming PCs for many years, and any kind of performance optimizations and development would have been made on those same Intel processors. So it seems plausible that simply seeding Ryzen to developers and having them look at performance as development goes forward would result in a positive change for AMD's situation.
For buyers today that are gaming at 1080p, the situation is likely to remain as we have presented it going forward. Until games get patched or new games are released from developers that have had access and hands-on time with Ryzen, performance is unlikely to change from some single setting/feature that AMD or its motherboard partners can enable.
The question I would love answered is: why is this even happening? What architectural difference between Core and Zen is contributing to this delta? Is it fundamental to the pipeline design, to the caching structure, or to how SMT is enabled? Do Windows 10 and its handling of kernel processes have something to do with it? There is a lot to try to figure out as testing moves forward.
If you want to see the statements from both Oxide and Creative Assembly, they are provided below.
“Oxide games is incredibly excited with what we are seeing from the Ryzen CPU. Using our Nitrous game engine, we are working to scale our existing and future game title performance to take full advantage of Ryzen and its 8-core, 16-thread architecture, and the results thus far are impressive. These optimizations are not yet available for Ryzen benchmarking. However, expect updates soon to enhance the performance of games like Ashes of the Singularity on Ryzen CPUs, as well as our future game releases.” - Brad Wardell, CEO Stardock and Oxide
"Creative Assembly is committed to reviewing and optimizing its games on the all-new Ryzen CPU. While current third-party testing doesn’t reflect this yet, our joint optimization program with AMD means that we are looking at options to deliver performance optimization updates in the future to provide better performance on Ryzen CPUs moving forward. " – Creative Assembly, Developers of the Multi-award Winning Total War Series
AMD Ryzen 7 Processor Specifications
It’s finally here and it’s finally time to talk about it. The AMD Ryzen processor is being released onto the world and, based on the buildup of excitement over the last week or so since pre-orders began, details on just how Ryzen performs relative to Intel’s mainstream and enthusiast processors are a hot commodity. While leaks have been surfacing for months and details seem to be streaming out from those not bound to the same restrictions we have been, I think you are going to find our analysis of the Ryzen 7 1800X processor to be quite interesting and maybe a little different as well.
Honestly, there isn’t much that has been left to the imagination about Ryzen, its chipsets, pricing, etc. with the slow trickle of information that AMD has been sending out since before CES in January. We know about the specifications, we know about the architecture, we know about the positioning; and while I will definitely recap most of that information here, the real focus is going to be on raw numbers. Benchmarks are what we are targeting with today’s story.
Let’s dive right in.
The Zen Architecture – Foundation for Ryzen
Actually, as it turns out, in typical Josh Walrath fashion, he wrote too much about the AMD Zen architecture to fit into this page. So, instead, you'll find his complete analysis of AMD's new baby right here: AMD Zen Architecture Overview: Focus on Ryzen
AMD Ryzen 7 Processor Specifications
Though we already detailed the most important specifications for the new AMD Ryzen processors when the preorders went live, it's worth touching on them again and reemphasizing the important ones.
| | Ryzen 7 1800X | Ryzen 7 1700X | Ryzen 7 1700 | Core i7-6900K | Core i7-6800K | Core i7-7700K | Core i5-7600K | Core i7-6700K |
|---|---|---|---|---|---|---|---|---|
| Architecture | Zen | Zen | Zen | Broadwell-E | Broadwell-E | Kaby Lake | Kaby Lake | Skylake |
| Base Clock | 3.6 GHz | 3.4 GHz | 3.0 GHz | 3.2 GHz | 3.4 GHz | 4.2 GHz | 3.8 GHz | 4.0 GHz |
| Turbo/Boost Clock | 4.0 GHz | 3.8 GHz | 3.7 GHz | 3.7 GHz | 3.6 GHz | 4.5 GHz | 4.2 GHz | 4.2 GHz |
| TDP | 95 watts | 95 watts | 65 watts | 140 watts | 140 watts | 91 watts | 91 watts | 91 watts |
All three of the currently announced Ryzen processors are 8-core, 16-thread designs, matching the Core i7-6900K from Intel in that regard. Though Intel does have a 10-core part branded for consumers, it comes in at a significantly higher price point (over $1500 still). The clock speeds of Ryzen are competitive with the Broadwell-E platform options, though they are clearly behind the curve when it comes to the clock capabilities of Kaby Lake and Skylake. With admittedly lower IPC than Kaby Lake, Zen will struggle in any purely single-threaded workload, with as much as a 500 MHz deficit in clock rate.
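To put that clock deficit in perspective, here is a rough sketch using the boost clocks from the table above. It assumes single-threaded performance scales simply as IPC multiplied by clock speed, which is a simplification, and the `ipc_ratio` parameter is a hypothetical placeholder rather than a measured value.

```python
# Boost clocks in GHz, taken from the specification table above.
BOOST_GHZ = {
    "Ryzen 7 1800X": 4.0,
    "Ryzen 7 1700X": 3.8,
    "Ryzen 7 1700": 3.7,
    "Core i7-6900K": 3.7,
    "Core i7-7700K": 4.5,
}


def clock_deficit_mhz(cpu: str, rival: str) -> int:
    """Boost clock gap in MHz (positive means `cpu` clocks lower than `rival`)."""
    return round((BOOST_GHZ[rival] - BOOST_GHZ[cpu]) * 1000)


def relative_single_thread(cpu: str, rival: str, ipc_ratio: float = 1.0) -> float:
    """Estimated single-thread performance of `cpu` relative to `rival`.

    Assumes performance = IPC * clock. `ipc_ratio` is cpu IPC / rival IPC
    and is a made-up input for illustration, not a benchmark result.
    """
    return ipc_ratio * BOOST_GHZ[cpu] / BOOST_GHZ[rival]


if __name__ == "__main__":
    gap = clock_deficit_mhz("Ryzen 7 1800X", "Core i7-7700K")
    ratio = relative_single_thread("Ryzen 7 1800X", "Core i7-7700K")
    print(f"Clock gap: {gap} MHz, clock-only relative performance: {ratio:.2f}")
```

Even at identical IPC, the clock gap alone would leave the 1800X at roughly 89% of the 7700K in this simple model; any IPC shortfall multiplies on top of that.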
- Ryzen 7 1800X - $499 - Amazon.com
- Ryzen 7 1700X - $399 - Amazon.com
- Ryzen 7 1700 - $329 - Amazon.com
- ASUS ROG Crosshair VI Hero - $254 - Amazon.com
- ASUS Prime X370 Pro - $169 - Amazon.com
- ASUS Prime B350-Plus - $99 - Amazon.com
- ASUS Prime B350M-A - $89 - Amazon.com
One interesting deviation from Intel's designs that Ryzen gets is a more granular boost capability. AMD Ryzen CPUs will be able to move between processor states in 25 MHz increments while Intel is currently limited to 100 MHz. If implemented correctly and effectively through SenseMI, this allows Ryzen to get 25-75 MHz of additional performance in a scenario where it was too thermally constrained to hit the next 100 MHz step.
XFR (Extended Frequency Range), supported on the Ryzen 7 1800X and 1700X (hence the "X"), "lifts the maximum Precision Boost frequency beyond ordinary limits in the presence of premium systems and processor cooling." The story goes that if you have better than average cooling, the 1800X will be able to scale up to 4.1 GHz in some instances for some undetermined amount of time. The better the cooling, the longer it can operate in XFR. While this was originally pitched to us as a game-changing feature that would bring extreme advantages to water cooling enthusiasts, it seems it was scaled back for the initial release. Getting only a 100 MHz increase, in the best case, seems a bit more like technology for technology's sake than a new capability for consumers.
Ryzen integrates a dual channel DDR4 memory controller with speeds up to 2400 MHz, matching what Intel can do on Kaby Lake. Broadwell-E has the advantage with a quad-channel controller, but how useful that ends up being will be interesting to see as we step through our performance testing.
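The benefit of the finer steps is easy to see with a little arithmetic. The sketch below assumes a hypothetical thermally limited ceiling (the `limit_mhz` value is invented for illustration) and simply rounds down to the nearest available step; it is not a model of how SenseMI actually picks frequencies.

```python
def highest_step_mhz(limit_mhz: int, step_mhz: int) -> int:
    """Highest multiple of `step_mhz` that does not exceed `limit_mhz`."""
    return (limit_mhz // step_mhz) * step_mhz


def fine_step_gain_mhz(limit_mhz: int, fine: int = 25, coarse: int = 100) -> int:
    """Extra MHz reachable with `fine` steps compared to `coarse` steps."""
    return highest_step_mhz(limit_mhz, fine) - highest_step_mhz(limit_mhz, coarse)


if __name__ == "__main__":
    # Hypothetical thermal ceilings, chosen only to show the range of outcomes.
    for limit_mhz in (3700, 3730, 3760, 3790):
        print(limit_mhz, "MHz ceiling ->", fine_step_gain_mhz(limit_mhz), "MHz gained")
```

Whatever the ceiling, the gain in this model is always 0, 25, 50, or 75 MHz, which is where the 25-75 MHz range comes from.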
One area of interest is the TDP ratings. AMD and Intel have very different views on how this is calculated. Intel treats it as the maximum power draw of the processor, while AMD sees it as a target for thermal dissipation over time. This means that under stock settings the Core i7-7700K will not draw more than 91 watts and the Core i7-6900K will not draw more than 140 watts. In our testing, they are well under those ratings most of the time, whenever AVX code is not being run. AMD’s 95-watt rating on the Ryzen 7 1800X, though, will very often be exceeded, and our power testing bears that out. The logic is that a cooler with a 95-watt rating, combined with the behavior of thermal propagation, gives the cooling system time to catch up. (Interestingly, this is the philosophy Intel has taken with its Kaby Lake mobile processors.)
Obviously the most important line here for many of you is the price. The Core i7-6900K is the lowest priced 8C/16T option from Intel for consumers at $1050. The Ryzen 7 1800X has a sticker price less than half of that, at $499. The R7 1700X vs Core i7-6800K matchup is interesting as well, where the AMD CPU will sell for $399 versus $450 for the 6800K. However, the 6800K only has 6 cores and 12 threads, giving the Ryzen part an instant 25% boost in multi-threaded performance. The 7700K and R7 1700 battle will be interesting as well, with a 4-core difference in capability and a $30 price advantage to AMD.
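For a quick sense of the value argument, the prices and core counts quoted above can be reduced to dollars per core. This is only a sketch: it ignores clock speed, IPC, and platform cost, and the dictionary and function names are illustrative.

```python
# (launch price in USD, physical cores) as quoted in the text above.
PARTS = {
    "Ryzen 7 1800X": (499, 8),
    "Ryzen 7 1700X": (399, 8),
    "Core i7-6900K": (1050, 8),
    "Core i7-6800K": (450, 6),
}


def price_per_core(name: str) -> float:
    """Sticker price divided by physical core count."""
    price, cores = PARTS[name]
    return price / cores


if __name__ == "__main__":
    # Print the parts from cheapest to most expensive per core.
    for name in sorted(PARTS, key=price_per_core):
        print(f"{name}: ${price_per_core(name):.2f} per core")
```

By this crude measure the two Ryzen parts land well below both Broadwell-E chips, though dollars per core says nothing about how fast each core actually is.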
What Makes Ryzen Tick
We have been exposed to details about the Zen architecture at the past several Hot Chips conventions, as well as through other information directly from AMD. Zen was a clean sheet design that borrowed some of the best features from the Bulldozer and Jaguar architectures, as well as integrating many new ideas that had not been executed in AMD processors before. The fusion of ideas from higher performance cores, lower power cores, and experience gained in APU/GPU design has come together in a very impressive package that is the Ryzen CPU.
It is well known that AMD brought back Jim Keller to head the CPU group after the slow downward spiral that AMD entered in CPU design. While the Athlon 64 was a tremendous part for the time, the subsequent CPUs offered by the company did not retain that leadership position. The original Phenom had problems right off the bat and could not compete well with Intel’s latest dual and quad cores. The Phenom II shored up AMD's position a bit, but in the end could not keep pace with the products that Intel continued to introduce with its newly minted “tick-tock” cycle. Bulldozer had issues out of the gate and did not have performance numbers that were significantly greater than the previous generation “Thuban” 6-core Phenom II product, much less the latest Intel Sandy Bridge and Ivy Bridge products that it would compete with.
AMD attempted to stop the bleeding by iterating and evolving the Bulldozer architecture with Piledriver, Steamroller, and Excavator. The final products based on this design arc seemed to do fine for the markets they were aimed at, but they certainly did not win back any market share as AMD’s desktop numbers shrank. No matter what AMD did, the base architecture just could not overcome some of the basic properties that impeded strong IPC performance.
The primary goal of this new architecture is to increase IPC to a level consistent with what Intel has to offer. AMD aimed to increase IPC by at least 40% over the previous Excavator core. This is a pretty aggressive goal considering where AMD was with the Bulldozer architecture, which was focused on good multi-threaded performance and high clock speeds. AMD claims that it has in fact increased IPC by an impressive 52% over the previous Excavator-based core. Not only has AMD seemingly hit its performance goal, it has exceeded it. AMD also plans on using the Zen architecture to power everything from mobile products to the highest TDP parts offered.
The Zen Core
The basis for Ryzen is the CCX module. Each module contains four Zen cores along with 8 MB of shared L3 cache. Each core has 64 KB of L1 I-cache and 32 KB of L1 D-cache, plus its own 512 KB of L2 cache. These caches are inclusive. The L3 cache acts as a victim cache which partially copies what is in the L1 and L2 caches. AMD has improved the performance of its caches to a very large degree compared to previous architectures. The arrangement here allows the individual cores to quickly snoop any changes in the caches of the others for shared workloads. So if a cache line is changed on one core, other cores requiring that data can quickly snoop into the shared L3 and read it. Doing this allows the core doing the actual work to not be interrupted by cache read requests from other cores.
Each core can handle two threads but, unlike a Bulldozer module, has a single integer core. Bulldozer modules featured two integer units and a shared FPU/SIMD unit. Zen gets rid of CMT for good, and we have a single integer unit and a single FPU for each core. The core can address two threads by utilizing AMD’s version of SMT (simultaneous multi-threading). There is a primary thread that gets higher priority while the second thread has to wait until resources are freed up. This works far better in the real world than my explanation suggests, as resources are constantly being shuffled about and the primary thread will not monopolize all resources within the core.
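As a quick sanity check on how those figures add up for an 8-core part, here is a small sketch. It assumes the 512 KB L2 figure is per core and that an 8-core Ryzen 7 is built from two four-core CCX modules; the function and constant names are illustrative only.

```python
KB_PER_MB = 1024

CORES_PER_CCX = 4      # four Zen cores per CCX, per the text above
L3_MB_PER_CCX = 8      # shared L3 per CCX
L2_KB_PER_CORE = 512   # assumption: the 512 KB L2 figure is per core
L1I_KB_PER_CORE = 64
L1D_KB_PER_CORE = 32


def cache_totals(total_cores: int) -> dict:
    """Aggregate cache sizes for a chip built from whole CCX modules."""
    if total_cores % CORES_PER_CCX != 0:
        raise ValueError("core count must be a multiple of the CCX size")
    ccx_count = total_cores // CORES_PER_CCX
    return {
        "ccx_count": ccx_count,
        "l1_kb": total_cores * (L1I_KB_PER_CORE + L1D_KB_PER_CORE),
        "l2_mb": total_cores * L2_KB_PER_CORE / KB_PER_MB,
        "l3_mb": ccx_count * L3_MB_PER_CCX,
    }


if __name__ == "__main__":
    print(cache_totals(8))
```

Note that the L3 total is split across the CCX modules rather than being one unified pool, so the per-CCX 8 MB is the number that matters for any single group of four cores.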
Subject: Processors | March 1, 2017 - 09:17 PM | Tim Verry
Tagged: solder, Ryzen 1700, ryzen, overclocking, IHS, delid, amd
Professional extreme overclocker Roman "der8auer" Hartung from Germany recently managed to successfully de-lid his AMD Ryzen 7 1700 processor and confirmed that AMD is, in fact, using solder as its thermal interface material of choice between the Ryzen die and IHS (integrated heat spreader). The confirmation that AMD is using solder is promising news for enthusiasts eager to overclock the new processors and see just how far they are able to push them on air and water cooling.
Image credit: Roman Hartung. Additional high resolution photos are available here.
In a video on his YouTube channel, der8auer ("The Farmer") shows the steps involved in delidding the Ryzen 7 1700 which involve using razor blades, a heating element to get the IHS heated to a temperature high enough to melt the indium (~170°C on the block with the indium melting around 157°C), and a whole lot of courage. After using the razor blades to cut the glue around the edges, he heated up the IHS enough to start melting the solder and after a cringe-worthy cracking sound he was able to lift the package away from the IHS with the die and on-package components intact!
He does note that Ryzen's use of PGA, rather than the LGA method Intel has moved to, makes the CPU a bit harder to handle, as the pins are on the CPU rather than the socket and are easily bent. Compared to the delidding process and the possibility of cracking the die or ripping off some components and killing the $329 CPU, though, bent pins are nothing and can usually be bent back, heh. After all, he reportedly went through two previous Ryzen CPUs before getting a successful de-lid on the third attempt.
It seems that AMD is using two small pads of indium solder along with some gold plating on the inside of the IHS to facilitate heat transfer and allow the solder to mate with the IHS. Because AMD is using what seems to be high quality solder TIM, delidding and replacing the TIM does not seem to be necessary at all; however, Roman "der8auer" Hartung speculates that direct die cooling could work out very well for those enthusiasts brave enough to try it, since the cooler does not need to put high amounts of pressure onto the CPU to hold it in place, unlike with an LGA socket.
If you are interested in seeing the overclocking benefits of de-lidding and direct die cooling a Ryzen CPU, keep an eye on his YouTube channel for a video over the weekend detailing his testing using a Ryzen 7 1800X.
I am really looking forward to seeing how far enthusiasts are able to push Ryzen (especially on water), and maybe we can convince Morry to de-lid a Ryzen CPU!