Tweaks for days
It seems like it’s been months since AMD launched Ryzen, its first new processor architecture in about a decade, when in fact we are only four weeks removed. One of the few concerns about the Ryzen processors centered on their gaming performance, particularly at common resolutions like 1080p. While I was far from the only person to notice the problem, our gaming tests clearly showed a gap between the Ryzen 7 1800X and the Intel Core i7-7700K and 6900K processors in Civilization 6, Hitman and Rise of the Tomb Raider.
A graph from our Ryzen launch coverage...
We had been working with AMD for a couple of weeks on the Ryzen launch and fed back our results with questions in the week before launch. On March 2nd, AMD’s CVP of Marketing John Taylor gave us a prepared statement that acknowledged the issue but promised that changes would come in the form of game engine updates. These software updates would need to be implemented by the game developers themselves in order to take advantage of the unique and more complex core designs of the Zen architecture. We had quotes from the developers of Ashes of the Singularity as well as the Total War series to back it up.
And while statements promising change are nice, it really takes some proof to get the often skeptical tech media and tech enthusiasts to believe that change can actually happen. Today AMD is showing its first result.
The result of 400 developer hours of work, the Nitrous Engine powering Ashes of the Singularity received an update today to version 26118 that integrates updates to threading to better balance the performance across Ryzen 7’s 8 cores and 16 threads. I was able to do some early testing on the new revision, as well as with the previous retail shipping version (25624) to see what kind of improvements the patch brings with it.
Stardock / Oxide CEO Brad Wardell had this to say in a press release:
“I’ve always been vocal about taking advantage of every ounce of performance the PC has to offer. That’s why I’m a strong proponent of DirectX 12 and Vulkan® because of the way these APIs allow us to access multiple CPU cores, and that’s why the AMD Ryzen processor has so much potential,” said Stardock and Oxide CEO Brad Wardell. “As good as AMD Ryzen is right now – and it’s remarkably fast – we’ve already seen that we can tweak games like Ashes of the Singularity to take even more advantage of its impressive core count and processing power. AMD Ryzen brings resources to the table that will change what people will come to expect from a PC gaming experience.”
Our testing setup is in line with our previous CPU performance stories.
|Test System Setup|
|CPU||AMD Ryzen 7 1800X, Intel Core i7-6900K|
|Motherboard||ASUS Crosshair VI Hero (Ryzen), ASUS X99-Deluxe II (Broadwell-E)|
|Storage||Corsair Force GS 240 SSD|
|Graphics Card||NVIDIA GeForce GTX 1080 8GB|
|Graphics Drivers||NVIDIA 378.49|
|Power Supply||Corsair HX1000|
|Operating System||Windows 10 Pro x64|
I was using the latest BIOS for our ASUS Crosshair VI Hero motherboard (1002) and upgraded to some Geil RGB (!!) memory capable of running at 3200 MHz on this board with a single BIOS setting adjustment. All of my tests were done at 1080p in order to return to the pain point that AMD was dealing with on launch day.
Let’s see the results.
These are substantial performance improvements with the new engine code! At both 2400 MHz and 3200 MHz memory speeds, and at both High and Extreme presets in the game (all running in DX12 for what that’s worth), gaming performance in this GPU-centric title is improved. At the High preset (which is the setting that AMD used in its performance data for the press release), we see a 31% jump in performance when running at the higher memory speed and a 22% improvement with the lower speed memory. Even when running in the more GPU-bottlenecked state of the Extreme preset, the performance improvement for the Ryzen processors with the latest Ashes patch is 17-20%!
It’s also important to note that Intel performance is unaffected, either for better or worse. Whatever work Oxide did to improve the engine for AMD’s Ryzen processors had NO impact on the Core processors, which is interesting to say the least. The cynic in me would have expected any architecture-agnostic changes to the code to raise Intel’s multi-core performance at least a little bit.
So what exactly is happening to the engine with v26118? I haven’t had a chance to have an in-depth conversation with anyone at AMD or Oxide yet on the subject, but at a high level, I was told that this is what happens when instructions and sequences are analyzed for an architecture specifically. “For basically 5 years”, I was told, Oxide and other developers have dedicated their time to “instruction traces and analysis to maximize Intel performance” which helps to eliminate poor instruction setup. After spending some time with Ryzen and the necessary debug tools (and some AMD engineers), they were able to improve performance on Ryzen without adversely affecting Intel parts.
Core to core latency testing on Ryzen 7 1800X
I am hoping to get more specific detail in the coming days, but it seems very likely that Oxide was able to properly handle the more complex core-to-core communication on Ryzen and its CCX implementation. We demonstrated earlier this month how thread-to-thread communication across core complexes causes substantial latency penalties, and that a developer who intelligently keeps threads with dependencies on the same core complex can improve overall performance. I would expect this is at least part of the solution Oxide was able to integrate (and it would also explain why Intel parts are unaffected).
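To make the idea of CCX-aware thread management concrete, here is a minimal sketch of keeping threads that share data on one core complex. It is not Oxide's implementation, which has not been described in detail. The thread names, the `ccx_cpu_sets` and `place_groups` helpers, and the contiguous logical CPU numbering are all assumptions for illustration.

```python
from typing import Dict, List, Sequence


def ccx_cpu_sets(num_ccx: int = 2, cores_per_ccx: int = 4,
                 threads_per_core: int = 2) -> List[List[int]]:
    """Return the logical CPU ids belonging to each core complex.

    ASSUMPTION: logical CPUs are numbered contiguously, CCX by CCX
    (CCX0 = 0..7, CCX1 = 8..15 on an 8-core/16-thread part). Real
    numbering depends on the OS and firmware and has to be queried.
    """
    per_ccx = cores_per_ccx * threads_per_core
    return [list(range(i * per_ccx, (i + 1) * per_ccx)) for i in range(num_ccx)]


def place_groups(groups: Dict[str, Sequence[str]],
                 cpu_sets: List[List[int]]) -> Dict[str, List[int]]:
    """Map every thread to the CPU set of a single CCX.

    Threads in the same group (threads that share data) always land on
    the same CCX; groups are spread round-robin across the CCXs.
    """
    placement: Dict[str, List[int]] = {}
    for index, (_, threads) in enumerate(sorted(groups.items())):
        allowed = cpu_sets[index % len(cpu_sets)]
        for thread in threads:
            placement[thread] = allowed
    return placement


# Hypothetical engine thread groups, named for illustration only.
groups = {
    "render": ["render_submit", "render_cull"],
    "simulation": ["ai", "physics", "pathfinding"],
}
placement = place_groups(groups, ccx_cpu_sets())
for thread, cpus in placement.items():
    print(f"{thread}: CPUs {cpus[0]}-{cpus[-1]}")
```

The resulting CPU sets would then be handed to the platform's affinity call, for example `os.sched_setaffinity` on Linux or `SetThreadAffinityMask` on Windows. Whether pinning helps in practice depends on how much data the groups actually exchange.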
- Ryzen 7 1800X - $499 - Amazon.com
- Ryzen 7 1700X - $399 - Amazon.com
- Ryzen 7 1700 - $329 - Amazon.com
What is important now is that AMD takes this momentum with Ashes of the Singularity and actually does something with it. Many of you will recognize Ashes as the flagship title for Mantle when AMD made that move to change the programming habits and models for developers, and though Mantle would eventually become Vulkan and drive DX12 development, it did not foretell an overall shift as it hoped to. Can AMD and its developer relations team continue to make the case that spending time and money (which is what 400 developer hours equates to) to make specific performance enhancements for Ryzen processors is in the best interest of everyone? We’ll soon find out.
AMD Ryzen 7 Processor Specifications
It’s finally here, and it’s finally time to talk about it. The AMD Ryzen processor is being released to the world, and based on the buildup of excitement over the last week or so since pre-orders began, details on just how Ryzen performs relative to Intel’s mainstream and enthusiast processors are a hot commodity. While leaks have been surfacing for months and details seem to be streaming out from those not bound to the same restrictions we have been, I think you are going to find our analysis of the Ryzen 7 1800X processor to be quite interesting and maybe a little different as well.
Honestly, there isn’t much that has been left to the imagination about Ryzen, its chipsets, pricing, etc. with the slow trickle of information that AMD has been sending out since before CES in January. We know about the specifications, we know about the architecture, we know about the positioning; and while I will definitely recap most of that information here, the real focus is going to be on raw numbers. Benchmarks are what we are targeting with today’s story.
Let’s dive right in.
The Zen Architecture – Foundation for Ryzen
Actually, as it turns out, in typical Josh Walrath fashion, he wrote too much about the AMD Zen architecture to fit into this page. So, instead, you'll find his complete analysis of AMD's new baby right here: AMD Zen Architecture Overview: Focus on Ryzen
AMD Ryzen 7 Processor Specifications
Though we already detailed the most important specifications for the new AMD Ryzen processors when the preorders went live, it’s worth touching on them again and reemphasizing the important ones.
|Processor||Ryzen 7 1800X||Ryzen 7 1700X||Ryzen 7 1700||Core i7-6900K||Core i7-6800K||Core i7-7700K||Core i5-7600K||Core i7-6700K|
|Architecture||Zen||Zen||Zen||Broadwell-E||Broadwell-E||Kaby Lake||Kaby Lake||Skylake|
|Base Clock||3.6 GHz||3.4 GHz||3.0 GHz||3.2 GHz||3.4 GHz||4.2 GHz||3.8 GHz||4.0 GHz|
|Turbo/Boost Clock||4.0 GHz||3.8 GHz||3.7 GHz||3.7 GHz||3.6 GHz||4.5 GHz||4.2 GHz||4.2 GHz|
|TDP||95 watts||95 watts||65 watts||140 watts||140 watts||91 watts||91 watts||91 watts|
All three of the currently announced Ryzen processors are 8-core, 16-thread designs, matching the Core i7-6900K from Intel in that regard. Though Intel does have a 10-core part branded for consumers, it comes in at a significantly higher price point (over $1500 still). The clock speeds of Ryzen are competitive with the Broadwell-E platform options, though they are clearly behind the curve when it comes to the clock capabilities of Kaby Lake and Skylake. With admittedly lower IPC than Kaby Lake, Zen will struggle in any purely single-threaded workload with as much as a 500 MHz deficit in clock rate.
- ASUS ROG Crosshair VI Hero - $254 - Amazon.com
- ASUS Prime X370 Pro - $169 - Amazon.com
- ASUS Prime B350-Plus - $99 - Amazon.com
- ASUS Prime B350M-A - $89 - Amazon.com
One interesting deviation from Intel's designs that Ryzen gets is a more granular boost capability. AMD Ryzen CPUs will be able to move between processor states in 25 MHz increments, while Intel is currently limited to 100 MHz. If implemented correctly and effectively through SenseMI, this allows Ryzen to get 25-75 MHz of additional performance in a scenario where it is too thermally constrained to hit the next 100 MHz step.
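The 25-75 MHz figure falls out of simple step arithmetic. The sketch below is only a toy model: the `highest_step` helper and the 3,690 MHz thermal ceiling are hypothetical, and real boost behavior depends on many more inputs than a single frequency limit.

```python
def highest_step(limit_mhz: int, base_mhz: int, step_mhz: int) -> int:
    """Highest frequency reachable from base_mhz in whole steps
    of step_mhz without exceeding limit_mhz."""
    if limit_mhz < base_mhz:
        return base_mhz
    steps = (limit_mhz - base_mhz) // step_mhz
    return base_mhz + steps * step_mhz


# Hypothetical case: thermals allow 3,690 MHz on a part with a 3,600 MHz base.
limit = 3690
coarse = highest_step(limit, 3600, 100)  # 100 MHz increments
fine = highest_step(limit, 3600, 25)     # 25 MHz increments
gain = fine - coarse

print(f"100 MHz steps: {coarse} MHz, 25 MHz steps: {fine} MHz, gain: {gain} MHz")
```

In this example the coarse stepping is stuck at 3,600 MHz while the finer stepping reaches 3,675 MHz, the top of the 25-75 MHz range. A ceiling that sits exactly on a 100 MHz boundary yields no gain at all.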
XFR (Extended Frequency Range), supported on the Ryzen 7 1800X and 1700X (hence the "X"), "lifts the maximum Precision Boost frequency beyond ordinary limits in the presence of premium systems and processor cooling." The story goes that if you have better than average cooling, the 1800X will be able to scale up to 4.1 GHz in some instances for some undetermined amount of time. The better the cooling, the longer it can operate in XFR. While this was originally pitched to us as a game-changing feature that would bring extreme advantages to water cooling enthusiasts, it seems it was scaled back for the initial release. Getting only a 100 MHz increase in the best case seems a bit more like technology for technology's sake than a new capability for consumers.
Ryzen integrates a dual-channel DDR4 memory controller with speeds up to 2400 MHz, matching what Intel can do on Kaby Lake. Broadwell-E has the advantage with a quad-channel controller, but how useful that ends up being will be interesting to see as we step through our performance testing.
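For a rough sense of what the channel count means on paper, theoretical peak bandwidth is channels times bus width times transfer rate. The sketch below assumes the standard 64-bit DDR4 channel, treats the "2400 MHz" rating as 2400 MT/s, and runs the quad-channel case at the same speed purely for comparison. The `peak_bandwidth_gbs` helper is made up for this example.

```python
def peak_bandwidth_gbs(channels: int, transfer_rate_mts: int,
                       bus_width_bits: int = 64) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return channels * bytes_per_transfer * transfer_rate_mts * 1e6 / 1e9


dual_2400 = peak_bandwidth_gbs(2, 2400)  # dual-channel controller
quad_2400 = peak_bandwidth_gbs(4, 2400)  # quad-channel, same speed (assumption)

print(f"Dual channel: {dual_2400:.1f} GB/s, quad channel: {quad_2400:.1f} GB/s")
```

These are ceilings, not measurements. Sustained bandwidth is lower, and many desktop workloads never come close to saturating even two channels.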
One area of interest is the TDP ratings. AMD and Intel have very different views on how this is calculated. Intel has made this the maximum power draw of the processor, while AMD sees it as a target for thermal dissipation over time. This means that under stock settings the Core i7-7700K will not draw more than 91 watts and the Core i7-6900K will not draw more than 140 watts. And in our testing, they are well under those ratings most of the time, whenever AVX code is not being run. AMD’s 95-watt rating on the Ryzen 7 1800X, though, will very often be exceeded, and our power testing bears that out. The logic is that a cooler with a 95-watt rating and the behavior of thermal propagation give the cooling system time to catch up. (Interestingly, this is the philosophy Intel has taken with its Kaby Lake mobile processors.)
Obviously the most important line here for many of you is the price. The Core i7-6900K is the lowest priced 8C/16T option from Intel for consumers at $1050. The Ryzen 7 1800X has a sticker price less than half of that, at $499. The Ryzen 7 1700X vs Core i7-6800K match is interesting as well, where the AMD CPU will sell for $399 versus $450 for the 6800K. However, the 6800K only has 6 cores and 12 threads, giving the Ryzen part an instant 25% boost in multi-threaded performance. The 7700K and Ryzen 7 1700 battle will be interesting as well, with a 4-core difference in capability and a $30 price advantage to AMD.
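Dividing the quoted prices by core count makes the gap easier to see. The sketch below uses only the prices and core counts stated above; dollars per core is a crude metric that ignores clock speed, IPC and platform cost.

```python
# (launch price in USD, physical cores) as quoted in this article.
parts = {
    "Core i7-6900K": (1050, 8),
    "Core i7-6800K": (450, 6),
    "Ryzen 7 1800X": (499, 8),
    "Ryzen 7 1700X": (399, 8),
    "Ryzen 7 1700": (329, 8),
}

price_per_core = {name: price / cores for name, (price, cores) in parts.items()}

for name, value in sorted(price_per_core.items(), key=lambda item: item[1]):
    print(f"{name}: ${value:.2f} per core")
```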
What Makes Ryzen Tick
We have been exposed to details about the Zen architecture for the past several Hot Chips conventions as well as other points of information directly from AMD. Zen was a clean sheet design that borrowed some of the best features from the Bulldozer and Jaguar architectures, as well as integrating many new ideas that had not been executed in AMD processors before. The fusion of ideas from higher performance cores, lower power cores, and experience gained in APU/GPU design have all come together in a very impressive package that is the Ryzen CPU.
It is well known that AMD brought back Jim Keller to head the CPU group after the slow downward spiral that AMD entered in CPU design. While the Athlon 64 was a tremendous part for the time, the subsequent CPUs offered by the company did not retain that leadership position. The original Phenom had problems right off the bat and could not compete well with Intel’s latest dual and quad cores. The Phenom II shored up AMD's position a bit, but in the end could not keep pace with the products that Intel continued to introduce with its newly minted “tick-tock” cycle. Bulldozer had issues out of the gate and did not have performance numbers that were significantly greater than the previous generation “Thuban” 6-core Phenom II product, much less the latest Intel Sandy Bridge and Ivy Bridge products that it would compete with.
AMD attempted to stop the bleeding by iterating and evolving the Bulldozer architecture with Piledriver, Steamroller, and Excavator. The final products based on this design arc seemed to do fine for the markets they were aimed at, but certainly did not regain any market share as AMD’s desktop numbers shrank. No matter what AMD did, the base architecture just could not overcome some of the basic properties that impeded strong IPC performance.
The primary goal of this new architecture is to increase IPC to a level consistent with what Intel has to offer. AMD aimed to increase IPC by at least 40% over the previous Excavator core. This is a pretty aggressive goal considering where AMD was with the Bulldozer architecture, which was focused on good multi-threaded performance and high clock speeds. AMD claims that it has in fact increased IPC by an impressive 54% over the previous Excavator-based core. Not only has AMD seemingly hit its performance goal, it has exceeded it. AMD also plans on using the Zen architecture to power everything from mobile products to the highest-TDP parts it offers.
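To put an IPC target in perspective, single-threaded throughput can be approximated as IPC times clock speed. The sketch below uses that first-order model only; both helper functions are made up for this example, and real workloads rarely scale this cleanly. It uses AMD's stated 40% goal, and any other claimed uplift can be passed in the same way.

```python
def relative_performance(ipc_ratio: float, clock_ratio: float = 1.0) -> float:
    """First-order single-thread speedup: IPC ratio times clock ratio."""
    return ipc_ratio * clock_ratio


def equivalent_clock_ghz(new_clock_ghz: float, ipc_ratio: float) -> float:
    """Clock the older core would need to match the newer core's throughput."""
    return new_clock_ghz * ipc_ratio


goal = 1.40  # AMD's stated target: at least 40% more IPC than Excavator

same_clock_speedup = relative_performance(goal)
older_core_clock = equivalent_clock_ghz(3.6, goal)  # 3.6 GHz is the 1800X base clock

print(f"Speedup at equal clocks: {same_clock_speedup:.2f}x")
print(f"Older core would need about {older_core_clock:.2f} GHz to keep up")
```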
The Zen Core
The basis for Ryzen is the CCX module. Each module contains four Zen cores along with 8 MB of shared L3 cache. Each core has 64 KB of L1 I-cache and 32 KB of L1 D-cache, along with 512 KB of L2 cache. The L2 is inclusive of the L1 caches. The L3 cache acts as a victim cache which partially copies what is in the L1 and L2 caches. AMD has improved the performance of its caches to a very large degree compared to previous architectures. The arrangement here allows the individual cores to quickly snoop any changes in the caches of the others for shared workloads. So if a cache line is changed on one core, other cores requiring that data can quickly snoop into the shared L3 and read it. Doing this means the core doing the actual work is not interrupted by cache read requests from other cores.
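Scaling these per-CCX figures up to an 8-core Ryzen 7 is straightforward arithmetic. The sketch below treats the 512 KB of L2 as a per-core figure (an assumption about how to read the number above) and uses a made-up `CcxConfig` container to hold the values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CcxConfig:
    cores: int = 4
    l1i_kb_per_core: int = 64
    l1d_kb_per_core: int = 32
    l2_kb_per_core: int = 512  # ASSUMPTION: 512 KB is per core
    l3_mb_shared: int = 8      # shared by all cores in one CCX


def chip_totals(ccx: CcxConfig, total_cores: int) -> dict:
    """Aggregate cache capacity for a chip built from identical CCXs."""
    ccx_count = total_cores // ccx.cores
    return {
        "ccx_count": ccx_count,
        "l1i_kb": total_cores * ccx.l1i_kb_per_core,
        "l1d_kb": total_cores * ccx.l1d_kb_per_core,
        "l2_kb": total_cores * ccx.l2_kb_per_core,
        "l3_mb": ccx_count * ccx.l3_mb_shared,
    }


totals = chip_totals(CcxConfig(), total_cores=8)
print(totals)
```

Note that the 16 MB of L3 on an 8-core part is really two separate 8 MB pools, one per CCX, which is why data shared across complexes costs more to reach.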
Each core can handle two threads, but unlike a Bulldozer module it has a single integer unit. Bulldozer modules featured two integer units and a shared FPU/SIMD. Zen gets rid of CMT for good, and we have a single integer unit and a single FPU for each core. The core can address two threads by utilizing AMD’s version of SMT (simultaneous multi-threading). There is a primary thread that gets higher priority while the second thread has to wait until resources are freed up. This works far better in the real world than my explanation suggests, as resources are constantly being shuffled about and the primary thread will not monopolize all resources within the core.