Author:
Manufacturer: AMD

We are up to two...

UPDATE (5/31/2017): Crystal Dynamics was able to get back to us with a couple of points on the changes that were made with this patch to affect the performance of AMD Ryzen processors.

  1. Rise of the Tomb Raider splits rendering tasks to run on different threads. By tuning the size of those tasks – breaking some up, allowing multicore CPUs to contribute in more cases, and combining some others, to reduce overheads in the scheduler – the game can more efficiently exploit extra threads on the host CPU.
     
  2. An optimization was identified in texture management that improves the combination of AMD CPU and NVIDIA GPU.  Overhead was reduced by packing texture descriptor uploads into larger chunks.

There you have it, a bit more detail on the software changes made to help adapt the game engine to AMD's Ryzen architecture. Not only that, but it does confirm our information that there was slightly MORE to address in the Ryzen+GeForce combinations.

END UPDATE

Despite a couple of growing pains out of the gate, the Ryzen processor launch appears to have been a success for AMD. Both the Ryzen 7 and the Ryzen 5 releases proved to be very competitive with Intel’s dominant CPUs in the market and took significant leads in areas of massive multi-threading and performance per dollar. An area that AMD has struggled in though has been 1080p gaming – performance in those instances on both Ryzen 7 and 5 processors fell behind comparable Intel parts by (sometimes) significant margins.

Our team continues to watch the story to see how AMD and game developers work through the issue. Most recently I posted a look at the memory latency differences between Ryzen and Intel Core processors. As it turns out, the memory latency differences are a significant part of the initial problem for AMD:

Because of this, I think it is fair to claim that some, if not most, of the 1080p gaming performance deficits we have seen with AMD Ryzen processors are a result of this particular memory system intricacy. You can combine memory latency with the thread-to-thread communication issue we discussed previously into one overall system level complication: the Zen memory system behaves differently than anything we have seen prior and it currently suffers in a couple of specific areas because of it.

In that story I detailed our coverage of the Ryzen processor and its gaming performance succinctly:

Our team has done quite a bit of research and testing on this topic. This included a detailed look at the first asserted reason for the performance gap, the Windows 10 scheduler. Our summary there was that the scheduler was working as expected and that minimal difference was seen when moving between different power modes. We also talked directly with AMD to find out its then current stance on the results, backing up our claims on the scheduler and presented a better outlook for gaming going forward. When AMD wanted to test a new custom Windows 10 power profile to help improve performance in some cases, we took part in that too. In late March we saw the first gaming performance update occur courtesy of Ashes of the Singularity: Escalation where an engine update to utilize more threads resulted in as much as 31% average frame increase.

Quick on the heels of the Ryzen 7 release, AMD worked with the developer Oxide on the Ashes of the Singularity: Escalation engine. Through tweaks and optimizations, the game was able to showcase as much as a 30% increase in average frame rate on the integrated benchmark. While this was only a single use case, it does prove that through work with the developers, AMD has the ability to improve the 1080p gaming positioning of Ryzen against Intel.

rotr-screen4-small.jpg

Fast forward to today and I was surprised to find a new patch for Rise of the Tomb Raider, a game that was actually one of the worst case scenarios for AMD with Ryzen. (Patch #12, v1.0.770.1) The patch notes mention the following:

The following changes are included in this patch

- Fix certain DX12 crashes reported by users on the forums.

- Improve DX12 performance across a variety of hardware, in CPU bound situations. Especially performance on AMD Ryzen CPUs can be significantly improved.

While we expect this patch to be an improvement for everyone, if you do have trouble with this patch and prefer to stay on the old version we made a Beta available on Steam, build 767.2, which can be used to switch back to the previous version.

We will keep monitoring for feedback and will release further patches as it seems required. We always welcome your feedback!

Obviously the data point that stood out for me was the improved DX12 performance “in CPU bound situations. Especially on AMD Ryzen CPUs…”

Remember how the situation appeared in April?

rotr.png

The Ryzen 7 1800X was 24% slower than the Intel Core i7-7700K – a dramatic difference for a processor that should only have been ~8-10% slower in single threaded workloads.

How does this new patch to RoTR affect performance? We tested it on the same Ryzen 7 1800X benchmarks platform from previous testing including the ASUS Crosshair VI Hero motherboard, 16GB DDR4-2400 memory and GeForce GTX 1080 Founders Edition using the 378.78 driver. All testing was done under the DX12 code path.

tr-1.png

tr-2.png

The Ryzen 7 1800X score jumps from 107 FPS to 126.44 FPS, an increase of 17%! That is a significant boost in performance at 1080p while still running at the Very High image quality preset, indicating that the developer (and likely AMD) were able to find substantial inefficiencies in the engine. For comparison, the 8-core / 16-thread Intel Core i7-6900K only sees a 2.4% increase from this new game revision. This tells us that the changes to the game were specific to Ryzen processors and their design, but that no performance was redacted from the Intel platforms.

Continue reading our look at the new Rise of the Tomb Raider patch for Ryzen!

Author:
Subject: Processors
Manufacturer: Various

Application Profiling Tells the Story

It should come as no surprise to anyone that has been paying attention the last two months that the latest AMD Ryzen processors and architecture are getting a lot of attention. Ryzen 7 launched with a $499 part that bested the Intel $1000 CPU at heavily threaded applications and Ryzen 5 launched with great value as well, positioning a 6-core/12-thread CPU against quad-core parts from the competition. But part of the story that permeated through both the Ryzen 7 and the Ryzen 5 processor launches was the situation surrounding gaming performance, in particular 1080p gaming, and the surprising delta  that we see in some games.

Our team has done quite a bit of research and testing on this topic. This included a detailed look at the first asserted reason for the performance gap, the Windows 10 scheduler. Our summary there was that the scheduler was working as expected and that minimal difference was seen when moving between different power modes. We also talked directly with AMD to find out its then current stance on the results, backing up our claims on the scheduler and presented a better outlook for gaming going forward. When AMD wanted to test a new custom Windows 10 power profile to help improve performance in some cases, we took part in that too. In late March we saw the first gaming performance update occur courtesy of Ashes of the Singularity: Escalation where an engine update to utilize more threads resulted in as much as 31% average frame increase.

ping-amd.png

As a part of that dissection of the Windows 10 scheduler story, we also discovered interesting data about the CCX construction and how the two modules on the 1800X communicated. The result was significantly longer thread to thread latencies than we had seen in any platform before and it was because of the fabric implementation that AMD integrated with the Zen architecture.

This has led me down another hole recently, wondering if we could further compartmentalize the gaming performance of the Ryzen processors using memory latency. As I showed in my Ryzen 5 review, memory frequency and throughput directly correlates to gaming performance improvements, in the order of 14% in some cases. But what about looking solely at memory latency alone?

Continue reading our analysis of memory latency, 1080p gaming, and how it impacts Ryzen!!

Author:
Subject: Processors
Manufacturer: AMD

Tweaks for days

It seems like it’s been months since AMD launched Ryzen, its first new processor architecture in about a decade, when in fact we are only four weeks removed. One of the few concerns about the Ryzen processors centered on its performance in some gaming performance results, particularly in common resolutions like 1080p. While I was far from the only person to notice these concerns, our gaming tests clearly showed a gap between the Ryzen 7 1800X and the Intel Core i7-7700K and 6900K processors in Civilization 6, Hitman and Rise of the Tomb Raider.

hitman.png

A graph from our Ryzen launch coverage...

We had been working with AMD for a couple of weeks on the Ryzen launch and fed back our results with questions in the week before launch. On March 2nd, AMD’s CVP of Marketing John Taylor gave us a prepared statement that acknowledged the issue but promised changes come in form for game engine updates. These software updates would need to be implemented by the game developers themselves in order to take advantage of the unique and more complex core designs of the Zen architecture. We had quotes from the developers of Ashes of the Singularity as well as the Total War series to back it up.

And while statements promising change are nice, it really takes some proof to get the often skeptical tech media and tech enthusiasts to believe that change can actually happen. Today AMD is showing its first result.

The result of 400 developer hours of work, the Nitrous Engine powering Ashes of the Singularity received an update today to version 26118 that integrates updates to threading to better balance the performance across Ryzen 7’s 8 cores and 16 threads. I was able to do some early testing on the new revision, as well as with the previous retail shipping version (25624) to see what kind of improvements the patch brings with it.

Stardock / Oxide CEO Brad Wardell had this to say in a press release:

“I’ve always been vocal about taking advantage of every ounce of performance the PC has to offer. That’s why I’m a strong proponent of DirectX 12 and Vulkan® because of the way these APIs allow us to access multiple CPU cores, and that’s why the AMD Ryzen processor has so much potential,” said Stardock and Oxide CEO Brad Wardell. “As good as AMD Ryzen is right now – and it’s remarkably fast – we’ve already seen that we can tweak games like Ashes of the Singularity to take even more advantage of its impressive core count and processing power. AMD Ryzen brings resources to the table that will change what people will come to expect from a PC gaming experience.”

Our testing setup is in line with our previous CPU performance stories.

Test System Setup
CPU AMD Ryzen 7 1800X
Intel Core i7-6900K
Motherboard ASUS Crosshair VI Hero (Ryzen)
ASUS X99-Deluxe II (Broadwell-E)
Memory 16GB DDR4-2400
Storage Corsair Force GS 240 SSD
Sound Card On-board
Graphics Card NVIDIA GeForce GTX 1080 8GB
Graphics Drivers NVIDIA 378.49
Power Supply Corsair HX1000
Operating System Windows 10 Pro x64

I was using the latest BIOS for our ASUS Crosshair VI Hero motherboard (1002) and upgraded to some Geil RGB (!!) memory capable of running at 3200 MHz on this board with a single BIOS setting adjustment. All of my tests were done at 1080p in order to return to the pain point that AMD was dealing with on launch day.

Let’s see the results.

ashes-1.png

ashes-2.png

These are substantial performance improvements with the new engine code! At both 2400 MHz and 3200 MHz memory speeds, and at both High and Extreme presets in the game (all running in DX12 for what that’s worth), the gaming performance on the GPU-centric is improved. At the High preset (which is the setting that AMD used in its performance data for the press release), we see a 31% jump in performance when running at the higher memory speed and a 22% improvement with the lower speed memory. Even when running at the more GPU-bottlenecked state of the Extreme preset, that performance improvement for the Ryzen processors with the latest Ashes patch is 17-20%!

DSC02636.jpg

It’s also important to note that Intel performance is unaffected – either for the better or worse. Whatever work Oxide did to improve the engine for AMD’s Ryzen processors had NO impact on the Core processors, which is interesting to say the least. The cynic in me would believe there is little chance that any agnostic changes to code would raise Intel’s multi-core performance at least a little bit.

So what exactly is happening to the engine with v26118? I haven’t had a chance to have an in-depth conversation with anyone at AMD or Oxide yet on the subject, but at a high level, I was told that this is what happens when instructions and sequences are analyzed for an architecture specifically. “For basically 5 years”, I was told, Oxide and other developers have dedicated their time to “instruction traces and analysis to maximize Intel performance” which helps to eliminate poor instruction setup. After spending some time with Ryzen and the necessary debug tools (and some AMD engineers), they were able to improve performance on Ryzen without adversely affecting Intel parts.

ping-amd.png

Core to core latency testing on Ryzen 7 1800X

I am hoping to get more specific detail in the coming days, but it would seem very likely that Oxide was able to properly handle the more complex core to core communication systems on Ryzen and its CCX implementation. We demonstrated early this month how thread to thread communication across core complexes causes substantially latency penalties, and that a developer that intelligently manages threads that have dependencies on the core complex can improve overall performance. I would expect this is at least part of the solution Oxide was able to integrate (and would also explain why Intel parts are unaffected).

What is important now is that AMD takes this momentum with Ashes of the Singularity and actually does something with it. Many of you will recognize Ashes as the flagship title for Mantle when AMD made that move to change the programming habits and models for developers, and though Mantle would eventually become Vulkan and drive DX12 development, it did not foretell an overall shift as it hoped to. Can AMD and its developer relations team continue to make the case that spending time and money (which is what 400 developer hours equates to) to make specific performance enhancements for Ryzen processors is in the best interest of everyone? We’ll soon find out.

Author:
Subject: Processors
Manufacturer: AMD

AMD Ryzen 7 Processor Specifications

It’s finally here and its finally time to talk about. The AMD Ryzen processor is being released onto the world and based on the buildup of excitement over the last week or so since pre-orders began, details on just how Ryzen performs relative to Intel’s mainstream and enthusiast processors are a hot commodity. While leaks have been surfacing for months and details seem to be streaming out from those not bound to the same restrictions we have been, I think you are going to find our analysis of the Ryzen 7 1800X processor to be quite interesting and maybe a little different as well.

Honestly, there isn’t much that has been left to the imagination about Ryzen, its chipsets, pricing, etc. with the slow trickle of information that AMD has been sending out since before CES in January. We know about the specifications, we know about the architecture, we know about the positioning; and while I will definitely recap most of that information here, the real focus is going to be on raw numbers. Benchmarks are what we are targeting with today’s story.

Let’s dive right in.

The Zen Architecture – Foundation for Ryzen

Actually, as it turns out, in typical Josh Walrath fashion, he wrote too much about the AMD Zen architecture to fit into this page. So, instead, you'll find his complete analysis of AMD's new baby right here: AMD Zen Architecture Overview: Focus on Ryzen

ccx.png

AMD Ryzen 7 Processor Specifications

Though we have already detailed the most important specifications for the new AMD Ryzen processors when the preorders went live, its worth touching on them again and reemphasizing the important ones.

  Ryzen 7 1800X Ryzen 7 1700X Ryzen 7 1700 Core i7-6900K Core i7-6800K Core i7-7700K Core i5-7600K Core i7-6700K
Architecture Zen Zen Zen Broadwell-E Broadwell-E Kaby Lake Kaby Lake Skylake
Process Tech 14nm 14nm 14nm 14nm 14nm 14nm+ 14nm+ 14nm
Cores/Threads 8/16 8/16 8/16 8/16 6/12 4/8 4/4 4/8
Base Clock 3.6 GHz 3.4 GHz 3.0 GHz 3.2 GHz 3.4 GHz 4.2 GHz 3.8 GHz 4.0 GHz
Turbo/Boost Clock 4.0 GHz 3.8  GHz 3.7 GHz 3.7 GHz 3.6 GHz 4.5 GHz 4.2 GHz 4.2 GHz
Cache 20MB 20MB 20MB 20MB 15MB 8MB 8MB 8MB
Memory Support DDR4-2400
Dual Channel
DDR4-2400
Dual Channel
DDR4-2400
Dual Channel
DDR4-2400
Quad Channel
DDR4-2400
Quad Channel
DDR4-2400
Dual Channel
DDR4-2400
Dual Channel
DDR4-2400
Dual Channel
TDP 95 watts 95 watts 65 watts 140 watts 140 watts 91 watts 91 watts 91 watts
Price $499 $399 $329 $1050 $450 $350 $239 $309

All three of the currently announced Ryzen processors are 8-core, 16-thread designs, matching the Core i7-6900K from Intel in that regard. Though Intel does have a 10-core part branded for consumers, it comes in at a significantly higher price point (over $1500 still). The clock speeds of Ryzen are competitive with the Broadwell-E platform options though are clearly behind the curve when it comes the clock capabilities of Kaby Lake and Skylake. With admittedly lower IPC than Kaby Lake, Zen will struggle in any purely single threaded workload with as much as 500 MHz deficit in clock rate.

One interesting deviation from Intel's designs that Ryzen gets is a more granular boost capability. AMD Ryzen CPUs will be able move between processor states in 25 MHz increments while Intel is currently limited to 100 MHz. If implemented correctly and effectively through SenseMI, this allows Ryzen to get 25-75 MHz of additional performance in a scenario where it was too thermally constrainted to hit the next 100 MHz step. 

DSC02636.jpg

XFR (Extended Frequency Range), supported on the Ryzen 7 1800X and 1700X (hence the "X"), "lifts the maximum Precision Boost frequency beyond ordinary limits in the presence of premium systems and processor cooling." The story goes, that if you have better than average cooling, the 1800X will be able to scale up to 4.1 GHz in some instances for some undetermined amount of time. The better the cooling, the longer it can operate in XFR. While this was originally pitched to us as a game-changing feature that bring extreme advantages to water cooling enthusiasts, it seems it was scaled back for the initial release. Only getting 100 MHz performance increase, in the best case result, seems a bit more like technology for technology's sake rather than offering new capabilities for consumers.

cpu2.jpg

Ryzen integrates a dual channel DDR4 memory controller with speeds up to 2400 MHz, matching what Intel can do on Kaby Lake. Broadwell-E has the advantage with a quad-channel controller but how useful that ends of being will be interesting to see as we step through our performance testing.

One area of interest is the TDP ratings. AMD and Intel have very different views on how this is calculated. Intel has made this the maximum power draw of the processor while AMD sees it as a target for thermal dissipation over time. This means that under stock settings the Core i7-7700K will not draw more than 91 watts and the Core i7-6900K will not draw more than 140 watts. And in our testing, they are well under those ratings most of the time, whenever AVX code is not being operated. AMD’s 95-watt rating on the Ryzen 1800X though will very often be exceed, and our power testing proves that out. The logic is that a cooler with a 95-watt rating and the behavior of thermal propagation give the cooling system time to catch up. (Interestingly, this is the philosophy Intel has taken with its Kaby Lake mobile processors.)

lisa-29.jpg

Obviously the most important line here for many of you is the price. The Core i7-6900K is the lowest priced 8C/16T option from Intel for consumers at $1050. The Ryzen R7 1800X has a sticker price less than half of that, at $499. The R7 1700X vs Core i7-6800K match is interesting as well, where the AMD CPU will sell for $399 versus $450 for the 6800K. However, the 6800K only has 6-cores and 12-threads, giving the Ryzen part an instead 25% boost in multi-threaded performance. The 7700K and R7 1700 battle will be interesting as well, with a 4-core difference in capability and a $30 price advantage to AMD.

Continue reading our review of the new AMD Ryzen 7 1800X processor!!

Author:
Subject: Processors
Manufacturer: AMD

What Makes Ryzen Tick

We have been exposed to details about the Zen architecture for the past several Hot Chips conventions as well as other points of information directly from AMD.  Zen was a clean sheet design that borrowed some of the best features from the Bulldozer and Jaguar architectures, as well as integrating many new ideas that had not been executed in AMD processors before.  The fusion of ideas from higher performance cores, lower power cores, and experience gained in APU/GPU design have all come together in a very impressive package that is the Ryzen CPU.

zen_01.jpg

It is well known that AMD brought back Jim Keller to head the CPU group after the slow downward spiral that AMD entered in CPU design.  While the Athlon 64 was a tremendous part for the time, the subsequent CPUs being offered by the company did not retain that leadership position.  The original Phenom had problems right off the bat and could not compete well with Intel’s latest dual and quad cores.  The Phenom II shored up their position a bit, but in the end could not keep pace with the products that Intel continued to introduce with their newly minted “tic-toc” cycle.  Bulldozer had issues  out of the gate and did not have performance numbers that were significantly greater than the previous generation “Thuban” 6 core Phenom II product, much less the latest Intel Sandy Bridge and Ivy Bridge products that it would compete with.

AMD attempted to stop the bleeding by iterating and evolving the Bulldozer architecture with Piledriver, Steamroller, and Excavator.  The final products based on this design arc seemed to do fine for the markets they were aimed at, but certainly did not regain any marketshare with AMD’s shrinking desktop numbers.  No matter what AMD did, the base architecture just could not overcome some of the basic properties that impeded strong IPC performance.

52_perc_design_opt.png

The primary goal of this new architecture is to increase IPC to a level consistent to what Intel has to offer.  AMD aimed to increase IPC per clock by at least 40% over the previous Excavator core.  This is a pretty aggressive goal considering where AMD was with the Bulldozer architecture that was focused on good multi-threaded performance and high clock speeds.  AMD claims that it has in fact increased IPC by an impressive 54% from the previous Excavator based core.  Not only has AMD seemingly hit its performance goals, but it exceeded them.  AMD also plans on using the Zen architecture to power products from mobile products to the highest TDP parts offered.

 

The Zen Core

The basis for Ryzen are the CCX modules.  These modules contain four Zen cores along with 8 MB of shared L3 cache.  Each core has 64 KB of L1 I-cache and 32 KB of D-cache.  There is a total of 512 KB of L2 cache.  These caches are inclusive.  The L3 cache acts as a victim cache which partially copies what is in L1 and L2 caches.  AMD has improved the performance of their caches to a very large degree as compared to previous architectures.  The arrangement here allows the individual cores to quickly snoop any changes in the caches of the others for shared workloads.  So if a cache line is changed on one core, other cores requiring that data can quickly snoop into the shared L3 and read it.  Doing this allows the CPU doing the actual work to not be interrupted by cache read requests from other cores.

ccx.png

l2_cache.png

l3_cache.png

Each core can handle two threads, but unlike Bulldozer has a single integer core.  Bulldozer modules featured two integer units and a shared FPU/SIMD.  Zen gets rid of CMT for good and we have a single integer and FPU units for each core.  The core can address two threads by utilizing AMD’s version of SMT (symmetric multi-threading).  There is a primary thread that gets higher priority while the second thread has to wait until resources are freed up.  This works far better in the real world than in how I explained it as resources are constantly being shuffled about and the primary thread will not monopolize all resources within the core.

Click here to read more about AMD's Zen architecture in Ryzen!