EPYC makes its move into the data center
Because we traditionally focus on, and feed off of, the excitement and build-up surrounding consumer products, the AMD Ryzen 7 and Ryzen 5 launches were huge for us and our community. Finally seeing competition to Intel’s hold on the consumer market was welcome and necessary to move the industry forward, and we are already seeing some of the results with this week’s Core i9 release and pricing. AMD is, and deserves to be, proud of these accomplishments. But from a business standpoint, the impact of Ryzen on the bottom line will likely pale in comparison to how EPYC could fundamentally change the financial stability of AMD.
AMD EPYC is the server processor that takes aim at the Intel Xeon and its dominant status in the data center market. The enterprise field is a high-margin, high-profit area, and while AMD once had significant share in this space with Opteron, that share has dropped essentially to zero over the last 6+ years. AMD hopes to use the same tactic in the data center as it did on the consumer side to shock and awe the industry into taking notice: providing impressive new performance levels while undercutting the competition on pricing.
Introducing the AMD EPYC 7000 Series
Targeting the single- and 2-socket systems that make up ~95% of the data center and enterprise market, AMD EPYC is smartly not trying to punch above its weight class. This offers an enormous opportunity for AMD to take market share from Intel with minimal risk.
Many of the specifications here have been slowly shared by AMD over time, including at the recent financial analyst day, but seeing it placed on a single slide like this puts everything in perspective. In a single socket design, servers will be able to integrate 32 cores with 64 threads, 8x DDR4 memory channels with up to 2TB of memory capacity per CPU, 128 PCI Express 3.0 lanes for connectivity, and more.
Worth noting on this slide, and was originally announced at the financial analyst day as well, is AMD’s intent to maintain socket compatibility going forward for the next two generations. Both Rome and Milan, based on 7nm technology, will be drop-in upgrades for customers buying into EPYC platforms today. That kind of commitment from AMD is crucial to regain the trust of a market that needs those reassurances.
Here is the lineup as AMD is providing it today. In the 7000-series model numbers, the second and third characters act as a performance indicator (a 755x will be faster than a 750x, for example) and the fourth character indicates the EPYC generation (here, the 1 denotes first gen). AMD has created four core-count tiers along with a few TDP options to cover all types of potential customers. Though this table might seem a bit intimidating, it is drastically simpler than the Intel Xeon product line that exists today, or that will exist in the future. AMD is offering immediate availability of the top five CPUs in this stack, with the bottom four due before the end of July.
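The model-number scheme described above can be expressed as a tiny decoder. This is purely our own illustrative sketch of the naming convention, not an AMD tool:

```python
# Illustrative sketch of the EPYC 7000-series naming scheme described above
# (not an official AMD utility).
def decode_epyc_model(model: str) -> dict:
    """Decode e.g. '7551' into its series, performance tier, and generation."""
    assert len(model) == 4 and model.startswith("7"), "expected a 7xxx model number"
    return {
        "series": model[0],         # '7' = EPYC 7000 series
        "performance": model[1:3],  # higher = faster ('55' in 755x beats '50' in 750x)
        "generation": model[3],     # '1' = first-generation EPYC
    }

print(decode_epyc_model("7551"))
# {'series': '7', 'performance': '55', 'generation': '1'}
```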
Subject: Processors | June 19, 2017 - 11:48 PM | Morry Teitelman
Tagged: LGA2066, Intel X299, Intel Skylake-X, Intel Kaby Lake-X, FinalWire, aida64
Courtesy of FinalWire
Today, FinalWire Ltd. announced the release of version 5.92 of their diagnostic and benchmarking tool, AIDA64. This new version updates their Extreme Edition, Engineer Edition, and Business Edition of the software, available here.
The latest version of AIDA64 has been optimized to work with Intel's newest processors, the Skylake-X and Kaby Lake-X processors, as well as the Intel X299 "Union Point" chipset. The benchmarks and performance tests housed within AIDA64 have been updated for the Intel X299 chipset and processor line to utilize Advanced Vector Extensions 2 (AVX2), Fused Multiply-Add (FMA) instructions, and AES-NI hardware acceleration integrated into the new line of Intel processors.
New features include:
- AVX2 and FMA accelerated 64-bit benchmarks for Intel Skylake-X and Kaby Lake-X CPUs
- Improved support for AMD Ryzen 5 and Ryzen 7 processors
- Support for Pertelian (RS232) external LCD device
- Corsair K55 RGB LED keyboard support
- Corsair Glaive RGB LED mouse support
- 20 processor groups support
- NVMe 1.3, WDDM 2.2 support
- Advanced support for Areca RAID controllers
- GPU details for AMD Radeon RX 500 Series
- GPU details for nVIDIA GeForce GT 1030, GeForce MX150, Titan Xp
Software updates new to this release (since AIDA64 v5.00):
- AVX and FMA accelerated FP32 and FP64 ray tracing benchmarks
- Vulkan graphics accelerator diagnostics
- RemoteSensor smartphone and tablet LCD integration
- Logitech Arx Control smartphone and tablet LCD integration
- Microsoft Windows 10 Creators Update support
- Proper DPI scaling to better support high-resolution LCD and OLED displays
- AVX and FMA accelerated 64-bit benchmarks for AMD A-Series Bristol Ridge and Carrizo APUs
- AVX2 and FMA accelerated 64-bit benchmarks for AMD Ryzen Summit Ridge processors
- AVX2 and FMA accelerated 64-bit benchmarks for Intel Broadwell, Kaby Lake and Skylake CPUs
- AVX and SSE accelerated 64-bit benchmarks for AMD Nolan APU
- Optimized 64-bit benchmarks for Intel Apollo Lake, Braswell and Cherry Trail processors
- Preliminary support for AMD Zen APUs and Zen server processors
- Preliminary support for Intel Gemini Lake SoC and Knights Mill HPC CPU
- Improved support for Intel Cannonlake, Coffee Lake, Denverton CPUs
- Advanced SMART disk health monitoring
- Hot Keys to switch LCD pages, start or stop logging, show or hide SensorPanel
- Corsair K65, K70, K95, Corsair Strafe, Logitech G13, G19, G19s, G910, Razer Chroma RGB LED keyboard support
- Corsair, Logitech, Razer RGB LED mouse support
- Corsair and Razer RGB LED mousepad support
- AlphaCool Heatmaster II, Aquaduct, Aquaero, AquaStream XT, AquaStream Ultimate, Farbwerk, MPS, NZXT GRID+ V2, NZXT Kraken X52, PowerAdjust 2, PowerAdjust 3 sensor devices support
- Improved Corsair Link sensor support
- NZXT Kraken water cooling sensor support
- Corsair AXi, Corsair HXi, Corsair RMi, Enermax Digifanless, Thermaltake DPS-G power supply unit sensor support
- Support for EastRising ER-OLEDM032 (SSD1322), Gravitech, LCD Smartie Hardware, Leo Bodnar, Modding-FAQ, Noteu, Odospace, Saitek Pro Flight Instrument Panel, Saitek X52 Pro, UCSD LCD devices
- Portrait mode support for AlphaCool and Samsung SPF LCDs
- System certificates information
- Support for LGA-1151 and Socket AM4 motherboards
- Advanced support for Adaptec and Marvell RAID controllers
- Autodetect information and SMART drive health monitoring for Intel and Samsung NVMe SSDs
AIDA64 is developed by FinalWire Ltd., headquartered in Budapest, Hungary. The company’s founding members are veteran software developers who have worked together on programming system utilities for more than two decades. Currently, they have ten products in their portfolio, all based on the award-winning AIDA technology: AIDA64 Extreme, AIDA64 Engineer, AIDA64 Network Audit, AIDA64 Business, and AIDA64 for Android, iOS, Sailfish OS, Tizen, Ubuntu Touch and Windows Phone. For more information, visit www.aida64.com.
Specifications and Design
Intel is at an important crossroads for its consumer product lines. Long accused of ignoring the gaming and enthusiast markets, focusing instead on laptops and smartphones/tablets at the direct expense of the DIY user, Intel had raised prices and shown only a limited ability to increase per-die performance over a fairly extended period. The release of the AMD Ryzen processor, along with the pending release of the Threadripper product line with up to 16 cores, has kicked Intel into a higher gear; the company now appears far more willing to add features, increase performance, and lower prices.
We have already talked about the majority of the specifications, pricing, and feature changes of the Core i9/Core i7 lineup with the Skylake-X designation, but it is worth including them here, again, in our review of the Core i9-7900X for reference purposes.
| | Core i9-7980XE | Core i9-7960X | Core i9-7940X | Core i9-7920X | Core i9-7900X | Core i7-7820X | Core i7-7800X | Core i7-7740X | Core i5-7640X |
|---|---|---|---|---|---|---|---|---|---|
| Architecture | Skylake-X | Skylake-X | Skylake-X | Skylake-X | Skylake-X | Skylake-X | Skylake-X | Kaby Lake-X | Kaby Lake-X |
| Base Clock | ? | ? | ? | ? | 3.3 GHz | 3.6 GHz | 3.5 GHz | 4.3 GHz | 4.0 GHz |
| Turbo Boost 2.0 | ? | ? | ? | ? | 4.3 GHz | 4.3 GHz | 4.0 GHz | 4.5 GHz | 4.2 GHz |
| Turbo Boost Max 3.0 | ? | ? | ? | ? | 4.5 GHz | 4.5 GHz | N/A | N/A | N/A |
| Cache | 16.5MB (?) | 16.5MB (?) | 16.5MB (?) | 16.5MB (?) | 13.75MB | 11MB | 8.25MB | 8MB | 6MB |
| Memory | DDR4-2666 quad channel | DDR4-2666 quad channel | DDR4-2666 quad channel | DDR4-2666 quad channel | DDR4-2666 quad channel | DDR4-2666 quad channel | DDR4-2666 quad channel | DDR4-2666 dual channel | DDR4-2666 dual channel |
| TDP | 165 watts (?) | 165 watts (?) | 165 watts (?) | 165 watts (?) | 140 watts | 140 watts | 140 watts | 112 watts | 112 watts |
There is a lot to take in here, but three points stand out. First, Intel plans to one-up AMD’s Threadripper by offering an 18-core processor. Second, and potentially more interesting, Intel also wants to change the perception of the X299-class platform by offering lower-priced, lower-core-count CPUs like the quad-core, non-HyperThreaded Core i5-7640X. Third, we see the first-ever use of the Core i9 brand.
Intel only provided detailed specifications up to the Core i9-7900X, a 10-core / 20-thread processor with a base clock of 3.3 GHz and a Turbo peak of 4.5 GHz (using the new Turbo Boost Max Technology 3.0). It sports 13.75MB of cache thanks to an updated cache configuration, includes 44 lanes of PCIe 3.0 (an increase of 4 lanes over Broadwell-E), supports quad-channel DDR4 memory at up to 2666 MHz, and carries a 140 watt TDP. The new LGA2066 socket will be utilized. Pricing for this CPU is set at $999, which is interesting for a couple of reasons. First, it is $700 less than the starting MSRP of the 10c/20t Core i7-6950X from one year ago; obviously a big plus. However, there is quite a way up the stack from there, with the 18c/36t Core i9-7980XE coming in at a cool $1999.
| | Core i9-7900X | Core i7-6950X | Core i7-7700K |
|---|---|---|---|
| Base Clock | 3.3 GHz | 3.0 GHz | 4.2 GHz |
| Turbo Boost 2.0 | 4.3 GHz | 3.5 GHz | 4.5 GHz |
| Turbo Boost Max 3.0 | 4.5 GHz | 4.0 GHz | N/A |
| TDP | 140 watts | 140 watts | 91 watts |
The next CPU down the stack is compelling as well. The Core i7-7820X is the new 8-core / 16-thread HEDT option from Intel, with clock speeds similar to the 10-core above it (save for a higher base clock). It has 11MB of L3 cache and 28 lanes of PCI Express, and carries a $599 price tag. Compared to the 8-core Core i7-6900K, that is roughly $400 lower, while the new Skylake-X part enjoys a 700 MHz clock speed advantage. That’s huge, and it is a direct attack on the AMD Ryzen 7 1800X, which sells for $499 today and cut Intel off at the knees this March. In fact, the base clock of the Core i7-7820X is only 100 MHz lower than the maximum Turbo Boost clock of the Core i7-6900K!
It is worth noting the pricing gap between the 7820X and the 7900X. That $400 gap seems huge and out of place compared to the deltas in the rest of the stack, which never exceed $300 (and that only at the top two slots). Intel is clearly concerned about the Ryzen 7 1800X, making sure it has options to compete at that price point (and below), but it feels less threatened by the upcoming Threadripper CPUs. Pricing the 10+ core CPUs today, without knowing what AMD will charge, is a risk that could put Intel in the same position it found itself in at the Ryzen 7 release.
Subject: Processors | June 15, 2017 - 04:00 PM | Ryan Shrout
Tagged: xeon scalable, xeon, skylake-x, skylake-sp, skylake-ep, ring, mesh, Intel
Though we are just days away from the release of Intel’s Core i9 family based on Skylake-X, and a bit further away from the Xeon Scalable Processor launch using the same fundamental architecture, Intel is sharing a bit of information on how the insides of this processor tick. Literally. One of the most significant changes to the new processor design comes in the form of a new mesh interconnect architecture that handles the communications between the on-chip logical areas.
Since the days of Nehalem-EX, Intel has utilized a ring bus architecture in its processor designs. The ring bus operates in a bi-directional, sequential fashion, cycling through various stops. At each stop, control logic determines whether data should be collected or deposited at that module. These ring bus stops are located at the memory controllers, the CPU cores and their caches, the PCI Express interface, the LLC slices, and so on. This ring bus was fairly simple and easily expandable by simply adding more stops to the ring itself.
However, over several generations, the ring bus has become quite large and unwieldy. Compare the ring bus from Nehalem above to the one for last year’s Xeon E5 v4 platform.
The spike in core counts and other modules caused a ballooning of the ring that eventually turned into multiple rings, complicating the design. As you increase the stops on the ring bus, you also increase the physical latency of messaging and data transfer, which Intel compensated for by increasing the bandwidth and clock speed of the interface. The cost of that compensation is power and efficiency.
For an on-die interconnect to remain relevant, it needs to scale in bandwidth, reduce latency, and remain energy efficient. With 28-core Xeon processors imminent, and new IO capabilities coming along with them, the time for the ring bus in this space is over.
Starting with the HEDT and Xeon products released this year, Intel will be using a new on-chip design called a mesh that Intel promises will offer higher bandwidth, lower latency, and improved power efficiency. As the name implies, the mesh architecture is one in which each node relays messages through the network between source and destination. Though I cannot share many of the details on performance characteristics just yet, Intel did share the following diagram.
As Intel indicates in its blog on the mesh announcements, this generic diagram “shows a representation of the mesh architecture where cores, on-chip cache banks, memory controllers, and I/O controllers are organized in rows and columns, with wires and switches connecting them at each intersection to allow for turns. By providing a more direct path than the prior ring architectures and many more pathways to eliminate bottlenecks, the mesh can operate at a lower frequency and voltage and can still deliver very high bandwidth and low latency. This results in improved performance and greater energy efficiency similar to a well-designed highway system that lets traffic flow at the optimal speed without congestion.”
The bi-directional mesh design allows a many-core design to offer lower node to node latency than the ring architecture could provide, and by adjusting the width of the interface, Intel can control bandwidth (and by relation frequency). Intel tells us that this can offer lower average latency without increasing power. Though it wasn’t specifically mentioned in this blog, the assumption is that because nothing is free, this has a slight die size cost to implement the more granular mesh network.
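As a rough illustration of why a mesh shortens paths, we can compare average hop counts between nodes on a bidirectional ring versus a 2D mesh of the same size. This is our own back-of-the-envelope sketch, not Intel data, and it counts hops only, ignoring wire length, contention, and clocking:

```python
# Back-of-the-envelope sketch (our illustration, not Intel data): average
# hop count between distinct nodes on a bidirectional ring vs. a 2D mesh.
def avg_ring_hops(n: int) -> float:
    """Average shortest-path hops on a bidirectional ring of n stops."""
    total = sum(min(abs(a - b), n - abs(a - b))
                for a in range(n) for b in range(n) if a != b)
    return total / (n * (n - 1))

def avg_mesh_hops(rows: int, cols: int) -> float:
    """Average Manhattan-distance hops on a rows x cols mesh."""
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    total = sum(abs(r1 - r2) + abs(c1 - c2)
                for (r1, c1) in nodes for (r2, c2) in nodes
                if (r1, c1) != (r2, c2))
    n = len(nodes)
    return total / (n * (n - 1))

# 28 stops arranged as one ring vs. the same 28 stops as a 4x7 mesh
print(f"ring: {avg_ring_hops(28):.2f} hops, mesh: {avg_mesh_hops(4, 7):.2f} hops")
# ring: 7.26 hops, mesh: 3.67 hops
```

Even this toy model shows the mesh roughly halving the average node-to-node distance at 28 stops, and the gap widens as stop counts grow.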
Using a mesh architecture offers a couple of capabilities and also requires a few changes to the cache design. By dividing up the IO interfaces (think multiple PCI Express banks, or memory channels), Intel can provide better average access times to each core by intelligently spacing the location of those modules. Intel will also be breaking up the LLC into different segments which will share a “stop” on the network with a processor core. Rather than the previous design of the ring bus where the entirety of the LLC was accessed through a single stop, the LLC will perform as a divided system. However, Intel assures us that performance variability is not a concern:
Negligible latency differences in accessing different cache banks allows software to treat the distributed cache banks as one large unified last level cache. As a result, application developers do not have to worry about variable latency in accessing different cache banks, nor do they need to optimize or recompile code to get a significant performance boost out of their applications.
There is a lot to dissect when it comes to this new mesh architecture for Xeon Scalable and Core i9 processors, including its overall effect on the LLC cache performance and how it might affect system memory or PCI Express performance. In theory, the integration of a mesh network-style interface could drastically improve the average latency in all cases and increase maximum memory bandwidth by giving more cores access to the memory bus sooner. But, it is also possible this increases maximum latency in some fringe cases.
Further testing awaits for us to find out!
Subject: Processors | June 9, 2017 - 03:02 PM | Jeremy Hellstrom
Tagged: amd, ryzen 5, productivity, ryzen 7 1800x, Ryzen 5 1500X, AMD Ryzen 5 1600, Ryzen 5 1600X, ryzen 5 1400
The Tech Report previously tested the gaming prowess of AMD's new processor family and is now delving into the performance of productivity software on Ryzen. Many users shopping for a Ryzen will be using it for a variety of non-gaming tasks such as content creation, coding, or even particle flow analysis. The story is somewhat different in these tests, with AMD taking the top spot in many benchmarks and in others being surpassed only by the Core i7-6700K; in some tests that chip leaves all competition in the dust by a huge margin. For budget-minded shoppers, the Ryzen 5 1600 barely trails both the i7-7700K and the 1600X in the productivity tests, making it a very good bargain for someone looking to build a new system. Check out the full suite of tests right here.
"Part one of our AMD Ryzen 5 review proved these CPUs have game, but what happens when we have to put the toys away and get back to work? We ran all four Ryzen 5 CPUs through a wide range of productivity testing to find out."
Here are some more Processor articles from around the web:
- AMD Ryzen 5 1400 3.2 GHz @ techPowerUp
- Intel Skylake X and Kaby Lake X: Luke and Leo Discuss @ Kitguru
- Core i3-7350K @ Hardware Secrets
Subject: Processors | May 31, 2017 - 02:33 PM | Tim Verry
Tagged: Intel, goldmont+, gemini lake, apollo lake, 14nm
Information recently leaked on the successor to Intel’s low-power Apollo Lake SoCs, dubbed Gemini Lake. Several sites, citing FanlessTech, claim that Gemini Lake will launch by the end of the year as the dual- and quad-core processors powering low-cost notebooks, tablets, 2-in-1 convertibles, and SFF desktop and portable PCs.
A leaked Intel roadmap.
Gemini Lake appears to be more tick than tock in that it uses a similar microarchitecture to Apollo Lake and relies mainly on process node improvements, with the refined 14nm+ process increasing power efficiency and performance per watt. On the CPU side of things, Gemini Lake utilizes the Goldmont+ microarchitecture and features two or four cores paired with 4MB of L2 cache. Intel has managed to wring higher clock speeds out of the 14nm process while lowering power draw, and a doubling of the L2 cache versus Apollo Lake will certainly give the chip a performance boost. The SoC will use Intel Gen9 graphics with up to 18 Execution Units (similar to Apollo Lake), but the GPU will presumably run at higher clocks. Additionally, the Gemini Lake SoC will integrate a new single-channel DDR4 memory controller supporting higher memory speeds, plus a WLAN controller (a separate radio PHY is still required on the motherboard) supporting 802.11 b/g/n and Bluetooth 4.0.
Should the leaked information turn out to be true, the new Gemini Lake chips are shaping up to be a good bit faster than their predecessors while sipping power, with TDPs of up to 6W for mobile devices and 10W for SFF desktops.
The lower power should help improve battery life a bit which is always a good thing. And if they can pull off higher performance as well all the better!
Unfortunately, it is sounding like Gemini Lake will not be ready in time for the back-to-school or holiday shopping seasons this year. I expect to see a ton of announcements of devices using the new SoCs at CES, though!
Subject: Processors, Mobile | May 31, 2017 - 03:30 AM | Ryan Shrout
Tagged: snapdragon 835, snapdragon, qualcomm, Lenovo, hp, Gigabit LTE, asus
Back in December of 2016, Qualcomm and Microsoft announced a partnership to bring Windows to platforms based on the Snapdragon platform. Not Windows RT redux, not Windows mobile, not Windows Mini, full blown Windows with 100% application support and compatibility. It was a surprising and gutsy move after the tepid response (at best) to the ARM-based Windows RT launch several years ago. Qualcomm and Microsoft assure us that this time things are different, thanks to a lot of learning and additional features that make the transition seamless for consumers.
The big reveal for this week is the initial list of partners that Qualcomm has brought on board to build Windows 10 systems around the Snapdragon 835 Mobile Platform. ASUS, HP, and Lenovo will offer machines based around the SoC, though details on form factors, time frames, pricing, and anything else you might want to know remain under wraps. These are big-time names, though, leaders in the PC notebook space, and I think their input to the platform is going to be just as valuable as their selling and marketing of it. HP is known for enterprise solutions, Lenovo for mass market share, and ASUS for innovative design and integration.
(If you want to see an Android-based representation of performance on a mobile-based Snapdragon 835 processor, check out our launch preview from March.)
Also on the show floor, Qualcomm began its marketing campaign aimed at showing the value Snapdragon offers the Windows ecosystem. Today that is exemplified by a form factor comparison between the circuit board layout of a Snapdragon 835-based notebook and a “typical” competitor machine.
Up top, Qualcomm is showing us the prototype board for the Windows 10 Snapdragon 835 Mobile Platform. It has a total area of 50.4 cm2, and just by eyeballing the two images, there is a clear difference in scope. The second image shows what Qualcomm will only describe as a “competing commercial circuit board,” with an area of 98.1 cm2. That is a decrease in PCB space of roughly 48% (advantage Qualcomm), and it gives OEMs a lot of flexibility in design that they might not have had otherwise. They can use that space to make machines thinner or lighter, include a larger battery, or simply innovate outside the scope of what we can imagine today.
Subject: Processors | May 30, 2017 - 10:49 PM | Ryan Shrout
Tagged: Threadripper, ryzen, PCI Express, amd
During AMD’s Computex keynote, the company confirmed that every one of the upcoming Threadripper HEDT processors, first announced earlier in May, will include 64 lanes of PCI Express 3.0. There will be no differentiation within the product line on PCIe lanes or memory channels (all are quad-channel DDR4). This potentially gives AMD the advantage in system connectivity, as the Intel Skylake-X processors announced just yesterday will sport only 44 lanes of PCIe 3.0 on chip.
Having 64 lanes of PCI Express on Threadripper could be an important differentiation point for the platform, offering the ability to run quad GPUs at full x16 speeds, without the need of any PLX-style bridge chips. You could also combine a pair of x16 graphics cards, and still have 32 lanes left for NVMe storage, 10 GigE networking devices, multi-channel SAS controllers, etc. And that doesn’t include any additional lanes that the X399 chipset may end up providing. We still can’t wait to see what motherboard vendors like ASUS, MSI and Gigabyte create with all that flexibility.
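The lane arithmetic from the paragraph above can be sketched as a simple budget check. The device list here is our own hypothetical build, not an AMD reference design, and it ignores any extra lanes the X399 chipset may add:

```python
# Hypothetical lane-budget sketch for a 64-lane Threadripper CPU
# (illustrative device mix, not an official AMD configuration).
CPU_LANES = 64

devices = {
    "GPU 0 (x16)": 16,
    "GPU 1 (x16)": 16,
    "NVMe SSDs (4 x x4)": 16,
    "10 GigE NIC (x8)": 8,
    "SAS controller (x8)": 8,
}

used = sum(devices.values())
assert used <= CPU_LANES, "configuration exceeds the CPU's lane budget"
print(f"used {used} of {CPU_LANES} CPU lanes, {CPU_LANES - used} free")
# used 64 of 64 CPU lanes, 0 free
```

With only 44 CPU lanes, the same mix would overflow the budget, which is exactly the differentiation point AMD is pushing.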
On-stage, we saw a couple of demonstrations of what this connectivity capability can provide. First, a Threadripper system was shown running the same Blender rendering demo used in the build up to the initial Ryzen CPU launch.
Next, CEO Lisa Su came back on stage to demo AMD Threadripper paired with four Radeon Vega Frontier Edition cards working together on a ray tracing workload.
And finally, a gaming demo! AMD Ryzen Threadripper was demoed with dual Radeon RX Vega (the gaming versions) graphics cards running at 4K/Ultra settings on the new Prey PC title. No frame rates were mentioned, no FRAPS in the corner, etc.
(Side note: Radeon Vega FE was confirmed for June 27th launch. Radeon RX Vega will launch at SIGGRAPH at the end of July!)
We still have a ways to go before we can make any definitive comments on Threadripper, and with Intel announcing processors with core counts as high as 18 just yesterday, it’s fair to say that some of the excitement has dwindled. However, with aggressive pricing and the right messaging, AMD still has an amazing opportunity to break away a large segment of the growing, and profitable, HEDT market from Intel.
Subject: Processors, Mobile | May 30, 2017 - 10:43 PM | Ryan Shrout
Tagged: amd, ryzen, mobile, Vega
As part of the company’s press conference from Computex 2017, AMD displayed for the first time to the public a working notebook utilizing the upcoming Ryzen SoC with on-die Vega graphics. The CPU is a 4-core / 8-thread design and the system was shown playing back some basic video.
We don’t really have much more detail on the platform than that, other than availability in the second half of this year. The system being shown was impressively built, with a sub-15mm ultra-portable form factor, putting to rest concerns over AMD’s ability to scale Zen and Vega down to the required power levels. AMD claims that Ryzen mobile will offer 50% better CPU performance and 40% better GPU performance than the 7th-generation AMD APU. I can't wait to test this myself, but with a jump like that, AMD should be competitive in the processor space again and continue its dominance in integrated graphics.
The on-die Vega integration was first mentioned at the company’s financial analyst day, though if you were like me, it went unnoticed in the wave of Threadripper and EPYC news. This iteration is obviously not using an HBM2 memory implementation, but I don’t yet know if there is any kind of non-system-memory cache on the processor to help improve integrated graphics performance.
For a product not slated to be released until the end of this year, seeing a low profile, high performance demo of the platform is a good sign for AMD and a welcome indicator that the company could finally fight back in the mobile notebook space.
We are up to two...
UPDATE (5/31/2017): Crystal Dynamics was able to get back to us with a couple of points on the changes that were made with this patch to affect the performance of AMD Ryzen processors.
- Rise of the Tomb Raider splits rendering tasks to run on different threads. By tuning the size of those tasks – breaking some up, allowing multicore CPUs to contribute in more cases, and combining some others, to reduce overheads in the scheduler – the game can more efficiently exploit extra threads on the host CPU.
- An optimization was identified in texture management that improves the combination of AMD CPU and NVIDIA GPU. Overhead was reduced by packing texture descriptor uploads into larger chunks.
There you have it, a bit more detail on the software changes made to help adapt the game engine to AMD's Ryzen architecture. Not only that, but it does confirm our information that there was slightly MORE to address in the Ryzen+GeForce combinations.
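The descriptor-packing idea from the second bullet can be sketched generically: if each submission carries a fixed overhead, packing many small uploads into larger chunks pays that cost once per chunk instead of once per item. This is our own conceptual illustration with assumed numbers, not Crystal Dynamics' actual code:

```python
# Conceptual sketch (our illustration, not the game's code): packing many
# small uploads into larger chunks amortizes per-submission overhead.
def batch(items, chunk_size):
    """Group a flat list of uploads into fixed-size chunks."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

PER_SUBMIT_OVERHEAD_US = 5        # assumed fixed cost per submission, in µs

descriptors = list(range(256))    # pretend these are 256 texture-descriptor uploads
unbatched_cost = len(descriptors) * PER_SUBMIT_OVERHEAD_US
batched_cost = len(batch(descriptors, 64)) * PER_SUBMIT_OVERHEAD_US

print(unbatched_cost, batched_cost)
# 1280 20  -> 256 submissions vs. 4, for the same data uploaded
```

The real engine change is of course far more involved, but the principle is the same: fewer, larger submissions mean less scheduler and driver overhead per frame.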
Despite a couple of growing pains out of the gate, the Ryzen processor launch appears to have been a success for AMD. Both the Ryzen 7 and the Ryzen 5 releases proved to be very competitive with Intel’s dominant CPUs in the market and took significant leads in areas of massive multi-threading and performance per dollar. An area where AMD has struggled, though, is 1080p gaming: performance on both Ryzen 7 and 5 processors fell behind comparable Intel parts by (sometimes) significant margins.
Our team continues to watch the story to see how AMD and game developers work through the issue. Most recently I posted a look at the memory latency differences between Ryzen and Intel Core processors. As it turns out, the memory latency differences are a significant part of the initial problem for AMD:
Because of this, I think it is fair to claim that some, if not most, of the 1080p gaming performance deficits we have seen with AMD Ryzen processors are a result of this particular memory system intricacy. You can combine memory latency with the thread-to-thread communication issue we discussed previously into one overall system level complication: the Zen memory system behaves differently than anything we have seen prior and it currently suffers in a couple of specific areas because of it.
In that story I detailed our coverage of the Ryzen processor and its gaming performance succinctly:
Our team has done quite a bit of research and testing on this topic. This included a detailed look at the first asserted reason for the performance gap, the Windows 10 scheduler. Our summary there was that the scheduler was working as expected and that minimal difference was seen when moving between different power modes. We also talked directly with AMD to find out its then-current stance on the results, backing up our claims on the scheduler, and presented a better outlook for gaming going forward. When AMD wanted to test a new custom Windows 10 power profile to help improve performance in some cases, we took part in that too. In late March we saw the first gaming performance update occur, courtesy of Ashes of the Singularity: Escalation, where an engine update to utilize more threads resulted in as much as a 31% increase in average frame rate.
Quick on the heels of the Ryzen 7 release, AMD worked with the developer Oxide on the Ashes of the Singularity: Escalation engine. Through tweaks and optimizations, the game was able to showcase as much as a 30% increase in average frame rate in the integrated benchmark. While this was only a single use case, it does prove that, through work with developers, AMD can improve Ryzen's 1080p gaming positioning against Intel.
Fast forward to today and I was surprised to find a new patch for Rise of the Tomb Raider, a game that was actually one of the worst case scenarios for AMD with Ryzen. (Patch #12, v1.0.770.1) The patch notes mention the following:
The following changes are included in this patch
- Fix certain DX12 crashes reported by users on the forums.
- Improve DX12 performance across a variety of hardware, in CPU bound situations. Especially performance on AMD Ryzen CPUs can be significantly improved.
While we expect this patch to be an improvement for everyone, if you do have trouble with this patch and prefer to stay on the old version we made a Beta available on Steam, build 767.2, which can be used to switch back to the previous version.
We will keep monitoring for feedback and will release further patches as it seems required. We always welcome your feedback!
Obviously the data point that stood out for me was the improved DX12 performance “in CPU bound situations. Especially on AMD Ryzen CPUs…”
Remember how the situation appeared in April?
The Ryzen 7 1800X was 24% slower than the Intel Core i7-7700K – a dramatic difference for a processor that should only have been ~8-10% slower in single threaded workloads.
How does this new patch to RoTR affect performance? We tested it on the same Ryzen 7 1800X benchmark platform from our previous testing, including the ASUS Crosshair VI Hero motherboard, 16GB of DDR4-2400 memory, and a GeForce GTX 1080 Founders Edition using the 378.78 driver. All testing was done under the DX12 code path.
The Ryzen 7 1800X score jumps from 107 FPS to 126.44 FPS, an increase of 17%! That is a significant boost in performance at 1080p while still running the Very High image quality preset, indicating that the developer (and likely AMD) found substantial inefficiencies in the engine. For comparison, the 8-core / 16-thread Intel Core i7-6900K sees only a 2.4% increase from this new game revision. This tells us that the changes were specific to Ryzen processors and their design, but that no performance was taken away from the Intel platforms.