Tweaks for days
It seems like it’s been months since AMD launched Ryzen, its first new processor architecture in about a decade, when in fact we are only four weeks removed. One of the few concerns about the Ryzen processors centered on gaming performance, particularly at common resolutions like 1080p. While I was far from the only person to notice this, our gaming tests clearly showed a gap between the Ryzen 7 1800X and the Intel Core i7-7700K and 6900K processors in Civilization 6, Hitman and Rise of the Tomb Raider.
A graph from our Ryzen launch coverage...
We had been working with AMD for a couple of weeks on the Ryzen launch and fed back our results with questions in the week before launch. On March 2nd, AMD’s CVP of Marketing John Taylor gave us a prepared statement that acknowledged the issue but promised that changes would come in the form of game engine updates. These software updates would need to be implemented by the game developers themselves in order to take advantage of the unique and more complex core designs of the Zen architecture. We had quotes from the developers of Ashes of the Singularity as well as the Total War series to back it up.
And while statements promising change are nice, it really takes some proof to get the often sceptical tech media and tech enthusiasts to believe that change can actually happen. Today AMD is showing its first result.
The Nitrous Engine powering Ashes of the Singularity received an update today to version 26118. The result of 400 developer hours of work, it integrates threading changes that better balance performance across Ryzen 7’s 8 cores and 16 threads. I was able to do some early testing on the new revision, as well as on the previous retail shipping version (25624), to see what kind of improvements the patch brings with it.
Stardock / Oxide CEO Brad Wardell had this to say in a press release:
“I’ve always been vocal about taking advantage of every ounce of performance the PC has to offer. That’s why I’m a strong proponent of DirectX 12 and Vulkan® because of the way these APIs allow us to access multiple CPU cores, and that’s why the AMD Ryzen processor has so much potential,” said Stardock and Oxide CEO Brad Wardell. “As good as AMD Ryzen is right now – and it’s remarkably fast – we’ve already seen that we can tweak games like Ashes of the Singularity to take even more advantage of its impressive core count and processing power. AMD Ryzen brings resources to the table that will change what people will come to expect from a PC gaming experience.”
Our testing setup is in line with our previous CPU performance stories.
|Test System Setup|
|CPU||AMD Ryzen 7 1800X, Intel Core i7-6900K|
|Motherboard||ASUS Crosshair VI Hero (Ryzen), ASUS X99-Deluxe II (Broadwell-E)|
|Storage||Corsair Force GS 240 SSD|
|Graphics Card||NVIDIA GeForce GTX 1080 8GB|
|Graphics Drivers||NVIDIA 378.49|
|Power Supply||Corsair HX1000|
|Operating System||Windows 10 Pro x64|
I was using the latest BIOS for our ASUS Crosshair VI Hero motherboard (1002) and upgraded to some Geil RGB (!!) memory capable of running at 3200 MHz on this board with a single BIOS setting adjustment. All of my tests were done at 1080p in order to return to the pain point that AMD was dealing with on launch day.
Let’s see the results.
These are substantial performance improvements with the new engine code! At both 2400 MHz and 3200 MHz memory speeds, and at both High and Extreme presets in the game (all running in DX12 for what that’s worth), gaming performance in the GPU-centric test is improved. At the High preset (which is the setting that AMD used in its performance data for the press release), we see a 31% jump in performance when running at the higher memory speed and a 22% improvement with the lower speed memory. Even when running in the more GPU-bottlenecked state of the Extreme preset, the performance improvement for the Ryzen processor with the latest Ashes patch is 17-20%!
It’s also important to note that Intel performance is unaffected, for better or worse. Whatever work Oxide did to improve the engine for AMD’s Ryzen processors had NO impact on the Core processors, which is interesting to say the least. The cynic in me finds it hard to believe that truly agnostic changes to the code would not raise Intel’s multi-core performance at least a little bit.
So what exactly is happening to the engine with v26118? I haven’t had a chance to have an in-depth conversation with anyone at AMD or Oxide yet on the subject, but at a high level, I was told that this is what happens when instructions and sequences are analyzed for an architecture specifically. “For basically 5 years”, I was told, Oxide and other developers have dedicated their time to “instruction traces and analysis to maximize Intel performance” which helps to eliminate poor instruction setup. After spending some time with Ryzen and the necessary debug tools (and some AMD engineers), they were able to improve performance on Ryzen without adversely affecting Intel parts.
Core to core latency testing on Ryzen 7 1800X
I am hoping to get more specific detail in the coming days, but it would seem very likely that Oxide was able to properly handle the more complex core to core communication systems on Ryzen and its CCX implementation. We demonstrated earlier this month how thread to thread communication across core complexes causes substantial latency penalties, and that a developer who intelligently keeps threads with dependencies on the same core complex can improve overall performance. I would expect this is at least part of the solution Oxide was able to integrate (and it would also explain why Intel parts are unaffected).
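To make the CCX idea concrete, here is a minimal Python sketch of how an engine might decide which logical CPUs share a core complex and keep dependent threads together. This is not Oxide's code. It assumes a Ryzen 7 layout of two four-core complexes with SMT enabled, and it assumes SMT siblings are numbered adjacently (logical CPUs 0 and 1 are core 0, and so on), which is not guaranteed on every operating system. The function names and the thread group names are hypothetical.

```python
from collections import defaultdict

CORES_PER_CCX = 4      # Ryzen 7: two core complexes of four cores each
THREADS_PER_CORE = 2   # SMT enabled


def ccx_of(logical_cpu):
    """Return the CCX index that a logical CPU belongs to.

    ASSUMPTION: SMT siblings are numbered adjacently (0/1 = core 0, ...).
    """
    physical_core = logical_cpu // THREADS_PER_CORE
    return physical_core // CORES_PER_CCX


def group_by_ccx(num_logical_cpus=16):
    """Group logical CPU indices by the core complex they sit on."""
    groups = defaultdict(list)
    for cpu in range(num_logical_cpus):
        groups[ccx_of(cpu)].append(cpu)
    return dict(groups)


def crosses_ccx(cpu_a, cpu_b):
    """True if two logical CPUs must talk across core complexes."""
    return ccx_of(cpu_a) != ccx_of(cpu_b)


def place_dependent_groups(thread_groups, num_ccx=2):
    """Assign each group of dependent threads to a single CCX, round-robin."""
    return {name: index % num_ccx for index, name in enumerate(thread_groups)}


ccx_map = group_by_ccx()
placement = place_dependent_groups(["render", "ai", "physics"])
```

The CPU lists in `ccx_map` could then be handed to whatever affinity mechanism the operating system provides, so that threads in the same group never pay the cross-CCX penalty when they exchange data.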
- Ryzen 7 1800X - $499 - Amazon.com
- Ryzen 7 1700X - $399 - Amazon.com
- Ryzen 7 1700 - $329 - Amazon.com
What is important now is that AMD takes this momentum with Ashes of the Singularity and actually does something with it. Many of you will recognize Ashes as the flagship title for Mantle when AMD made that move to change the programming habits and models for developers, and though Mantle would eventually become Vulkan and drive DX12 development, it did not foretell an overall shift as it hoped to. Can AMD and its developer relations team continue to make the case that spending time and money (which is what 400 developer hours equates to) to make specific performance enhancements for Ryzen processors is in the best interest of everyone? We’ll soon find out.
Subject: General Tech | March 28, 2017 - 01:04 PM | Jeremy Hellstrom
Tagged: amd, Vega, rumour, HBM2
The Inquirer have posted a tiny bit of information about AMD's upcoming Vega, and as rumours about the new GPU are hard to find, it is the best we have at the moment. AMD's claim is that the second generation HBM present on the 4GB and 8GB models could offer equivalent memory bandwidth to a GTX 1080 Ti, which makes perfect sense. The GTX 1080 Ti offers 484 GB/s of memory bandwidth while AMD's R9 series first generation HBM offers 512 GB/s. The real trick is filling that pipeline to give AMD's HBM2 based cards a chance to shine, and that depends on software developers as much as it does on the hardware. As well, The Inquirer discusses the possible efficiency advantages that Vega will have, which could result in smaller cards as well as an effective mobile product. Pop over to take a look at the current rumours; here is hoping we can provide more detailed information in the near future.
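For anyone wondering where those two bandwidth figures come from, the arithmetic is simple: bus width in bits multiplied by the per-pin data rate, divided by eight to get bytes. The short Python sketch below reproduces both numbers. The bus widths and data rates are not from The Inquirer's piece; they are the publicly listed specifications for the GTX 1080 Ti (352-bit GDDR5X at 11 Gbps) and the first generation HBM on the R9 Fury series (4096-bit at 1 Gbps), and the function name is my own.

```python
def peak_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s.

    bandwidth = bus width (bits) * per-pin data rate (Gb/s) / 8 bits per byte
    """
    return bus_width_bits * data_rate_gbps / 8


# ASSUMPTION: spec values below come from public spec sheets, not the article.
gtx_1080_ti = peak_bandwidth_gb_s(352, 11)   # GDDR5X
r9_fury_hbm1 = peak_bandwidth_gb_s(4096, 1)  # first generation HBM

print(f"GTX 1080 Ti: {gtx_1080_ti:.0f} GB/s")
print(f"R9 Fury HBM: {r9_fury_hbm1:.0f} GB/s")
```

The takeaway is that HBM reaches a similar figure with a very wide, relatively slow interface, while GDDR5X gets there with a narrow, very fast one.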
"AMD HAS TEASED more information about its forthcoming Vega-based graphics cards, revealing that they will come with either 4GB or 8GB memory and hinting that a launch is imminent."
Here is some more Tech News from around the web:
- iPhone-havers think they're safe. But they're not @ The Register
- FYI Docs.com users: You may have leaked passwords, personal info – thousands have @ The Register
- LastPass scrambles to fix another major flaw – once again spotted by Google's bugfinders @ The Register
- Johnny Depp signs on to play John McAfee in a film of his life @ The Inquirer
- Samsung 4K Blu-ray Player @ Hardware Secrets
- Futuremark Ends Support for 3DMark Vantage and PCMark Vantage @ [H]ard|OCP
- Konica Minolta Unveils the Future of Work, Or At Least Its Version @ Kitguru
- Win a PC hardware bundle with Gigabyte AORUS, HyperX and KitGuru
Subject: Processors | March 28, 2017 - 11:48 AM | Morry Teitelman
Tagged: FinalWire, aida64, ryzen, amd, Intel
Courtesy of FinalWire
Today, FinalWire Ltd. announced the release of version 5.90 of their diagnostic and benchmarking tool, AIDA64. This new version updates their Extreme Edition, Engineer Edition, and Business Edition of the software, available here.
The latest version of AIDA64 has been optimized to work with AMD's Ryzen "Summit Ridge" and Intel's "Apollo Lake" processors, as well as updated to work with Microsoft's Windows 10 Creators Update release. The benchmarks and performance tests housed within AIDA64 have been updated for the Ryzen processor to utilize the AVX2, FMA3, AES-NI and SHA instruction sets.
New features include:
- AVX2 and FMA accelerated 64-bit benchmarks for AMD Ryzen Summit Ridge processors
- Microsoft Windows 10 Creators Update support
- Optimized 64-bit benchmarks for Intel Apollo Lake SoC
- Improved support for Intel Cannonlake, Coffee Lake, Denverton, Kaby Lake-X, Skylake-X CPUs
- Preliminary support for AMD Zen server processors
- Preliminary support for Intel Gemini Lake SoC and Knights Mill HPC CPU
- NZXT Kraken X52 sensor support
- Socket AM4 motherboards support
- Improved support for Intel B250, H270, Q270 and Z270 chipset based motherboards
- EastRising ER-OLEDM032 (SSD1322) OLED support
- SMBIOS 3.1.1 support
- Crucial M600, Crucial MX300, Intel Pro 5400s, SanDisk Plus, WD Blue SSD support
- Improved support for Samsung NVMe SSDs
- Advanced support for HighPoint RocketRAID 27xx RAID controllers
- GPU details for nVIDIA GeForce GTX 1080 Ti, Quadro GP100, Tesla P6
Software updates new to this release (since AIDA64 v5.00):
- AVX and FMA accelerated FP32 and FP64 ray tracing benchmarks
- Vulkan graphics accelerator diagnostics
- RemoteSensor smartphone and tablet LCD integration
- Logitech Arx Control smartphone and tablet LCD integration
- Microsoft Windows 10 TH2 (November Update) support
- Proper DPI scaling to better support high-resolution LCD and OLED displays
- AVX and FMA accelerated 64-bit benchmarks for AMD A-Series Bristol Ridge and Carrizo APUs
- AVX2 and FMA accelerated 64-bit benchmarks for Intel Broadwell, Kaby Lake and Skylake CPUs
- AVX and SSE accelerated 64-bit benchmarks for AMD Nolan APU
- Optimized 64-bit benchmarks for Intel Braswell and Cherry Trail processors
- Advanced SMART disk health monitoring
- Hot Keys to switch LCD pages, start or stop logging, show or hide SensorPanel
- Corsair K65, K70, K95, Corsair Strafe, Logitech G13, G19, G19s, G910, Razer Chroma RGB LED keyboard support
- Corsair, Logitech, Razer RGB LED mouse support
- Corsair and Razer RGB LED mousepad support
- AlphaCool Heatmaster II, Aquaduct, Aquaero, AquaStream XT, AquaStream Ultimate, Farbwerk, MPS, NZXT GRID+ V2, PowerAdjust 2, PowerAdjust 3 sensor devices support
- Improved Corsair Link sensor support
- NZXT Kraken water cooling sensor support
- Corsair AXi, Corsair HXi, Corsair RMi, Enermax Digifanless, Thermaltake DPS-G power supply unit sensor support
- Support for Gravitech, LCD Smartie Hardware, Leo Bodnar, Modding-FAQ, Noteu, Odospace, Saitek Pro Flight Instrument Panel, Saitek X52 Pro, UCSD LCD devices
- Portrait mode support for AlphaCool and Samsung SPF LCDs
- System certificates information
- Advanced support for Adaptec and Marvell RAID controllers
AIDA64 is developed by FinalWire Ltd., headquartered in Budapest, Hungary. The company’s founding members are veteran software developers who have worked together on programming system utilities for more than two decades. Currently, they have ten products in their portfolio, all based on the award-winning AIDA technology: AIDA64 Extreme, AIDA64 Engineer, AIDA64 Network Audit, AIDA64 Business and AIDA64 for Android, iOS, Sailfish OS, Tizen, Ubuntu Touch and Windows Phone. For more information, visit www.aida64.com.
Subject: General Tech | March 24, 2017 - 06:27 PM | Jeremy Hellstrom
Tagged: gigabyte, AB350-Gaming 3, b350, amd, ryzen
The design of the Gigabyte GA-AB350-Gaming 3 is quite spartan, but don't let that fool you as it is heavily infected with RGB-itis. This brand new AMD motherboard is a hair narrower than your average ATX motherboard, at 305x230mm, but that doesn't mean the board is lacking in features. There is a single x16 PCIe 3.0 slot and a sole x4 PCIe 2.0 slot, with three x1 PCIe 2.0 slots for additional cards. Of the six SATA ports, only four can be used if you install an M.2 SSD, a reasonable pool of drives for most. There are HDMI 1.4 and DVI connectors on the back, along with a half dozen USB 3.1 ports, of which two are Gen 2 and four are Gen 1. Check out the full review at Modders Inc.
"AMD is back with a new CPU line-up that brings competitive performance once again against Intel’s current generation of processors at a lower price. In true AMD fashion, the AM4 motherboard line offers the same value alternative as well, offering the latest features similarly found on the latest generation Intel processors natively including USB 3.1 Gen 2, M.2 NVMe support …"
Here is some more Tech News from around the web:
- ASRock Fatal1ty Z270 Professional Gaming i7 @ Kitguru
- ASRock Fatal1ty Z270 Gaming-ITX/ac Review @ Hardware Canucks
- ASUS ROG Maximus IX Apex @ Kitguru
- Gigabyte Z170XP-SLI Review @ Neoseeker
Subject: Processors | March 17, 2017 - 03:48 PM | Jeremy Hellstrom
Tagged: amd, Intel, ryzen, sanity check
Ars Technica asks the question that many reasonable people are also pondering, "Intel still beats Ryzen at games, but how much does it matter?". We here at PCPer have seen the same sorts of responses that Ars has: there is a group of people who expected that Ryzen would miraculously beat any and all Intel chips at every possible task. More experienced heads were hoping for about what we received, a chip which can challenge Broadwell and which improves greatly on AMD's previous architecture. The launch has revealed some growing pains with AMD's new baby but not anything which makes Ryzen bad.
Indeed, with more DX12 or Vulkan games arriving we should see AMD's performance improve, especially if programmers start to take more effective advantage of high core counts. Head over to read the article, unless you feel that is not a requirement to comment on this topic.
"In spite of this, reading the various reviews around the Web—and comment threads, tweets, and reddit posts—one gets the feeling that many were hoping or expecting Ryzen to somehow beat Intel across the board, and there's a prevailing narrative that Ryzen is in some sense a bad gaming chip. But this argument is often paired with the claim that some kind of non-specific "optimization" is going to salvage the processor's performance, that AMD fans just need to keep the faith for a few months, and that soon Ryzen's full power will be revealed."
Here are some more Processor articles from around the web:
- AMD Ryzen 7 1800X, 1700X, and 1700 Processor Review @ Neoseeker
- AMD's Ryzen 5 Processors; A Preview @ Hardware Canucks
- AMD Ryzen 7 1800X 3.6 GHz @ techPowerUp
- AMD Ryzen 7 1700 @ Kitguru
Here Comes the Midrange!
Today AMD is announcing the upcoming Ryzen 5 CPUs. A little bit was known about them from several weeks ago when AMD talked about their upcoming 6 core processors, but official specifications were lacking. Today we get to see what Ryzen 5 is mostly about.
There are four initial SKUs that AMD is talking about this evening. These encompass quad core and six core products. There are two “enthusiast” level SKUs with the X designation while the other two are aimed at a less edgy crowd.
The two six core CPUs are the 1600 and 1600X. The X version features the higher extended frequency range when combined with performance cooling. That unit is clocked at a base 3.6 GHz and achieves a boost of 4 GHz. This compares well to the top end R7 1800X, but it is short two cores and four threads. The price of the R5 1600X is a very reasonable $249. The 1600 does not feature the extended range, but it does come in at a 3.2 GHz base and 3.6 GHz boost. The R5 1600 has an MSRP of $219.
When we get to the four core, eight thread units we see much the same stratification. The top end 1500X comes in at $189 and features a base clock of 3.5 GHz and a boost of 3.7 GHz. What is interesting about this model is that the XFR is raised by 100 MHz vs. other XFR CPUs. So instead of an extra 100 MHz boost when high end cooling is present we can expect to see 200 MHz. In theory this could run at 3.9 GHz in the extended state. The lowest priced R5 is the 1400 which comes in at a very modest $169. This features a 3.2 GHz base clock and a 3.4 GHz boost.
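The XFR arithmetic is easy to check. The snippet below is just a back-of-the-envelope sketch using the boost clocks and XFR headroom quoted above; the function name is mine, and the real clock a chip reaches will depend on cooling and on how AMD's firmware behaves.

```python
def max_xfr_clock_mhz(boost_mhz, xfr_headroom_mhz=100):
    """Theoretical top clock: rated boost plus the XFR headroom.

    The 100 MHz default reflects the headroom quoted for other XFR CPUs.
    """
    return boost_mhz + xfr_headroom_mhz


r5_1500x = max_xfr_clock_mhz(3700, xfr_headroom_mhz=200)  # doubled headroom
r5_1600x = max_xfr_clock_mhz(4000)                        # standard headroom

print(f"R5 1500X: up to {r5_1500x / 1000:.1f} GHz")
print(f"R5 1600X: up to {r5_1600x / 1000:.1f} GHz")
```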
The 1400, 1500X, and 1600 CPUs come with Wraith cooling solutions. The 1600X comes bare as it is assumed that users want to use something a bit more robust. The R5 1400 comes with the lower end Wraith Stealth cooler while the R5 1500X and R5 1600 come with the bigger Wraith Spire. The bottom three SKUs are all rated at 65 watts TDP. The 1600X comes in at the higher 95 watt rating. Each of the CPUs is unlocked for overclocking.
These chips will provide a more fleshed out pricing structure for the Ryzen processors and give users and enthusiasts lower cost options if they want to invest in AMD again. These chips all run on the new AM4 platform, which is pretty strong in terms of features and I/O performance.
AMD is not shipping these parts today, but rather announcing them. Review samples are not in hand yet and AMD expects world-wide availability by April 11. This is likely a very necessary step for AMD as current AM4 motherboard availability is not at the level we were expecting to see. We also are seeing some pretty quick firmware updates from motherboard partners to address issues with these first AM4 boards. By April 11 I would expect to see most of the issues solved and a healthy supply of motherboards on the shelves to handle the influx of consumers waiting to buy these more midrange priced CPUs from AMD.
What they did not cover or answer is how the four core products would be configured. Would each be a single CCX with only 8 MB of L3 cache, or would AMD disable two cores in each CCX and present 16 MB of L3? We currently do not have the answer to this. Considering the latency between accessing different CCX units we can surely hope they only keep one CCX active.
Ryzen has certainly been a success for AMD and I have no doubt that their quarter will be pretty healthy with the estimated sales of around 1 million Ryzen CPUs since launch. Announcing these new chips will give the mainstream and budget enthusiasts something to look forward to and plan their purchases around. AMD is not announcing the Ryzen 3 products at this time.
Update: AMD got back to me this morning about a question I asked them about the makeup of cores, CCX units, and L3 cache. Here is their response.
- 1600X: 3+3 with 16MB L3 cache.
- 1600: 3+3 with 16MB L3 cache.
- 1500X: 2+2 with 16MB L3 cache.
- 1400: 2+2 with 8MB L3 cache.
As with Ryzen 7, each core still has 512KB local L2 cache.
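To see what that response means in practice, the Python sketch below tabulates the four SKUs from AMD's answer and derives the core count, thread count, total L2, and L3 per active core. The class and field names are my own invention. The thread counts assume SMT is enabled on every Ryzen 5 part, which matches the six core, twelve thread and four core, eight thread descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ryzen5Sku:
    name: str
    cores_per_ccx: tuple       # active cores in each CCX, e.g. (3, 3)
    l3_mb: int                 # total L3 across both CCXs
    threads_per_core: int = 2  # ASSUMPTION: SMT enabled on all Ryzen 5 parts
    l2_kb_per_core: int = 512  # per AMD's response

    @property
    def cores(self):
        return sum(self.cores_per_ccx)

    @property
    def threads(self):
        return self.cores * self.threads_per_core

    @property
    def l2_total_kb(self):
        return self.cores * self.l2_kb_per_core

    @property
    def l3_mb_per_core(self):
        return self.l3_mb / self.cores


SKUS = [
    Ryzen5Sku("1600X", (3, 3), 16),
    Ryzen5Sku("1600", (3, 3), 16),
    Ryzen5Sku("1500X", (2, 2), 16),
    Ryzen5Sku("1400", (2, 2), 8),
]

for sku in SKUS:
    print(f"R5 {sku.name}: {sku.cores}C/{sku.threads}T, "
          f"{sku.l2_total_kb} KB L2, {sku.l3_mb} MB L3 "
          f"({sku.l3_mb_per_core:.2f} MB L3 per core)")
```

Note that every SKU keeps both CCXs active, so all four parts are still subject to the cross-CCX latency discussed earlier; the 1500X simply gets twice the L3 per core of the 1400.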
Subject: Processors | March 15, 2017 - 05:51 PM | Josh Walrath
Tagged: ryzen, Infinity Fabric, hwbot, FMA3, Control Fabric, bug, amd, AM4
Last week a thread was started at the HWBOT forum discussing a certain workload that resulted in a hard lock every time it was run. This was tested with a variety of motherboards and Ryzen processors from the 1700 to the 1800X. At default power and clock settings the processor locked up in every case, both on the samples the original testers worked with and on products that contributors were able to test themselves.
This is quite reminiscent of the Coppermine based Pentium III 1133 MHz processor from Intel which failed in one specific workload (compiling). Intel had shipped a limited number of these CPUs at that time, and it was Kyle from HardOCP and Tom from Tom’s Hardware that were the first to show this behavior in a repeatable environment. Intel stopped shipping these models and had to wait until the Tualatin version of the Pentium III was released to achieve that speed (and above) and be stable in all workloads.
The interesting thing about this FMA3 finding is that it is seen to not be present in some overclocked Ryzen chips. To me this indicates that it could be a power delivery issue with the chip. A particular workload that heavily leans upon the FPU could require more power than the chip’s Control Fabric can deliver, therefore causing a hard lock. In several of the overclocked chips that were tested, which have much more power being pushed to them, it seems that enough power reaches that specific area of the chip to allow the operation to complete successfully.
This particular fact implies to me that AMD does not necessarily have a bug such as what Intel had with the infamous F-Div issue with the original Pentium, or AMD’s issue with the B2 stepping of Phenom. AMD has a very complex voltage control system that is controlled by the Control Fabric portion of the Infinity Fabric. With a potential firmware or microcode update this could be a fixable problem. If this is the case, then AMD would simply increase power being supplied to the FPU/SIMD/SSE portion of the Ryzen cores. This may come at a cost through lower burst speeds to keep TDP within their stated envelope.
A source at AMD has confirmed this issue and that a fix will be provided via motherboard firmware update. More than likely this comes in the form of an updated AGESA protocol.
Subject: Processors | March 13, 2017 - 08:48 PM | Sebastian Peak
Tagged: Windows 7, windows 10, thread scheduling, SMT, ryzen, Robert Hallock, processor, cpu, amd
AMD's Robert Hallock (previously the Head of Global Technical Marketing for AMD and now working full time on the CPU side of things) has posted a comprehensive Ryzen update, covering AMD's official stance on Windows 10 thread scheduling, the performance implications of SMT, Windows power management settings, and more. The post in its entirety is reproduced below, and also available from AMD by following this link.
It’s been about two weeks since we launched the new AMD Ryzen™ processor, and I’m just thrilled to see all the excitement and chatter surrounding our new chip. Seems like not a day goes by when I’m not being tweeted by someone doing a new build, often for the first time in many years. Reports from media and users have also been good:
- “This CPU gives you something that we needed for a long time, which is a CPU that gives you a well-rounded experience.” –JayzTwoCents
- Competitive performance at 1080p, with Tech Spot saying the “affordable Ryzen 7 1700” is an “awesome option” and a “safer bet long term.”
- ExtremeTech showed strong performance for high-end GPUs like the GeForce GTX 1080 Ti, especially for gamers that understand how much value AMD Ryzen™ brings to the table
- Many users are noting that the 8-core design of AMD Ryzen™ 7 processors enables “noticeably SMOOTHER” performance compared to their old platforms.
While these findings have been great to read, we are just getting started! The AMD Ryzen™ processor and AM4 Platform both have room to grow, and we wanted to take a few minutes to address some of the questions and comments being discussed across the web.
We have investigated reports alleging incorrect thread scheduling on the AMD Ryzen™ processor. Based on our findings, AMD believes that the Windows® 10 thread scheduler is operating properly for “Zen,” and we do not presently believe there is an issue with the scheduler adversely utilizing the logical and physical configurations of the architecture.
As an extension of this investigation, we have also reviewed topology logs generated by the Sysinternals Coreinfo utility. We have determined that an outdated version of the application was responsible for originating the incorrect topology data that has been widely reported in the media. Coreinfo v3.31 (or later) will produce the correct results.
Finally, we have reviewed the limited available evidence concerning performance deltas between Windows® 7 and Windows® 10 on the AMD Ryzen™ CPU. We do not believe there is an issue with scheduling differences between the two versions of Windows. Any differences in performance can be more likely attributed to software architecture differences between these OSes.
Going forward, our analysis highlights that there are many applications that already make good use of the cores and threads in Ryzen, and there are other applications that can better utilize the topology and capabilities of our new CPU with some targeted optimizations. These opportunities are already being actively worked via the AMD Ryzen™ dev kit program that has sampled 300+ systems worldwide.
Above all, we would like to thank the community for their efforts to understand the Ryzen processor and reporting their findings. The software/hardware relationship is a complex one, with additional layers of nuance when preexisting software is exposed to an all-new architecture. We are already finding many small changes that can improve the Ryzen performance in certain applications, and we are optimistic that these will result in beneficial optimizations for current and future applications.
The primary temperature reporting sensor of the AMD Ryzen™ processor is a sensor called “T Control,” or tCTL for short. The tCTL sensor is derived from the junction (Tj) temperature—the interface point between the die and heatspreader—but it may be offset on certain CPU models so that all models on the AM4 Platform have the same maximum tCTL value. This approach ensures that all AMD Ryzen™ processors have a consistent fan policy.
Specifically, the AMD Ryzen™ 7 1700X and 1800X carry a +20°C offset between the tCTL° (reported) temperature and the actual Tj° temperature. In the short term, users of the AMD Ryzen™ 1700X and 1800X can simply subtract 20°C to determine the true junction temperature of their processor. No arithmetic is required for the Ryzen 7 1700. Long term, we expect temperature monitoring software to better understand our tCTL offsets to report the junction temperature automatically.
The table below serves as an example of how the tCTL sensor can be interpreted in a hypothetical scenario where a Ryzen processor is operating at 38°C.
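As an illustration of that hypothetical 38°C scenario, here is a minimal Python sketch of the tCTL arithmetic described above. It is not an AMD tool or API. The offsets are the ones stated in the post (+20°C for the 1700X and 1800X, none for the 1700), and the function and dictionary names are assumptions made for this example.

```python
# tCTL offsets in degrees C, as stated in AMD's post.
TCTL_OFFSET_C = {
    "Ryzen 7 1800X": 20,
    "Ryzen 7 1700X": 20,
    "Ryzen 7 1700": 0,
}


def reported_tctl_c(model, junction_temp_c):
    """tCTL value that monitoring software would show for a given Tj."""
    return junction_temp_c + TCTL_OFFSET_C[model]


def junction_temp_c(model, tctl_reading_c):
    """Recover the true junction temperature from a tCTL reading."""
    return tctl_reading_c - TCTL_OFFSET_C[model]


for model in TCTL_OFFSET_C:
    tctl = reported_tctl_c(model, 38)
    print(f"{model}: Tj 38°C is reported as tCTL {tctl}°C")
```

A monitoring utility that knows the model could apply `junction_temp_c` automatically, which is the long term behavior the post says AMD expects from temperature monitoring software.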
Users may have heard that AMD recommends the High Performance power plan within Windows® 10 for the best performance on Ryzen, and indeed we do. We recommend this plan for two key reasons:
- Core Parking OFF: Idle CPU cores are instantaneously available for thread scheduling. In contrast, the Balanced plan aggressively places idle CPU cores into low power states. This can cause additional latency when un-parking cores to accommodate varying loads.
- Fast frequency change: The AMD Ryzen™ processor can alter its voltage and frequency states in the 1ms intervals natively supported by the “Zen” architecture. In contrast, the Balanced plan may take longer for voltage and frequency (V/f) changes due to software participation in power state changes.
In the near term, we recommend that games and other high-performance applications are complemented by the High Performance plan. By the first week of April, AMD intends to provide an update for AMD Ryzen™ processors that optimizes the power policy parameters of the Balanced plan to favor performance more consistent with the typical usage models of a desktop PC.
Simultaneous Multi-threading (SMT)
Finally, we have investigated reports of instances where SMT is producing reduced performance in a handful of games. Based on our characterization of game workloads, it is our expectation that gaming applications should generally see a neutral/positive benefit from SMT. We see this neutral/positive behavior in a wide range of titles, including: Arma® 3, Battlefield™ 1, Mafia™ III, Watch Dogs™ 2, Sid Meier’s Civilization® VI, For Honor™, Hitman™, Mirror’s Edge™ Catalyst and The Division™. Independent 3rd-party analyses have corroborated these findings.
For the remaining outliers, AMD again sees multiple opportunities within the codebases of specific applications to improve how this software addresses the “Zen” architecture. We have already identified some simple changes that can improve a game’s understanding of the "Zen" core/cache topology, and we intend to provide a status update to the community when they are ready.
Overall, we are thrilled with the outpouring of support we’ve seen from AMD fans new and old. We love seeing your new builds, your benchmarks, your excitement, and your deep dives into the nuts and bolts of Ryzen. You are helping us make Ryzen™ even better by the day. You should expect to hear from us regularly through this blog to answer new questions and give you updates on new improvements in the Ryzen ecosystem.
Such topics as Windows 7 vs. Windows 10 performance, SMT impact, and thread scheduling will no doubt still be debated, and AMD has correctly pointed out that optimization for this brand new architecture will only improve Ryzen performance going forward. Our own findings as to Ryzen and the Windows 10 thread scheduler appear to be validated as AMD officially dismisses performance impact in that area, though there is still room for improvement in other areas from our initial gaming performance findings. As mentioned in the post, AMD will have an update for Windows power plan optimization by the first week of April, and the company has "already identified some simple changes that can improve a game’s understanding of the 'Zen' core/cache topology, and we intend to provide a status update to the community when they are ready", as well.
It is refreshing to see a company publicly acknowledging the topics that have resulted in so much discussion in the past couple of weeks, and their transparency is commendable, with every issue (that this author is aware of) being touched on in the post.
** UPDATE 3/13 5 PM **
AMD has posted a follow-up statement that officially clears up much of the conjecture this article was attempting to clarify. Relevant points from their post that relate to this article as well as many of the requests for additional testing we have seen since its posting (emphasis mine):
"We have investigated reports alleging incorrect thread scheduling on the AMD Ryzen™ processor. Based on our findings, AMD believes that the Windows® 10 thread scheduler is operating properly for “Zen,” and we do not presently believe there is an issue with the scheduler adversely utilizing the logical and physical configurations of the architecture."
"Finally, we have reviewed the limited available evidence concerning performance deltas between Windows® 7 and Windows® 10 on the AMD Ryzen™ CPU. We do not believe there is an issue with scheduling differences between the two versions of Windows. Any differences in performance can be more likely attributed to software architecture differences between these OSes."
So there you have it, straight from the horse's mouth. AMD does not believe the problem lies within the Windows thread scheduler. SMT performance in gaming workloads was also addressed:
"Finally, we have investigated reports of instances where SMT is producing reduced performance in a handful of games. Based on our characterization of game workloads, it is our expectation that gaming applications should generally see a neutral/positive benefit from SMT. We see this neutral/positive behavior in a wide range of titles, including: Arma® 3, Battlefield™ 1, Mafia™ III, Watch Dogs™ 2, Sid Meier’s Civilization® VI, For Honor™, Hitman™, Mirror’s Edge™ Catalyst and The Division™. Independent 3rd-party analyses have corroborated these findings.
For the remaining outliers, AMD again sees multiple opportunities within the codebases of specific applications to improve how this software addresses the “Zen” architecture. We have already identified some simple changes that can improve a game’s understanding of the "Zen" core/cache topology, and we intend to provide a status update to the community when they are ready."
We are still digging into the observed differences of toggling SMT compared with disabling the second CCX, but it is good to see AMD issue a clarifying statement here for all of those out there observing and reporting on SMT-related performance deltas.
** END UPDATE **
Editor's Note: The testing you see here was a response to many days of comments and questions to our team on how and why AMD Ryzen processors are seeing performance gaps in 1080p gaming (and other scenarios) in comparison to Intel Core processors. Several outlets have posted that the culprit is the Windows 10 scheduler and its inability to properly allocate work across the logical vs. physical cores of the Zen architecture. As it turns out, we can prove that isn't the case at all. -Ryan Shrout
Initial reviews of AMD’s Ryzen CPU revealed a few inefficiencies in some situations, particularly in gaming workloads running at more common resolutions like 1080p, where the CPU becomes more of a bottleneck when paired with modern GPUs. Lots of folks have theorized about what could be causing these issues, and the most recent attention appears to have been directed at the Windows 10 scheduler and its supposed inability to properly place threads on the Ryzen cores for the most efficient processing.
I typically have Task Manager open while running storage tests (they are boring to watch otherwise), and I naturally had it open during Ryzen platform storage testing. I’m accustomed to how the IO workers are distributed across reported threads and, in the case of SMT-capable CPUs, across cores. There is a clear difference when viewing our custom storage workloads with SMT on vs. off, and it was dead obvious to me that core loading was working as expected while I was testing Ryzen. I went back and pulled the actual thread/core loading data from my testing results to confirm:
The Windows scheduler has a habit of bouncing processes across available processor threads. This naturally happens as other processes share time with a particular core, with the heavier process not necessarily switching back to the same core. As you can see above, the single IO handler thread was spread across the first four cores during its run, but the Windows scheduler was always hitting just one of the two available SMT threads on any single core at one time.
My testing for Ryan’s Ryzen review consisted of only single-threaded workloads, but we can make things a bit clearer by loading down half of the CPU with SMT toggled off. We do this by increasing the worker count to 4, which is half of the 8 threads available on the Ryzen processor with SMT disabled in the motherboard BIOS.
SMT OFF, 8 cores, 4 workers
With SMT off, the scheduler is clearly not giving priority to any particular core and the work is spread throughout the physical cores in a fairly even fashion.
Now let’s try with SMT turned back on and the number of IO workers doubled to 8 to keep the CPU half loaded:
SMT ON, 16 (logical) cores, 8 workers
With SMT on, we see a very different result. The scheduler is clearly loading only one thread per core. This could only be possible if Windows were aware of the 2-way SMT (two threads per core) configuration of the Ryzen processor. Do note that sometimes the workload will toggle around every few seconds, but the total loading on each physical core will still remain at ~50%. I chose a workload that saturated its thread just enough for Windows to not shift it around as it ran, making the above result even clearer.
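To make that reasoning concrete, here is a minimal toy model in Python. It is not the Windows scheduler's actual logic, just a sketch under stated assumptions: logical cores 2k and 2k+1 are treated as the SMT siblings of physical core k, and every function name is hypothetical. It contrasts a topology-aware placement, which uses every physical core once before doubling up, with a naive placement that fills logical cores in order.

```python
from collections import Counter

THREADS_PER_CORE = 2  # assumption: 2-way SMT, siblings are logical cores 2k and 2k+1


def physical_core(logical_core):
    """Map a logical core index to its physical core under the sibling assumption."""
    return logical_core // THREADS_PER_CORE


def place_topology_aware(workers, logical_cores):
    """Use one sibling on every physical core before touching any second sibling."""
    order = [lc
             for sibling in range(THREADS_PER_CORE)
             for lc in range(sibling, logical_cores, THREADS_PER_CORE)]
    return order[:workers]


def place_naive(workers, logical_cores):
    """Fill logical cores 0, 1, 2, ... with no knowledge of SMT pairs."""
    return list(range(logical_cores))[:workers]


def workers_per_physical_core(placement):
    """Count how many workers landed on each physical core."""
    return Counter(physical_core(lc) for lc in placement)


if __name__ == "__main__":
    logical = 16            # Ryzen 7 with SMT on
    workers = logical // 2  # half load, as in the storage test
    print(workers_per_physical_core(place_topology_aware(workers, logical)))
    print(workers_per_physical_core(place_naive(workers, logical)))
```

In this model the topology-aware placement puts one worker on each of the 8 physical cores, while the naive placement stacks two workers on each of the first 4 cores and leaves the rest idle. The one-thread-per-core pattern in the screenshot matches the first case.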
Synthetic Testing Procedure
While the storage testing methods above provide a real-world example of the Windows 10 scheduler working as expected, we do have another workload that can help demonstrate core balancing with Intel Core and AMD Ryzen processors. A quick and simple custom-built C++ application can be used to generate generic worker threads and monitor for core collisions and resolutions.
This test app has a very straightforward workflow. Every few seconds it generates a new thread, capping at N/2 threads total, where N is the reported number of logical cores. If the OS scheduler is working as expected on a 16-thread Ryzen 7, it should load 8 threads across 8 physical cores, though which logical core is used within each physical core will depend on very minute parameters and conditions in the OS background.
By reading the APIC_ID through the CPUID instruction, the first application thread monitors all threads and detects and reports on collisions, which occur when a thread from our app is running on the same core as another thread from our app. That thread also reports when those collisions have been cleared. In an ideal and expected environment where Windows 10 knows the boundaries of physical and logical cores, you should never see more than one thread of a core loaded at the same time.
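The collision bookkeeping described above can be sketched in a few lines of Python. This is not the source of our C++ app, only an illustration of the idea. Python cannot issue CPUID directly, so the samples here are hand-fed dictionaries of worker ID to APIC ID. The class name, the method name, and the assumption that the lowest APIC ID bit selects the SMT sibling are all mine.

```python
def core_of(apic_id):
    """Physical core for an APIC ID, assuming the lowest bit selects the SMT sibling."""
    return apic_id >> 1


class CollisionMonitor:
    """Tracks which physical cores currently host more than one of our worker threads."""

    def __init__(self):
        self.colliding = set()  # physical cores that had a collision in the last sample

    def update(self, sample):
        """Take {worker_id: apic_id} and return a list of (event, core) tuples."""
        by_core = {}
        for worker_id, apic_id in sample.items():
            by_core.setdefault(core_of(apic_id), []).append(worker_id)
        now = {core for core, workers in by_core.items() if len(workers) > 1}
        # New collisions are cores colliding now but not before; cleared is the reverse.
        events = [("collision", core) for core in sorted(now - self.colliding)]
        events += [("cleared", core) for core in sorted(self.colliding - now)]
        self.colliding = now
        return events


if __name__ == "__main__":
    monitor = CollisionMonitor()
    print(monitor.update({0: 0, 1: 2, 2: 5}))  # each worker on its own core
    print(monitor.update({0: 0, 1: 1, 2: 5}))  # workers 0 and 1 share core 0
    print(monitor.update({0: 0, 1: 3, 2: 5}))  # worker 1 moved away again
```

A scheduler that respects the physical/logical boundary should produce an empty event list on nearly every sample, which is the behavior we looked for on both platforms.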
This screenshot shows our app working on the left and the Windows Task Manager on the right, with logical cores labeled. While it may look like all logical cores are being utilized at the same time, in fact they are not. At any given point, only one of LCore 0 and LCore 1 is actively processing a thread. Need proof? Check out the modified view of the Task Manager, where I copied the graph of LCore 1/5/9/13 over the graph of LCore 0/4/8/12 with inverted colors to aid viewability.
If you look closely, by overlapping the graphs in this way, you can see that the threads migrate from LCore 0 to LCore 1, LCore 4 to LCore 5, and so on. The graphs intersect and fill in to consume ~100% of the physical core. This pattern is repeated for the other 8 logical cores on the right two columns as well.
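The same overlay can be expressed numerically. The short Python sketch below sums two sibling utilization traces sample by sample. The sample values are invented for illustration and are not measurements from our test, and the function name is hypothetical.

```python
def overlay(trace_a, trace_b):
    """Sum two sibling logical-core utilization traces (percent) sample by sample."""
    if len(trace_a) != len(trace_b):
        raise ValueError("traces must have the same number of samples")
    return [a + b for a, b in zip(trace_a, trace_b)]


# Invented samples: one thread migrating from LCore 0 to LCore 1 partway through.
lcore0 = [100, 100, 60, 0, 0]
lcore1 = [0, 0, 40, 100, 100]
combined = overlay(lcore0, lcore1)
print(combined)
```

A combined trace that stays near 100% indicates a single thread moving between the two siblings. Sums well above 100% would mean both siblings were busy at once, which is the collision case the overlay is meant to rule out.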
Running the same application on a Core i7-5960X Haswell-E 8-core processor shows very similar behavior.
Each pair of logical cores shares a single thread, and when thread transitions occur away from LCore N, they migrate perfectly to LCore N+1. It does appear that in this scenario the Intel system is showing a more stable thread distribution than the Ryzen system. While that may in fact give the 5960X configuration some performance advantage, the penalty for intra-core thread migration is expected to be very small.
The fact that Windows 10 is balancing the 8 thread load specifically between matching logical core pairs indicates that the operating system is perfectly aware of the processor topology and is selecting distinct cores first to complete the work.
Information from this custom application, along with the storage performance tool example above, clearly shows that Windows 10 is attempting to balance work on Ryzen between cores in the same manner that we have experienced with Intel and its HyperThreaded processors for many years.
Subject: Motherboards | March 10, 2017 - 02:22 PM | Jeremy Hellstrom
Tagged: Z270 GAMING M6 AC, z270, ryzen, msi, amd
The new MSI Z270 GAMING M6 AC has a huge selection of features, up to and including a free Phanteks RGB LED strip for those who suffer from chronic RGBitis.
The add-in card you see on the side is an Intel Wi-Fi/Bluetooth card which supports MU-MIMO. The onboard audio is powered by Nahimic, which MSI refers to as Audio Boost 4, and it is isolated from the other components on the motherboard to prevent noise. There is a U.2 slot and two M.2 slots with a removable heatsink they call M.2 Shield. They fully isolated the memory circuit design and, as you can see below, The Witcher 3 seems to like the DDR4 Boost design.
Check out the PR below for a closer look at the features included, including the special USB slot for your VR headset and the One-Click to VR option.
MSI, world leading in gaming hardware innovation, is proud to announce a brand new Enthusiast GAMING motherboard, the Z270 GAMING M6 AC with its incredibly versatile and complete foundation for a high-end gaming system. Inspired from a futuristic armored spaceship, the Z270 GAMING M6 AC design with multilayer plating, wings and armaments emphasize an ultramodern style. Erupting from the core, the entire color spectrum flows through illuminated lines. The complete motherboard and heatsink design offers a strong look and feel and uses heavy quality components to deliver the best performance and stability as the base of any gaming rig. Added features such as Audio Boost 4 with Nahimic 2, Twin Turbo M.2 with M.2 Shield, VR Boost, Killer LAN & Intel WIFI AC, and the option to fully customize the RGB LEDs to any color using Mystic light, makes the Z270 GAMING M6 AC one of the most high-end and desirable Z270 motherboards to build a gaming rig with.
Through fully isolating the memory circuit design, the DDR4 Boost ensures maximum performance and stability. The technical enhancements of DDR4 Boost allow for more stability at higher memory speeds compared to other brands. Enjoy the additional boost in gaming performance or when working with large video and photo files. Enable Intel® Extreme Memory Profile with ease using a single option in the BIOS to gain performance and create a perfectly stable system.
DDR4 Boost benchmark based on The Witcher 3
Twin Turbo M.2 with M.2 shield & U.2
Enjoy a blazing fast system boot up and insanely quick loading of applications and games with MSI motherboards. Twin Turbo M.2 delivers PCI-E Gen3 x4 performance with transfer speeds up to 64 Gb/s for the latest SSDs. It also supports the all-new Intel® Optane™ technology. M.2 Shield (patent pending) is a thermal solution, which keeps the M.2 or Optane™ device safe and cool to prevent damage and thermal throttling. M.2 GENIE makes setting up RAID easy by taking less steps, using any M.2 or PCI-E SSD (even when used in a mixed configuration). The Z270 GAMING M6 AC supports the latest storage interface, U.2 as well.
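As a quick sanity check on the bandwidth figure, the arithmetic for a PCIe 3.0 link can be sketched in Python. PCIe 3.0 signals at 8 GT/s per lane and uses 128b/130b encoding. The function names are hypothetical, and reading the 64 Gb/s number as two x4 slots counted together at the raw signaling rate is an assumption on my part, not something the press release states.

```python
GT_PER_S_PER_LANE = 8.0  # PCIe 3.0 raw signaling rate per lane
ENCODING = 128 / 130     # PCIe 3.0 128b/130b line encoding efficiency


def pcie3_raw_gbps(lanes):
    """Raw signaling rate of a PCIe 3.0 link in Gb/s."""
    return lanes * GT_PER_S_PER_LANE


def pcie3_usable_gbps(lanes):
    """Link rate after 128b/130b encoding, before any protocol overhead."""
    return pcie3_raw_gbps(lanes) * ENCODING


if __name__ == "__main__":
    one_slot = pcie3_raw_gbps(4)
    print(f"one x4 slot, raw:    {one_slot:.1f} Gb/s")
    print(f"one x4 slot, usable: {pcie3_usable_gbps(4):.2f} Gb/s "
          f"({pcie3_usable_gbps(4) / 8:.2f} GB/s)")
    print(f"two x4 slots, raw:   {2 * one_slot:.1f} Gb/s")  # assumed reading of the PR figure
```

Under these assumptions a single Gen3 x4 slot tops out at 32 Gb/s raw, so a single SSD should not be expected to reach the quoted figure on its own.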
Audio Boost 4 with Nahimic 2
With Audio Boost, powered by Nahimic, MSI motherboards deliver the highest sound quality through the use of premium quality audio components and an isolated audio PCB. An added audio cover and golden audio connectors ensure the purest audio signal.
VR Ready & VR Boost
VR Boost is a smart chip that ensures a clean and strong signal to a VR optimized USB port located on the back, to reduce motion sickness caused by a bad signal. The One-Click to VR option in the MSI Gaming App gets your PC primed for VR use in just a single click by setting your components to max. performance and preventing other applications from impacting your VR experience negatively.
Intel Wi-Fi AC with Antennas
Optimize your gaming rig to deliver game networking traffic over LAN for the best possible online gaming experience, while using WiFi for other online applications. This next-generation Intel® Wi-Fi / Bluetooth solution uses smart MU-MIMO technology, delivering AC speeds up to 867Mbps. Perfect for streaming and gaming at the same time.
Includes free Phanteks RGB LED strip
This RGB LED strip helps to transform and synchronize colors in your case to any liking. Simply connect the plug & play strip to the Mystic Light Extension pin header located on MSI motherboards, without the need of external power, and set a color and choose an LED effect to match it with your motherboard and other peripherals RGB LEDs. Use the included double sided 3M tape to place the strip firmly wherever you want inside (or even outside) your chassis.