Ryzen Locking on Certain FMA3 Workloads

Subject: Processors | March 15, 2017 - 05:51 PM |
Tagged: ryzen, Infinity Fabric, hwbot, FMA3, Control Fabric, bug, amd, AM4

Last week a thread on the HWBOT forum described a specific workload that resulted in a hard lock every time it was run.  It was tested across a variety of motherboards and Ryzen processors from the 1700 to the 1800X.  At default power and clock settings, every sample the thread's participants worked on locked up, as did the chips that other contributors were able to test themselves.

ryzen.jpg

This is quite reminiscent of the Coppermine-based Pentium III 1133 MHz processor from Intel, which failed in one specific workload (compiling).  Intel had shipped a limited number of these CPUs at the time, and it was Kyle from HardOCP and Tom from Tom’s Hardware who were the first to show this behavior in a repeatable environment.  Intel stopped shipping these models and had to wait until the Tualatin version of the Pentium III to reach that speed (and above) while remaining stable in all workloads.

The interesting thing about this FMA3 finding is that it does not appear on some overclocked Ryzen chips.  To me this indicates that it could be a power delivery issue within the chip.  A workload that leans heavily on the FPU could require more power than the chip’s Control Fabric can deliver, causing a hard lock.  On the overclocked chips that were tested, the extra power being pushed through them appears to give that specific area of the chip enough headroom for the operation to complete successfully.
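
For readers unfamiliar with the instruction set in question, FMA3 is the three-operand fused multiply-add extension supported by Ryzen (and recent Intel CPUs). The snippet below is a minimal sketch of what an FMA3-heavy loop looks like using compiler intrinsics; it is purely illustrative and is not the specific HWBOT workload that triggers the lock.

    #include <immintrin.h>   // AVX/FMA intrinsics; build with -mavx2 -mfma (GCC/Clang)
    #include <cstdio>

    // Illustrative only: a tight loop of fused multiply-add operations that keeps
    // the FPU/SIMD units busy. This is NOT the HWBOT workload, just an example of
    // the FMA3 instruction class being discussed.
    int main() {
        __m256 a = _mm256_set1_ps(1.0001f);
        __m256 b = _mm256_set1_ps(0.9999f);
        __m256 c = _mm256_set1_ps(0.5f);

        for (long i = 0; i < 1000000000L; ++i) {
            c = _mm256_fmadd_ps(a, b, c);   // c = a * b + c, one FMA3 instruction per element
        }

        float out[8];
        _mm256_storeu_ps(out, c);
        std::printf("%f\n", out[0]);        // keep the result live so the loop isn't optimized away
    }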

This implies to me that AMD does not necessarily have a logic bug like Intel’s infamous FDIV issue with the original Pentium, or AMD’s own problem with the B2 stepping of Phenom.  AMD has a very complex voltage control system that is managed by the Control Fabric portion of the Infinity Fabric, so this could be a fixable problem with a firmware or microcode update.  If that is the case, AMD would simply increase the power supplied to the FPU/SIMD/SSE portion of the Ryzen cores.  This may come at the cost of lower boost speeds to keep TDP within the stated envelope.

A source at AMD has confirmed the issue and says that a fix will be provided via a motherboard firmware update, most likely in the form of an updated AGESA release.

Source: HWBOT Forums

Today's episode features special guest Denver Jetson

Subject: Processors | March 14, 2017 - 03:17 PM |
Tagged: nvidia, JetsonTX1, Denver, Cortex A57, pascal, SoC

Amongst the furor of the Ryzen launch, NVIDIA's new Jetson TX2 SoC was quietly sent out to reviewers, and today the NDA expired so we can see how it performs.  There are more Ryzen reviews below the fold, including Phoronix's Linux testing, if you want to skip ahead.  In addition to the specifications in the quote, you will find 8GB of 128-bit LPDDR4 offering memory bandwidth of 58.4 GB/s and 32GB of eMMC for local storage.  This Jetson runs JetPack 3.0, an L4T release based on the Linux 4.4.15 kernel.  Phoronix tested out its performance; see for yourself.

image.php_.jpg

"Last week we got to tell you all about the new NVIDIA Jetson TX2 with its custom-designed 64-bit Denver 2 CPUs, four Cortex-A57 cores, and Pascal graphics with 256 CUDA cores. Today the Jetson TX2 is shipping and the embargo has expired for sharing performance metrics on the JTX2."


Source: Phoronix

AMD Ryzen Community Update Addresses Windows 10 Thread Scheduling, SMT Performance, and More

Subject: Processors | March 13, 2017 - 08:48 PM |
Tagged: Windows 7, windows 10, thread scheduling, SMT, ryzen, Robert Hallock, processor, cpu, amd

AMD's Robert Hallock (previously the Head of Global Technical Marketing for AMD and now working full time on the CPU side of things) has posted a comprehensive Ryzen update, covering AMD's official stance on Windows 10 thread scheduling, the performance implications of SMT, Windows power management settings, and more. The post is reproduced in its entirety below and is also available directly from AMD.

AMD_RYZEN.png

(Begin statement:)

It’s been about two weeks since we launched the new AMD Ryzen™ processor, and I’m just thrilled to see all the excitement and chatter surrounding our new chip. Seems like not a day goes by when I’m not being tweeted by someone doing a new build, often for the first time in many years. Reports from media and users have also been good:

  • “This CPU gives you something that we needed for a long time, which is a CPU that gives you a well-rounded experience.” –JayzTwoCents
  • Competitive performance at 1080p, with Tech Spot saying the “affordable Ryzen 7 1700” is an “awesome option” and a “safer bet long term.”
  • ExtremeTech showed strong performance for high-end GPUs like the GeForce GTX 1080 Ti, especially for gamers that understand how much value AMD Ryzen™ brings to the table
  • Many users are noting that the 8-core design of AMD Ryzen™ 7 processors enables “noticeably SMOOTHER” performance compared to their old platforms.

While these findings have been great to read, we are just getting started! The AMD Ryzen™ processor and AM4 Platform both have room to grow, and we wanted to take a few minutes to address some of the questions and comments being discussed across the web.

Thread Scheduling

We have investigated reports alleging incorrect thread scheduling on the AMD Ryzen™ processor. Based on our findings, AMD believes that the Windows® 10 thread scheduler is operating properly for “Zen,” and we do not presently believe there is an issue with the scheduler adversely utilizing the logical and physical configurations of the architecture.

As an extension of this investigation, we have also reviewed topology logs generated by the Sysinternals Coreinfo utility. We have determined that an outdated version of the application was responsible for originating the incorrect topology data that has been widely reported in the media. Coreinfo v3.31 (or later) will produce the correct results.

Finally, we have reviewed the limited available evidence concerning performance deltas between Windows® 7 and Windows® 10 on the AMD Ryzen™ CPU. We do not believe there is an issue with scheduling differences between the two versions of Windows.  Any differences in performance can be more likely attributed to software architecture differences between these OSes.

Going forward, our analysis highlights that there are many applications that already make good use of the cores and threads in Ryzen, and there are other applications that can better utilize the topology and capabilities of our new CPU with some targeted optimizations. These opportunities are already being actively worked via the AMD Ryzen™ dev kit program that has sampled 300+ systems worldwide.

Above all, we would like to thank the community for their efforts to understand the Ryzen processor and reporting their findings. The software/hardware relationship is a complex one, with additional layers of nuance when preexisting software is exposed to an all-new architecture. We are already finding many small changes that can improve the Ryzen performance in certain applications, and we are optimistic that these will result in beneficial optimizations for current and future applications.

Temperature Reporting

The primary temperature reporting sensor of the AMD Ryzen™ processor is a sensor called “T Control,” or tCTL for short. The tCTL sensor is derived from the junction (Tj) temperature—the interface point between the die and heatspreader—but it may be offset on certain CPU models so that all models on the AM4 Platform have the same maximum tCTL value. This approach ensures that all AMD Ryzen™ processors have a consistent fan policy.

Specifically, the AMD Ryzen™ 7 1700X and 1800X carry a +20°C offset between the tCTL° (reported) temperature and the actual Tj° temperature. In the short term, users of the AMD Ryzen™ 1700X and 1800X can simply subtract 20°C to determine the true junction temperature of their processor. No arithmetic is required for the Ryzen 7 1700. Long term, we expect temperature monitoring software to better understand our tCTL offsets to report the junction temperature automatically.

The table below serves as an example of how the tCTL sensor can be interpreted in a hypothetical scenario where a Ryzen processor is operating at 38°C.

TEMPS.png

Power Plans

Users may have heard that AMD recommends the High Performance power plan within Windows® 10 for the best performance on Ryzen, and indeed we do. We recommend this plan for two key reasons: 

  1. Core Parking OFF: Idle CPU cores are instantaneously available for thread scheduling. In contrast, the Balanced plan aggressively places idle CPU cores into low power states. This can cause additional latency when un-parking cores to accommodate varying loads.
  2. Fast frequency change: The AMD Ryzen™ processor can alter its voltage and frequency states in the 1ms intervals natively supported by the “Zen” architecture. In contrast, the Balanced plan may take longer for voltage and frequency (V/f) changes due to software participation in power state changes.

In the near term, we recommend that games and other high-performance applications are complemented by the High Performance plan. By the first week of April, AMD intends to provide an update for AMD Ryzen™ processors that optimizes the power policy parameters of the Balanced plan to favor performance more consistent with the typical usage models of a desktop PC.

Simultaneous Multi-threading (SMT)

Finally, we have investigated reports of instances where SMT is producing reduced performance in a handful of games. Based on our characterization of game workloads, it is our expectation that gaming applications should generally see a neutral/positive benefit from SMT. We see this neutral/positive behavior in a wide range of titles, including: Arma® 3, Battlefield™ 1, Mafia™ III, Watch Dogs™ 2, Sid Meier’s Civilization® VI, For Honor™, Hitman™, Mirror’s Edge™ Catalyst and The Division™. Independent 3rd-party analyses have corroborated these findings.

For the remaining outliers, AMD again sees multiple opportunities within the codebases of specific applications to improve how this software addresses the “Zen” architecture. We have already identified some simple changes that can improve a game’s understanding of the "Zen" core/cache topology, and we intend to provide a status update to the community when they are ready.

Wrap-up

Overall, we are thrilled with the outpouring of support we’ve seen from AMD fans new and old. We love seeing your new builds, your benchmarks, your excitement, and your deep dives into the nuts and bolts of Ryzen. You are helping us make Ryzen™ even better by the day.  You should expect to hear from us regularly through this blog to answer new questions and give you updates on new improvements in the Ryzen ecosystem.

(End statement.)

Such topics as Windows 7 vs. Windows 10 performance, SMT impact, and thread scheduling will no doubt still be debated, and AMD has correctly pointed out that optimization for this brand new architecture will only improve Ryzen performance going forward. Our own findings as to Ryzen and the Windows 10 thread scheduler appear to be validated as AMD officially dismisses performance impact in that area, though there is still room for improvement in other areas from our initial gaming performance findings. As mentioned in the post, AMD will have an update for Windows power plan optimization by the first week of April, and the company has "already identified some simple changes that can improve a game’s understanding of the 'Zen' core/cache topology, and we intend to provide a status update to the community when they are ready", as well.

It is refreshing to see a company publicly acknowledging the topics that have resulted in so much discussion in the past couple of weeks, and their transparency is commendable, with every issue (that this author is aware of) being touched on in the post.
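
To make the tCTL arithmetic from the statement concrete, here is a minimal sketch of the offset handling as we understand it; the function name and model strings are ours, not AMD's, and the lookup only covers the Ryzen 7 models AMD mentioned.

    #include <iostream>
    #include <string>

    // Convert the reported tCTL value to the junction temperature (Tj) using the
    // offsets described in AMD's statement: +20C on the 1700X and 1800X, none on the 1700.
    double tctl_to_tj(const std::string& model, double tctl_celsius) {
        const double offset = (model == "1700X" || model == "1800X") ? 20.0 : 0.0;
        return tctl_celsius - offset;
    }

    int main() {
        // A 1800X reporting 58C tCTL is actually running at a 38C junction temperature,
        // matching the hypothetical 38C scenario in AMD's table.
        std::cout << tctl_to_tj("1800X", 58.0) << " C\n";   // prints 38
        std::cout << tctl_to_tj("1700",  38.0) << " C\n";   // prints 38
    }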

Source: AMD
Manufacturer: RockIt Cool

Introduction

With the introduction of the Intel Kaby Lake processors and Intel Z270 chipset, unprecedented overclocking became the norm. The new processors easily hit a core speed of 5.0GHz with little more than CPU core voltage tweaking. This overclocking headroom came with a price tag, however: the Kaby Lake processor runs significantly hotter than previous generation Intel CPUs, a seeming reversal of the temperature trend. At stock settings, the individual cores in the CPU were recorded hitting up to 65C in testing - and that's with a high performance water loop cooling the processor. Per reports from various enthusiast sites, Intel used inferior TIM (thermal interface material) between the CPU die and the underside of the CPU heat spreader, leading to increased temperatures compared with previous CPU generations (in particular Skylake). This temperature increase did not affect overclocking much, since the CPU hits 5.0GHz easily, but it does impact the cooling necessary to hit those performance levels.

As with the previous generation Haswell CPUs, a few of the more adventurous enthusiasts used known methods to address the heat concerns of the Kaby Lake processor by delidding it. Unlike in the initial days of the Haswell processor, the delidding process is now much more streamlined thanks to the availability of delidding kits from several vendors. The process still involves physically removing the heat spreader from the CPU and exposing the CPU die. However, instead of cooling the die directly, the "safer" approach is to clean the die and the underside of the heat spreader, apply new TIM (thermal interface material), and re-affix the heat spreader to the CPU. Going this route instead of direct-die cooling is considered safer because no additional or exotic support mechanisms are needed to keep the CPU cooler from crushing your precious die. However, calling it safe is a bit of an overstatement: you are physically separating the heat spreader from the CPU surface and voiding your CPU warranty at the same time. Although if that was a concern, you probably wouldn't be reading this article in the first place.

Continue reading our Kaby Lake Relidding article!

NVIDIA Launches Jetson TX2 With Pascal GPU For Embedded Devices

Subject: General Tech, Processors | March 12, 2017 - 05:11 PM |
Tagged: pascal, nvidia, machine learning, iot, Denver, Cortex A57, ai

NVIDIA recently unveiled the Jetson TX2, a credit card sized compute module for embedded devices that has been upgraded quite a bit from its predecessor (the aptly named TX1).

jx10-jetson-tx2-170203.jpg

Measuring 50mm x 87mm, the Jetson TX2 packs quite a bit of processing power and I/O, including an SoC with two 64-bit Denver 2 cores with 2MB L2, four ARM Cortex A57 cores with 2MB L2, and a 256-core GPU based on NVIDIA’s Pascal architecture. The TX2 compute module also hosts 8 GB of LPDDR4 (58.3 GB/s) and 32 GB of eMMC storage (SDIO and SATA are also supported). As far as I/O, the Jetson TX2 uses a 400-pin connector to attach the compute module to the development board or final product, and the final I/O available to users will depend on the product it is used in. The compute module supports up to the following, though:

  • 2 x DSI
  • 2 x DP 1.2 / HDMI 2.0 / eDP 1.4
  • USB 3.0
  • USB 2.0
  • 12 x CSI lanes for up to 6 cameras (2.5 GB/second/lane)
  • PCI-E 2.0:
    • One x4 + one x1 or two x1 + one x2
  • Gigabit Ethernet
  • 802.11ac
  • Bluetooth

 

The Jetson TX2 runs the “Linux for Tegra” operating system. According to NVIDIA, the Jetson TX2 can deliver up to twice the performance of the TX1, or match the TX1's performance at 7.5 watts for up to twice the efficiency.

The extra horsepower afforded by the faster CPU, updated GPU, and increased memory and memory bandwidth will reportedly enable smart end user devices with faster facial recognition, more accurate speech recognition, and smarter AI and machine learning tasks (e.g. personal assistants, smart street cameras, smarter home automation, and so on). Bringing more processing power locally to these types of internet of things devices is a good thing, as less reliance on the cloud potentially means more privacy (unfortunately there is not as much incentive for companies to make this type of product for the mass market, but you could use the TX2 to build your own).

Cisco will reportedly use the Jetson TX2 to add facial and speech recognition to its Cisco Spark devices. In addition to the hardware, NVIDIA offers SDKs and tools as part of JetPack 3.0. The JetPack 3.0 toolkit includes TensorRT, cuDNN 5.1, VisionWorks 1.6, CUDA 8, and support and drivers for OpenGL 4.5, OpenGL ES 3.2, EGL 1.4, and Vulkan 1.0.

The TX2 will enable better, stronger, and faster (well, I don't know about stronger, heh) industrial control systems, robotics, home automation, embedded computers and kiosks, smart signage, security systems, and other connected IoT devices (which, for the love of all processing, had better be hardened and secured so they aren't used as part of a botnet!).

Interested developers and makers can pre-order the Jetson TX2 Development Kit for $599 with a ship date for US and Europe of March 14 and other regions “in the coming weeks.” If you just want the compute module sans development board, it will be available later this quarter for $399 (in quantities of 1,000 or more). The previous generation Jetson TX1 Development Kit has also received a slight price cut to $499.


Source: NVIDIA

SoftBank Plans To Sell 25% of ARM Holdings To Vision Fund

Subject: General Tech, Processors | March 11, 2017 - 10:02 PM |
Tagged: softbank, investments, business, arm

Japanese telecom powerhouse SoftBank, which recently purchased ARM Holdings for $32 billion USD, is reportedly in talks to sell off a 25% stake in its new subsidiary to a new investment fund. Specifically, the New York Times cites a source inside SoftBank familiar with the matter who revealed that SoftBank is in talks for the Vision Fund to purchase a stake in ARM Holdings worth approximately $8 billion USD.

ARM at a glance.png

The $100 billion Vision Fund is an investment fund started by SoftBank founder Masayoshi Son with the goal of investing in high growth technology start-ups and major technology IP holders. The fund currently comprises a $25 billion commitment from SoftBank, $45 billion from Saudi Arabia (via the Saudi Arabia Public Investment Fund), and smaller investments from Apple and Oracle co-founder Lawrence Ellison. The fund is approximately 75% of the way to its $100 billion goal, with the state-owned Mubadala Development investment company in Abu Dhabi and the Qatari government allegedly interested in joining. The Vision Fund is based in the UK and led by SoftBank's Head of Strategic Finance Rajeev Misra (investment bankers Nizar al-Bassam and Dalin Ariburnu, formerly of Deutsche Bank and Goldman Sachs respectively, are also involved).

It is interesting that SoftBank plans to sell off such a large stake in ARM Holdings so soon after purchasing the company (the sale finalized only six months ago), but it may be a move to entice investors to the Vision Fund, which SoftBank itself is a part of, and to further diversify its assets. The more interesting question is the political and regulatory reaction to this news and what it will mean for ARM and its IP to have even more countries controlling it and its direction(s). I do not have the geopolitical acumen to speculate on whether this is a good or bad thing (heh). It does continue the trend of countries outside of the US increasing their investments in established technology companies with lots of IP (whether US-based or not) as well as in new start-ups. New money entering this sector is likely a good thing overall though, at least for the companies involved, heh.

I guess we will just have to wait and see if the sale completes and where ARM goes from there! What are your thoughts on the SoftBank sale of a quarter stake in ARM?

Subject: Processors
Manufacturer: AMD

** UPDATE 3/13 5 PM **

AMD has posted a follow-up statement that officially clears up much of the conjecture this article was attempting to clarify. Relevant points from their post that relate to this article as well as many of the requests for additional testing we have seen since its posting (emphasis mine):

  • "We have investigated reports alleging incorrect thread scheduling on the AMD Ryzen™ processor. Based on our findings, AMD believes that the Windows® 10 thread scheduler is operating properly for “Zen,” and we do not presently believe there is an issue with the scheduler adversely utilizing the logical and physical configurations of the architecture."

  • "Finally, we have reviewed the limited available evidence concerning performance deltas between Windows® 7 and Windows® 10 on the AMD Ryzen™ CPU. We do not believe there is an issue with scheduling differences between the two versions of Windows.  Any differences in performance can be more likely attributed to software architecture differences between these OSes."

So there you have it, straight from the horse's mouth. AMD does not believe the problem lies within the Windows thread scheduler. SMT performance in gaming workloads was also addressed:

  • "Finally, we have investigated reports of instances where SMT is producing reduced performance in a handful of games. Based on our characterization of game workloads, it is our expectation that gaming applications should generally see a neutral/positive benefit from SMT. We see this neutral/positive behavior in a wide range of titles, including: Arma® 3, Battlefield™ 1, Mafia™ III, Watch Dogs™ 2, Sid Meier’s Civilization® VI, For Honor™, Hitman™, Mirror’s Edge™ Catalyst and The Division™. Independent 3rd-party analyses have corroborated these findings.

    For the remaining outliers, AMD again sees multiple opportunities within the codebases of specific applications to improve how this software addresses the “Zen” architecture. We have already identified some simple changes that can improve a game’s understanding of the "Zen" core/cache topology, and we intend to provide a status update to the community when they are ready."

We are still digging into the observed differences of toggling SMT compared with disabling the second CCX, but it is good to see AMD issue a clarifying statement here for all of those out there observing and reporting on SMT-related performance deltas.

** END UPDATE **

Editor's Note: The testing you see here was a response to many days of comments and questions to our team on how and why AMD Ryzen processors are seeing performance gaps in 1080p gaming (and other scenarios) in comparison to Intel Core processors. Several outlets have posted that the culprit is the Windows 10 scheduler and its inability to properly allocate work across the logical vs. physical cores of the Zen architecture. As it turns out, we can prove that isn't the case at all. -Ryan Shrout

Initial reviews of AMD’s Ryzen CPU revealed a few inefficiencies in some situations, particularly in gaming workloads running at the more common resolutions like 1080p, where the CPU becomes more of a bottleneck when coupled with modern GPUs. Lots of folks have theorized about what could possibly be causing these issues, and most recent attention appears to have been directed at the Windows 10 scheduler and its supposed inability to properly place threads on the Ryzen cores for the most efficient processing.

I typically have Task Manager open while running storage tests (they are boring to watch otherwise), and I naturally had it open during Ryzen platform storage testing. I’m accustomed to how the IO workers are distributed across reported threads, and in the case of SMT capable CPUs, distributed across cores. There is a clear difference when viewing our custom storage workloads with SMT on vs. off, and it was dead obvious to me that core loading was working as expected while I was testing Ryzen. I went back and pulled the actual thread/core loading data from my testing results to confirm:

SMT on usage.png

The Windows scheduler has a habit of bouncing processes across available processor threads. This naturally happens as other processes share time with a particular core, with the heavier process not necessarily switching back to the same core. As you can see above, the single IO handler thread was spread across the first four cores during its run, but the Windows scheduler was always hitting just one of the two available SMT threads on any single core at one time.

My testing for Ryan’s Ryzen review consisted of only single threaded workloads, but we can make things a bit clearer by loading down half of the CPU while toggling SMT off. We do this by increasing the worker count to 4, which is half of the 8 threads available on the Ryzen processor with SMT disabled in the motherboard BIOS.

smtoff4workers.png

SMT OFF, 8 cores, 4 workers

With SMT off, the scheduler is clearly not giving priority to any particular core and the work is spread throughout the physical cores in a fairly even fashion.

Now let’s try with SMT turned back on and doubling the number of IO workers to 8 to keep the CPU half loaded:

smton8workers.png

SMT ON, 16 (logical) cores, 8 workers

With SMT on, we see a very different result. The scheduler is clearly loading only one thread per core. This could only be possible if Windows was aware of the 2-way SMT (two threads per core) configuration of the Ryzen processor. Do note that sometimes the workload will toggle around every few seconds, but the total loading on each physical core still remains at ~50%. I chose a workload that saturated its thread just enough for Windows to not shift it around as it ran, making the above result even clearer.

Synthetic Testing Procedure

While the storage testing methods above provide a real-world example of the Windows 10 scheduler working as expected, we do have another workload that can help demonstrate core balancing with Intel Core and AMD Ryzen processors. A quick and simple custom-built C++ application can be used to generate generic worker threads and monitor for core collisions and resolutions.

This test app has a very straightforward workflow. Every few seconds it generates a new thread, capping at N/2 threads total, where N is equal to the reported number of logical cores. If the OS scheduler is working as expected, it should load those 8 threads across the 8 physical cores, though the choice of logical core within each physical core will be based on very minute parameters and conditions going on in the OS background.

By monitoring the APIC_ID through the CPUID instruction, the first application thread monitors all threads and detects and reports on collisions - when a thread from our app is running on the same core as another thread from our app. That thread also reports when those collisions have been cleared. In an ideal and expected environment where Windows 10 knows the boundaries of physical and logical cores, you should never see more than one thread of a core loaded at the same time.
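
For readers who want to reproduce this kind of check themselves, the sketch below outlines the same basic idea; it is a simplified stand-in, not our actual tool. It assumes a GCC/Clang toolchain (for <cpuid.h>) and 2-way SMT, so shifting the x2APIC ID right by one bit identifies the physical core, and unlike our app it simply starts N/2 spinning workers at once and reports collisions once per second.

    #include <cpuid.h>      // GCC/Clang; MSVC users would use <intrin.h> and __cpuidex instead
    #include <atomic>
    #include <chrono>
    #include <iostream>
    #include <set>
    #include <thread>
    #include <vector>

    // Read the running logical processor's x2APIC ID via CPUID leaf 0Bh (returned in EDX).
    static unsigned current_apic_id() {
        unsigned a, b, c, d;
        __get_cpuid_count(0x0B, 0, &a, &b, &c, &d);
        return d;
    }

    int main() {
        const unsigned n_logical = std::thread::hardware_concurrency();
        const unsigned n_workers = n_logical / 2;            // N/2 busy threads
        std::vector<std::atomic<unsigned>> apic(n_workers);  // last APIC ID seen per worker
        std::atomic<bool> run{true};

        std::vector<std::thread> workers;
        for (unsigned i = 0; i < n_workers; ++i)
            workers.emplace_back([&, i] {
                while (run) apic[i] = current_apic_id();     // spin and keep reporting location
            });

        // Monitor: with 2-way SMT, APIC ID >> 1 identifies the physical core, so two
        // workers resolving to the same value constitutes a collision.
        for (int t = 0; t < 30; ++t) {
            std::this_thread::sleep_for(std::chrono::seconds(1));
            std::set<unsigned> cores;
            unsigned collisions = 0;
            for (auto& id : apic)
                if (!cores.insert(id.load() >> 1).second) ++collisions;
            std::cout << "collisions this sample: " << collisions << "\n";
        }
        run = false;
        for (auto& w : workers) w.join();
    }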

app01.png

Click to Enlarge

This screenshot shows our app working on the left and the Windows Task Manager on the right with logical cores labeled. While it may look like all logical cores are being utilized at the same time, in fact they are not. At any given point, only LCore 0 or LCore 1 are actively processing a thread. Need proof? Check out the modified view of the task manager where I copy the graph of LCore 1/5/9/13 over the graph of LCore 0/4/8/12 with inverted colors to aid viewability.

app02-2.png

If you look closely, by overlapping the graphs in this way, you can see that the threads migrate from LCore 0 to LCore 1, LCore 4 to LCore 5, and so on. The graphs intersect and fill in to consume ~100% of the physical core. This pattern is repeated for the other 8 logical cores on the right two columns as well. 

Running the same application on a Core i7-5960X Haswell-E 8-core processor shows a very similar behavior.

app03.png

Click to Enlarge

Each pair of logical cores shares a single thread, and when thread transitions occur away from LCore N, they migrate perfectly to LCore N+1. It does appear that in this scenario the Intel system is showing a more stable thread distribution than the Ryzen system. While that may in fact confer some performance advantage on the 5960X configuration, the penalty for intra-core thread migration is expected to be very minute.

The fact that Windows 10 is balancing the 8 thread load specifically between matching logical core pairs indicates that the operating system is perfectly aware of the processor topology and is selecting distinct cores first to complete the work.

Information from this custom application, along with the storage performance tool example above, clearly show that Windows 10 is attempting to balance work on Ryzen between cores in the same manner that we have experienced with Intel and its HyperThreaded processors for many years.

Continue reading our look at AMD Ryzen and Windows 10 scheduling!

Subject: Processors
Manufacturer: AMD

The right angle

While many in the media and enthusiast communities are still trying to fully grasp the importance and impact of the recent AMD Ryzen 7 processor release, I have been trying to complete my review of the 1700X and 1700 processors, in between testing the upcoming GeForce GTX 1080 Ti and preparing for more hardware to show up at the offices very soon. There is still much to learn and understand about the first new architecture from AMD in nearly a decade, including analysis of the memory hierarchy, power consumption, overclocking, gaming performance, etc.

During my Ryzen 7 1700 testing, I went through some overclocking evaluation and thought the results might be worth sharing sooner rather than later. This quick article is just a preview of what we are working on, so don’t expect to find the answers to Ryzen power management here, only a recounting of how I was able to get stellar performance from the lowest priced Ryzen part on the market today.

The system specifications for this overclocking test were identical to our original Ryzen 7 processor review.

Test System Setup
CPU: AMD Ryzen 7 1800X, AMD Ryzen 7 1700X, AMD Ryzen 7 1700, Intel Core i7-7700K, Intel Core i5-7600K, Intel Core i7-6700K, Intel Core i7-6950X, Intel Core i7-6900K, Intel Core i7-6800K
Motherboard: ASUS Crosshair VI Hero (Ryzen), ASUS Prime Z270-A (Kaby Lake, Skylake), ASUS X99-Deluxe II (Broadwell-E)
Memory: 16GB DDR4-2400
Storage: Corsair Force GS 240 SSD
Sound Card: On-board
Graphics Card: NVIDIA GeForce GTX 1080 8GB
Graphics Drivers: NVIDIA 378.49
Power Supply: Corsair HX1000
Operating System: Windows 10 Pro x64

Of note is that I am still utilizing the Noctua U12S cooler that AMD provided for our initial testing – all of the overclocking and temperature reporting in this story is air cooled.

DSC02643.jpg

First, let’s start with the motherboard. All of this testing was done on the ASUS Crosshair VI Hero with the latest 5704 BIOS installed. As I began to explore the different overclocking options (BCLK adjustment, multipliers, voltage), I came across one of the ASUS presets. These presets offer pre-defined collections of settings that ASUS feels will provide simple overclocking. An option for a higher BCLK existed, but the one that caught my eye was straightforward – 4.0 GHz.

asusbios.jpg

With the Ryzen 1700 installed, I thought I would give it a shot. Keep in mind that this processor has a base clock of 3.0 GHz, a rated maximum boost clock of 3.7 GHz, and is the only 65-watt TDP variant of the three Ryzen 7 processors released last week. Because of that, I didn’t expect its overclocking capability to match what the 1700X and 1800X could offer. Based on previous processor experience, when a chip is binned at a lower power draw than the rest of a family, it will often have properties that make it disadvantageous for running at HIGHER power. That doesn’t seem to be the case here.

4.0.PNG

By simply enabling that option in the ASUS UEFI and rebooting, our Ryzen 1700 processor was running at 4.0 GHz on all cores! For this piece, I won’t be going into the drudgery of debating exactly what settings ASUS changed to get there, or whether the voltages are overly aggressive – the point is that it just works out of the box.

Continue reading our look at overclocking the new Ryzen 7 1700 processor!

Two CPUs of plump juicy Ryzens

Subject: Processors | March 8, 2017 - 02:43 PM |
Tagged: Ryzen 1700X, Ryzen 1700, amd

With suggested prices of $330 for the Ryzen 1700 and $400 for the 1700X, a lot of users are curious about the performance of these two chips, especially with some sites reporting almost equal performance when they are overclocked.  [H]ard|OCP tested both chips at the same clock speeds to see what performance differences remain between the two.  As it turns out, the only test which resulted in a delta of 1% or more was WinRAR; all other tests showed a minuscule difference between the X and the plain old 1700.  They plan to follow these findings up with more tests once they source some CPUs from retail outlets, to see if there are any differences there.

"So there has been a lot of talk about what Ryzen CPU do you buy? The way I think is that you want to buy the least expensive one that will give you the best performance. That is exactly what we expect to find out here today. Is the Ryzen 1700 for $330 as good as the $400 1700X, or even the $500 1800X? "


Source: [H]ard|OCP

AMD Prepares Zen-Based "Naples" Server SoC For Q2 Launch

Subject: Processors | March 7, 2017 - 09:02 AM |
Tagged: SoC, server, ryzen, opteron, Naples, HPC, amd

Over the summer, AMD introduced its Naples platform, the server-focused implementation of the Zen microarchitecture in a SoC (System on a Chip) package. The company showed off a prototype dual socket Naples system and bits of information leaked onto the Internet, but for the most part news has been quiet on this front (whereas there were quite a few leaks of Ryzen, AMD's desktop implementation of Zen).

The wait seems to be finally over, and AMD appears ready to talk more about Naples, which will reportedly launch in the second quarter of this year (Q2'17) with full availability of processors and motherboards from OEMs and channel partners (e.g. system integrators) in the second half of 2017. Per AMD, "Naples" processors are SoCs with 32 cores and 64 threads that support 8 memory channels and a (theoretical) maximum of 2TB of DDR4-2667. (Using the 16GB DIMMs available today, Naples supports 256GB of DDR4 per socket.) Further, the Naples SoC features 64 PCI-E 3.0 lanes. Rumors also indicated that the SoC includes support for sixteen 10GbE interfaces, but AMD has yet to confirm this or the number of SATA/SAS ports offered. AMD did say that Naples has an optimized cache structure for HPC compute and "dedicated security hardware," though it did not go into specifics. (The security hardware may be similar to the ARM TrustZone technology it has used in the past.)

AMD Naples.jpg

Naples will be offered in single and dual socket designs with dual socket systems offering up 64 cores, 128 threads, 32 DDR4 DIMMs (512 GB using 16 GB modules) on 16 total memory channels with 21.3 GB/s per channel bandwidth (170.7 GB/s per SoC), 128 PCI-E 3.0 lanes, and an AMD Infinity Fabric interconnect between the two processor sockets.

AMD claims that its Naples platform offers up to 45% more cores, 122% more memory bandwidth, and 60% more I/O than its competition. For its internal comparison, AMD chose the Intel Xeon E5-2699A V4, which is the highest core count processor intended for dual socket systems (there are E7s with more cores, but those are aimed at 4P systems). The Xeon E5-2699A V4 is a 14nm, 22 core (44 thread) processor clocked at 2.4 GHz base to 3.6 GHz turbo with 55MB of cache. It supports four channels of DDR4-2400 for a maximum bandwidth of 76.8 GB/s (19.2 GB/s per channel) as well as 40 PCI-E 3.0 lanes. A dual socket system with two of those Xeons features 44 cores, 88 threads, and a theoretical maximum of 1.54 TB of ECC RAM.
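
As a quick sanity check, the per-channel and per-socket bandwidth figures quoted for both platforms fall straight out of the usual theoretical formula (transfer rate in MT/s times 8 bytes per 64-bit channel); the little calculation below is our own arithmetic, not AMD or Intel data.

    #include <cstdio>

    // Theoretical DDR4 bandwidth: effective transfer rate (MT/s) x 8 bytes per 64-bit channel.
    double channel_gbps(double mega_transfers) { return mega_transfers * 8.0 / 1000.0; }

    int main() {
        // Naples: DDR4-2667 across 8 channels per socket
        std::printf("Naples:       %.1f GB/s per channel, %.1f GB/s per socket\n",
                    channel_gbps(2667), channel_gbps(2667) * 8);    // ~21.3 and ~170.7
        // Xeon E5-2699A V4: DDR4-2400 across 4 channels per socket
        std::printf("E5-2699A V4:  %.1f GB/s per channel, %.1f GB/s per socket\n",
                    channel_gbps(2400), channel_gbps(2400) * 4);    // ~19.2 and ~76.8
    }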

AMD's reference platform with two 32 core Naples SoCs and 512 GB of DDR4-2400 was purportedly 2.5x faster in a seismic analysis workload than a dual Xeon E5-2699A V4 OEM system with 1866 MHz DDR4. Curiously, when AMD compared a Naples reference platform with only 44 cores enabled and running 1866 MHz memory to a similarly configured Intel system, the Naples platform was still twice as fast. It seems that the increased number of memory channels and the added memory bandwidth are really helping the Naples platform pull ahead in this workload.

AMD Naples and Radeon Instinct.png

The company also intends Naples to power machine learning and AI projects with servers that feature Naples processors and Radeon Instinct graphics processors.

AMD further claims that its Naples platform is more balanced and better suited to cloud computing, scientific, and HPC workloads than the competition. Specifically, Forrest Norrod, Senior Vice President and General Manager of AMD's Enterprise, Embedded, and Semi-Custom Business Unit, stated:

“’Naples’ represents a completely new approach to supporting the massive processing requirements of the modern datacenter. This groundbreaking system-on-chip delivers the unique high-performance features required to address highly virtualized environments, massive data sets and new, emerging workloads.”

There is no word on pricing yet, but it should be competitive with Intel's offerings (the E5-2699A V4 is $4,938). AMD will reportedly be talking data center strategy and its upcoming products during the Open Compute Summit later this week, so hopefully there will be more information released at those presentations.

(My opinions follow)

This is one area where AMD needs to come out strong, with support from motherboard manufacturers, system integrators, OEM partners, and OS and software validation, in order to succeed. Intel is not likely to take AMD encroaching on its lucrative server market lightly, and AMD has a long road ahead of it to regain the share it once held in this area. That said, AMD has a decent architecture to build on with Zen, and if it can secure partner support, Intel will face competition here that it has not seen in a long time. Intel and AMD competing over the data center market is a good thing, and as both companies bring new technology to market it will trickle down into consumer level hardware. Naples' success in the data center could mean a profitable AMD with R&D money to push Zen as far as it can go – so hopefully they can pull it off.

What are your thoughts on the Naples SoC and AMD's push into the server market?


Source: AMD