Author:
Subject: Processors
Manufacturer: AMD

Just a little taste

In a surprise move with no real indication as to why, AMD has decided to reveal some of the most exciting and interesting information surrounding Threadripper and Ryzen 3, both due out in just a few short weeks. AMD CEO Lisa Su and CVP of Marketing John Taylor (along with guest star Robert Hallock) appear in a video launching on AMD's YouTube channel today to reveal the naming, clock speeds and pricing for the new flagship HEDT product line under the Ryzen brand.

people.jpg

We already knew a lot about Threadripper, AMD’s answer to Intel's X299/X99 high-end desktop platforms: it would arrive this summer, offer up to 16 cores and 32 threads of compute, and every model would include 64 lanes of PCI Express 3.0, a massive amount of connectivity for the prosumer.

Now we know that there will be two models launching and available in early August: the Ryzen Threadripper 1920X and the Ryzen Threadripper 1950X.

Model | Core i9-7980XE | Core i9-7960X | Core i9-7940X | Core i9-7920X | Core i9-7900X | Core i7-7820X | Core i7-7800X | Threadripper 1950X | Threadripper 1920X
Architecture | Skylake-X | Skylake-X | Skylake-X | Skylake-X | Skylake-X | Skylake-X | Skylake-X | Zen | Zen
Process Tech | 14nm+ | 14nm+ | 14nm+ | 14nm+ | 14nm+ | 14nm+ | 14nm+ | 14nm | 14nm
Cores/Threads | 18/36 | 16/32 | 14/28 | 12/24 | 10/20 | 8/16 | 6/12 | 16/32 | 12/24
Base Clock | ? | ? | ? | ? | 3.3 GHz | 3.6 GHz | 3.5 GHz | 3.4 GHz | 3.5 GHz
Turbo Boost 2.0 | ? | ? | ? | ? | 4.3 GHz | 4.3 GHz | 4.0 GHz | 4.0 GHz | 4.0 GHz
Turbo Boost Max 3.0 | ? | ? | ? | ? | 4.5 GHz | 4.5 GHz | N/A | N/A | N/A
Cache | 16.5MB (?) | 16.5MB (?) | 16.5MB (?) | 16.5MB (?) | 13.75MB | 11MB | 8.25MB | 40MB | ?
Memory Support | ? | ? | ? | ? | DDR4-2666 Quad Channel | DDR4-2666 Quad Channel | DDR4-2666 Quad Channel | DDR4-2666 Quad Channel | DDR4-2666 Quad Channel
PCIe Lanes | ? | ? | ? | ? | 44 | 28 | 28 | 64 | 64
TDP | 165 watts (?) | 165 watts (?) | 165 watts (?) | 165 watts (?) | 140 watts | 140 watts | 140 watts | 180 watts | 180 watts
Socket | 2066 | 2066 | 2066 | 2066 | 2066 | 2066 | 2066 | TR4 | TR4
Price | $1999 | $1699 | $1399 | $1199 | $999 | $599 | $389 | $999 | $799
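The Threadripper 1950X and the Core i9-7900X launch at the same $999, so dividing launch price by core and thread count turns the table into a direct comparison. The sketch below is a minimal illustration that uses only the prices and counts listed above. The helper name `price_per_unit` is hypothetical, and launch MSRP can differ from street pricing.

```python
# Launch prices and core/thread counts copied from the table above.
SKUS = {
    "Core i9-7980XE": (18, 36, 1999),
    "Core i9-7960X": (16, 32, 1699),
    "Core i9-7940X": (14, 28, 1399),
    "Core i9-7920X": (12, 24, 1199),
    "Core i9-7900X": (10, 20, 999),
    "Core i7-7820X": (8, 16, 599),
    "Core i7-7800X": (6, 12, 389),
    "Threadripper 1950X": (16, 32, 999),
    "Threadripper 1920X": (12, 24, 799),
}

def price_per_unit(cores: int, threads: int, price: int) -> tuple[float, float]:
    """Return (dollars per core, dollars per thread) at launch MSRP."""
    return price / cores, price / threads

# Print the lineup sorted from cheapest to most expensive per core.
for name, (cores, threads, price) in sorted(SKUS.items(), key=lambda kv: price_per_unit(*kv[1])[0]):
    per_core, per_thread = price_per_unit(cores, threads, price)
    print(f"{name:20s} ${per_core:7.2f}/core  ${per_thread:6.2f}/thread")
```

At launch MSRP the 1950X comes to about $62 per core. The 7900X, at the same $999, comes to about $100 per core.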

 

Model | Threadripper 1950X | Threadripper 1920X | Ryzen 7 1800X | Ryzen 7 1700X | Ryzen 7 1700 | Ryzen 5 1600X | Ryzen 5 1600 | Ryzen 5 1500X | Ryzen 5 1400
Architecture | Zen | Zen | Zen | Zen | Zen | Zen | Zen | Zen | Zen
Process Tech | 14nm | 14nm | 14nm | 14nm | 14nm | 14nm | 14nm | 14nm | 14nm
Cores/Threads | 16/32 | 12/24 | 8/16 | 8/16 | 8/16 | 6/12 | 6/12 | 4/8 | 4/8
Base Clock | 3.4 GHz | 3.5 GHz | 3.6 GHz | 3.4 GHz | 3.0 GHz | 3.6 GHz | 3.2 GHz | 3.5 GHz | 3.2 GHz
Turbo/Boost Clock | 4.0 GHz | 4.0 GHz | 4.0 GHz | 3.8 GHz | 3.7 GHz | 4.0 GHz | 3.6 GHz | 3.7 GHz | 3.4 GHz
Cache | 40MB | ? | 20MB | 20MB | 20MB | 16MB | 16MB | 16MB | 8MB
Memory Support | DDR4-2666 Quad Channel | DDR4-2666 Quad Channel | DDR4-2400 Dual Channel | DDR4-2400 Dual Channel | DDR4-2400 Dual Channel | DDR4-2400 Dual Channel | DDR4-2400 Dual Channel | DDR4-2400 Dual Channel | DDR4-2400 Dual Channel
PCIe Lanes | 64 | 64 | 20 | 20 | 20 | 20 | 20 | 20 | 20
TDP | 180 watts | 180 watts | 95 watts | 95 watts | 65 watts | 95 watts | 65 watts | 65 watts | 65 watts
Socket | TR4 | TR4 | AM4 | AM4 | AM4 | AM4 | AM4 | AM4 | AM4
Price | $999 | $799 | $499 | $399 | $329 | $249 | $219 | $189 | $169

Continue reading about the announcement of the Ryzen Threadripper and Ryzen 3 processors!

Author:
Subject: Processors
Manufacturer: Intel

A massive lineup

The number and significance of the product and platform launches occurring today with the Intel Xeon Scalable family are staggering. Intel is launching more than 50 processors and 7 chipsets under the Xeon Scalable brand, targeting data center and enterprise customers across a wide range of markets and segments. From SMB users to “Super 7” data center clients, the new Xeon lineup likely has an option for everyone.

All of this comes at an important point in time. With its new EPYC family of processors and platforms, AMD is competitive in this space for the first time in nearly a decade. That decade of clear dominance in the data center has been good to Intel, letting it earn high profits and margins without fear of a strong direct competitor. Intel did not spend those 10 years flat-footed, though. It has been developing complementary technologies including new Ethernet controllers, ASICs, Omni-Path, FPGAs, solid-state storage and much more.

cpus.jpg

Our story today will give you an overview of the new processors and the changes that Intel’s latest Xeon architecture offers to business customers. The Skylake-SP core has some significant upgrades over the Broadwell design before it, but in other aspects the processors and platforms will be quite similar. What changes can you expect with the new Xeon family?

01-11 copy.jpg

Per-core performance has been improved with the updated Skylake-SP microarchitecture and a new cache memory hierarchy that we previewed with last month's Skylake-X consumer release. The memory and PCIe interfaces have been upgraded with more channels and more lanes, giving the platform more flexibility for expansion. Socket-level performance also goes up, thanks to higher available core counts and the improved UPI interface that makes socket-to-socket communication more efficient. AVX-512 doubles Skylake's peak FLOPS per clock over Broadwell, which benefits HPC and analytics workloads. Intel QuickAssist improves cryptography and compression performance to allow for faster connectivity implementations. Security and agility also get an upgrade with Boot Guard, RunSure, and VMD for better NVMe storage management. On the surface this looks like a simple upgrade, but a lot improves under the hood.
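The "doubles the peak FLOPS/clock" claim can be checked with standard peak-throughput arithmetic. Each FMA unit processes one full vector per clock, and each fused multiply-add counts as two floating-point operations. This sketch assumes Broadwell's two 256-bit AVX2 FMA units and uses the 1 FMA versus 2 FMA per core split that Intel lists for each Xeon Scalable family. It counts per-clock throughput only. AVX-512 code typically runs at reduced clock speeds, so real-world gains are smaller. The function name is hypothetical.

```python
def peak_dp_flops_per_clock(vector_bits: int, fma_units: int) -> int:
    """Peak double-precision FLOPs per clock for one core.

    Each FMA unit processes one full vector per clock, and each fused
    multiply-add counts as two floating-point operations.
    """
    lanes = vector_bits // 64  # number of 64-bit doubles per vector
    return fma_units * lanes * 2

broadwell = peak_dp_flops_per_clock(256, 2)        # AVX2, two 256-bit FMA units
skylake_sp_2fma = peak_dp_flops_per_clock(512, 2)  # AVX-512, two FMA units per core
skylake_sp_1fma = peak_dp_flops_per_clock(512, 1)  # AVX-512, one FMA unit per core
print(broadwell, skylake_sp_2fma, skylake_sp_1fma)  # 16 32 16
```

The doubling applies only to parts with two FMA units per core. A 1-FMA AVX-512 part reaches the same per-clock peak as Broadwell with AVX2.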

01-12 copy.jpg

We already had a good look at the new mesh architecture that carries communication between on-die components. The move away from the ring bus, in use since Nehalem, gives Skylake-SP a couple of unique traits: slightly longer latencies, but more consistent ones, and more room to scale to higher core counts.
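The scaling argument can be shown with a toy model that compares hop counts on a single bidirectional ring with hop counts on a square 2D mesh using dimension-ordered routing. Both are hypothetical simplifications of Intel's actual layouts. The model counts hops only and ignores per-hop delay and clock speed, so it says nothing about absolute latency. It only shows why worst-case and average hop counts grow more slowly on a mesh as node counts rise.

```python
from itertools import product

def ring_hops(n: int) -> list[int]:
    """Hop counts between every ordered pair of distinct stops on one bidirectional ring."""
    return [min(abs(a - b), n - abs(a - b))
            for a in range(n) for b in range(n) if a != b]

def mesh_hops(rows: int, cols: int) -> list[int]:
    """Hop counts (Manhattan distance) between every ordered pair of distinct mesh nodes."""
    nodes = list(product(range(rows), range(cols)))
    return [abs(r1 - r2) + abs(c1 - c2)
            for (r1, c1) in nodes for (r2, c2) in nodes
            if (r1, c1) != (r2, c2)]

def summarize(hops: list[int]) -> tuple[int, float]:
    """Return (worst-case hops, average hops)."""
    return max(hops), sum(hops) / len(hops)

for side in (4, 6):
    n = side * side
    ring_max, ring_avg = summarize(ring_hops(n))
    mesh_max, mesh_avg = summarize(mesh_hops(side, side))
    print(f"{n} stops: ring max {ring_max}, avg {ring_avg:.2f} | "
          f"{side}x{side} mesh max {mesh_max}, avg {mesh_avg:.2f}")
```

With 16 stops, the worst case is 8 hops on the ring and 6 hops on a 4x4 mesh. With 36 stops, the gap widens to 18 hops versus 10.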

01-18 copy.jpg

Intel has changed the naming scheme with the Xeon Scalable release, moving away from “E5/E7” and “v4” to Platinum, Gold, Silver and Bronze tiers. Product differentiation remains much the same. The Platinum processors offer the broadest feature support, including 8-socket configurations, the highest core counts, the highest memory speeds, the most connectivity options and more. To be clear: there are a lot of new processors, and creating an easy-to-read table of features and clocks is nearly impossible. The highlights of the different families are:

  • Xeon Platinum (81xx)
    • Up to 28 cores
    • Up to 8 sockets
    • Up to 3 UPI links
    • 6-channel DDR4-2666
    • Up to 1.5TB of memory
    • 48 lanes of PCIe 3.0
    • AVX-512 with 2 FMA per core
  • Xeon Gold (61xx)
    • Up to 22 cores
    • Up to 4 sockets
    • Up to 3 UPI links
    • 6-channel DDR4-2666
    • AVX-512 with 2 FMA per core
  • Xeon Gold (51xx)
    • Up to 14 cores
    • Up to 2 sockets
    • 2 UPI links
    • 6-channel DDR4-2400
    • AVX-512 with 1 FMA per core
  • Xeon Silver (41xx)
    • Up to 12 cores
    • Up to 2 sockets
    • 2 UPI links
    • 6-channel DDR4-2400
    • AVX-512 with 1 FMA per core
  • Xeon Bronze (31xx)
    • Up to 8 cores
    • Up to 2 sockets
    • 2 UPI links
    • No Turbo Boost
    • 6-channel DDR4-2133
    • AVX-512 with 1 FMA per core
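The memory tiers above translate into theoretical peak bandwidth per socket: channels × transfer rate × 8 bytes per 64-bit DDR4 channel. This sketch assumes nominal transfer rates and ignores real-world efficiency. The helper name is hypothetical.

```python
def peak_bandwidth_gbs(channels: int, mega_transfers: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (10**9 bytes per second).

    Each DDR4 channel is 64 bits (8 bytes) wide, and every transfer moves
    one full channel width.
    """
    return channels * mega_transfers * 1_000_000 * bytes_per_transfer / 1e9

# Memory speeds per family, as listed above; all families use 6 channels per socket.
for tier, speed in [("Platinum / Gold 61xx", 2666), ("Gold 51xx / Silver", 2400), ("Bronze", 2133)]:
    print(f"{tier:22s} 6 x DDR4-{speed}: {peak_bandwidth_gbs(6, speed):.1f} GB/s per socket")
```

The spread between the top and bottom tiers is roughly 128 GB/s versus 102 GB/s per socket, before any real-world losses.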

That’s…a lot. And it only gets worse when you start to look at the entire SKU lineup with clocks, Turbo Speeds, cache size differences, etc. It’s easy to see why the simplicity argument that AMD made with EPYC is so attractive to an overwhelmed IT department.

01-20 copy.jpg

Suffixes mark further sub-categories. The T suffix indicates a 10-year life cycle (thermal specific), F indicates units that integrate the Omni-Path fabric on package, and M models can address 1.5TB of system memory. The diagram above shows the scope of the Xeon Scalable launch in a single slide. This release offers buyers flexibility, but at the cost of configuration complexity.
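The family digits and suffix letters described above can be decoded mechanically. This is a minimal sketch, assuming only the mappings given in this article: 81xx is Platinum, 61xx and 51xx are Gold, 41xx is Silver, 31xx is Bronze, plus the M, F and T suffixes. The function `decode_xeon` is hypothetical, the model numbers in the example are illustrative, and the remaining digits are left undecoded because the article does not define them.

```python
import re

# Family by the first digit of the model number (81xx, 61xx, 51xx, 41xx, 31xx).
FAMILIES = {"8": "Platinum", "6": "Gold", "5": "Gold", "4": "Silver", "3": "Bronze"}

# Suffix meanings as described in this article.
SUFFIXES = {
    "M": "can address 1.5TB of system memory",
    "F": "Omni-Path fabric integrated on package",
    "T": "10-year life cycle (thermal specific)",
}

def decode_xeon(model: str) -> dict:
    """Hypothetical helper: split a Xeon Scalable model number into family and suffix features."""
    match = re.fullmatch(r"(\d)(\d{3})([A-Z]*)", model)
    if match is None or match.group(1) not in FAMILIES:
        raise ValueError(f"not a Xeon Scalable model number: {model!r}")
    digit, _rest, suffix = match.groups()  # _rest is not decoded here
    unknown = [s for s in suffix if s not in SUFFIXES]
    if unknown:
        raise ValueError(f"unrecognized suffix letter(s) {unknown} in {model!r}")
    return {"family": FAMILIES[digit], "features": [SUFFIXES[s] for s in suffix]}

print(decode_xeon("8180M"))
print(decode_xeon("4114T"))
```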

Continue reading about the new Intel Xeon Scalable Skylake-SP platform!

Author:
Subject: Processors
Manufacturer: AMD

EPYC makes its move into the data center

Because we traditionally feed on the excitement and build-up surrounding consumer products, the AMD Ryzen 7 and Ryzen 5 launches were huge for us and our community. Finally seeing competition to Intel’s hold on the consumer market was welcome and necessary to move the industry forward, and we are already seeing some of the results with this week’s Core i9 release and pricing. AMD is, and deserves to be, proud of these accomplishments. But from a business standpoint, Ryzen's impact on the bottom line will likely pale in comparison to how EPYC could fundamentally change AMD's financial stability.

AMD EPYC is the server processor that takes aim at the Intel Xeon and its dominant status in the data center market. The enterprise field is a high-margin, high-profit area. AMD once had significant share in this space with Opteron, but that share has essentially dropped to zero over the last 6+ years. AMD hopes to use the same tactic in the data center that it used on the consumer side to shock and awe the industry into taking notice: impressive new performance levels while undercutting the competition on pricing.

Introducing the AMD EPYC 7000 Series

AMD EPYC targets the single- and 2-socket systems that make up ~95% of the data center and enterprise market, wisely declining to fight above its weight class. That gives AMD an enormous opportunity to take market share from Intel with minimal risk.

epyc-13.jpg

AMD has shared many of these specifications piecemeal over time, including at the recent financial analyst day, but seeing them on a single slide puts everything in perspective. In a single-socket design, servers will be able to integrate 32 cores with 64 threads, 8 DDR4 memory channels with up to 2TB of memory capacity per CPU, 128 PCI Express 3.0 lanes for connectivity, and more.

Also worth noting on this slide, and originally announced at the financial analyst day, is AMD’s intent to maintain socket compatibility for the next two generations. Both Rome and Milan, based on 7nm technology, will be drop-in upgrades for customers buying into EPYC platforms today. That kind of commitment from AMD is crucial to regaining the trust of a market that needs those reassurances.

epyc-14.jpg

Here is the lineup as AMD is providing it to us today. Model numbers in the 7000 series use the second and third digits as a performance indicator (755x will be faster than 750x, for example) and the fourth digit to indicate the EPYC generation (here, the 1 indicates first gen). AMD has created four core-count divisions along with a few TDP options to cover all types of potential customers. This table might seem a bit intimidating, but it is drastically simpler than the Intel Xeon product line that exists today, or that will exist in the future. AMD is offering immediate availability of the top five CPUs in this stack, with the bottom four due before the end of July.
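The numbering rule described above is simple enough to decode mechanically. This is a minimal sketch, assuming plain four-digit model numbers. The helper names `decode_epyc` and `ranks_above` are hypothetical, and anything other than a bare 7xxx number is rejected.

```python
def decode_epyc(model: str) -> dict:
    """Hypothetical helper: split a 4-digit EPYC 7000-series model number.

    The first digit is the series (7000), the second and third digits are the
    relative performance indicator, and the fourth digit is the generation.
    """
    if len(model) != 4 or not model.isdigit() or model[0] != "7":
        raise ValueError(f"not a 4-digit EPYC 7000-series number: {model!r}")
    return {
        "series": int(model[0]) * 1000,
        "performance": int(model[1:3]),
        "generation": int(model[3]),
    }

def ranks_above(a: str, b: str) -> bool:
    """True if model `a` sits higher in the performance ordering than model `b`."""
    return decode_epyc(a)["performance"] > decode_epyc(b)["performance"]

print(decode_epyc("7551"))          # series 7000, performance 55, generation 1
print(ranks_above("7551", "7501"))  # a 755x part ranks above a 750x part
```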

Continue reading about the AMD EPYC data center processor!

Author:
Subject: Processors
Manufacturer: Intel

Specifications and Design

Intel is at an important crossroads for its consumer product lines. Intel has long been accused of ignoring the gaming and enthusiast markets, focusing instead on laptops and smartphones/tablets at the direct expense of the DIY user. It had raised prices and shown only limited ability to increase per-die performance over a fairly extended period. The release of the AMD Ryzen processor, along with the pending release of the Threadripper product line with up to 16 cores, has pushed Intel into a higher gear; it is now more willing to add features, increase performance, and lower prices.

We have already covered most of the specifications, pricing, and feature changes of the Skylake-X Core i9/Core i7 lineup, but they are worth repeating here in our review of the Core i9-7900X for reference.

Model | Core i9-7980XE | Core i9-7960X | Core i9-7940X | Core i9-7920X | Core i9-7900X | Core i7-7820X | Core i7-7800X | Core i7-7740X | Core i5-7640X
Architecture | Skylake-X | Skylake-X | Skylake-X | Skylake-X | Skylake-X | Skylake-X | Skylake-X | Kaby Lake-X | Kaby Lake-X
Process Tech | 14nm+ | 14nm+ | 14nm+ | 14nm+ | 14nm+ | 14nm+ | 14nm+ | 14nm+ | 14nm+
Cores/Threads | 18/36 | 16/32 | 14/28 | 12/24 | 10/20 | 8/16 | 6/12 | 4/8 | 4/4
Base Clock | ? | ? | ? | ? | 3.3 GHz | 3.6 GHz | 3.5 GHz | 4.3 GHz | 4.0 GHz
Turbo Boost 2.0 | ? | ? | ? | ? | 4.3 GHz | 4.3 GHz | 4.0 GHz | 4.5 GHz | 4.2 GHz
Turbo Boost Max 3.0 | ? | ? | ? | ? | 4.5 GHz | 4.5 GHz | N/A | N/A | N/A
Cache | 16.5MB (?) | 16.5MB (?) | 16.5MB (?) | 16.5MB (?) | 13.75MB | 11MB | 8.25MB | 8MB | 6MB
Memory Support | ? | ? | ? | ? | DDR4-2666 Quad Channel | DDR4-2666 Quad Channel | DDR4-2666 Quad Channel | DDR4-2666 Dual Channel | DDR4-2666 Dual Channel
PCIe Lanes | ? | ? | ? | ? | 44 | 28 | 28 | 16 | 16
TDP | 165 watts (?) | 165 watts (?) | 165 watts (?) | 165 watts (?) | 140 watts | 140 watts | 140 watts | 112 watts | 112 watts
Socket | 2066 | 2066 | 2066 | 2066 | 2066 | 2066 | 2066 | 2066 | 2066
Price | $1999 | $1699 | $1399 | $1199 | $999 | $599 | $389 | $339 | $242

There is a lot to take in here, but three points stand out. First, Intel plans to one-up AMD Threadripper by offering an 18-core processor. Second, and potentially more interesting, Intel wants to change the perception of the X299-class platform by offering lower-priced, lower core count CPUs like the quad-core, non-HyperThreaded Core i5-7640X. Third, we see the first-ever Core i9 branding.

Intel only provided detailed specifications up to the Core i9-7900X, a 10-core / 20-thread processor with a base clock of 3.3 GHz and a Turbo peak of 4.5 GHz (using the new Turbo Boost Max Technology 3.0). It sports 13.75MB of cache thanks to an updated cache configuration. It also includes 44 lanes of PCIe 3.0 (4 more than Broadwell-E), supports quad-channel DDR4 memory up to 2666 MHz and has a 140 watt TDP. It uses the new LGA2066 socket. Pricing for this CPU is set at $999, which is interesting for a couple of reasons. First, it is $700 less than the launch MSRP of the 10c/20t Core i7-6950X from one year ago, which is obviously a big plus. However, there is still a long way UP the stack, with the 18c/36t Core i9-7980XE coming in at a cool $1999.

Model | Core i9-7900X | Core i7-6950X | Core i7-7700K
Architecture | Skylake-X | Broadwell-E | Kaby Lake
Process Tech | 14nm+ | 14nm | 14nm+
Cores/Threads | 10/20 | 10/20 | 4/8
Base Clock | 3.3 GHz | 3.0 GHz | 4.2 GHz
Turbo Boost 2.0 | 4.3 GHz | 3.5 GHz | 4.5 GHz
Turbo Boost Max 3.0 | 4.5 GHz | 4.0 GHz | N/A
Cache | 13.75MB | 25MB | 8MB
Memory Support | DDR4-2666 Quad Channel | DDR4-2400 Quad Channel | DDR4-2400 Dual Channel
PCIe Lanes | 44 | 40 | 16
TDP | 140 watts | 140 watts | 91 watts
Socket | 2066 | 2011 | 1151
Price (Launch) | $999 | $1700 | $339

The next CPU down the stack is compelling as well. The Core i7-7820X is Intel's new 8-core / 16-thread HEDT option, with clock speeds similar to the 10-core above it (save the higher base clock). It has 11MB of L3 cache and 28 lanes of PCI Express (12 fewer than the 40 lanes on the Core i7-6900K), but it carries a $599 price tag. That is ~$400 lower than the 8-core 6900K, while the new Skylake-X part brings a 700 MHz clock speed advantage. That’s huge, and it is a direct attack on the AMD Ryzen 7 1800X, which sells for $499 today and cut Intel off at the knees this March. In fact, the base clock of the Core i7-7820X is only 100 MHz lower than the maximum Turbo Boost clock of the Core i7-6900K!

intel1.jpg

The price gap between the 7820X and the 7900X is worth noting. That $400 gap seems huge and out of place compared to the deltas in the rest of the stack, which never exceed $300 (and that is at the top two slots). Intel is clearly concerned about the Ryzen 7 1800X and wants options that compete at that price point (and below), but it feels less threatened by the upcoming Threadripper CPUs. Pricing the 10+ core CPUs today, without knowing how AMD will price Threadripper, is a risk and could put Intel in the same position it was in with the Ryzen 7 release.
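The step-by-step price deltas behind that observation can be computed directly from the launch table. This is a minimal sketch using the listed prices, from the top of the stack to the bottom. The helper name `price_steps` is hypothetical.

```python
# Launch prices from the table above, ordered from the top of the stack down.
STACK = [
    ("Core i9-7980XE", 1999), ("Core i9-7960X", 1699), ("Core i9-7940X", 1399),
    ("Core i9-7920X", 1199), ("Core i9-7900X", 999), ("Core i7-7820X", 599),
    ("Core i7-7800X", 389), ("Core i7-7740X", 339), ("Core i5-7640X", 242),
]

def price_steps(stack: list[tuple[str, int]]) -> list[tuple[str, str, int]]:
    """Return (higher SKU, next SKU down, dollar gap) for each adjacent pair."""
    return [(hi_name, lo_name, hi - lo)
            for (hi_name, hi), (lo_name, lo) in zip(stack, stack[1:])]

steps = price_steps(STACK)
for hi_name, lo_name, gap in steps:
    print(f"{hi_name:15s} -> {lo_name:15s} ${gap}")

largest = max(steps, key=lambda step: step[2])
print("largest step:", largest)  # the 7900X -> 7820X gap
```

Every other step in the stack is $300 or less, which is why the $400 step between the 7900X and the 7820X stands out.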

Continue reading our review of the Intel Core i9-7900X Processor!

Author:
Manufacturer: AMD

We are up to two...

UPDATE (5/31/2017): Crystal Dynamics got back to us with a couple of points on the changes this patch made to improve the performance of AMD Ryzen processors.

  1. Rise of the Tomb Raider splits rendering tasks to run on different threads. By tuning the size of those tasks – breaking some up, allowing multicore CPUs to contribute in more cases, and combining some others, to reduce overheads in the scheduler – the game can more efficiently exploit extra threads on the host CPU.
     
  2. An optimization was identified in texture management that improves the combination of AMD CPU and NVIDIA GPU.  Overhead was reduced by packing texture descriptor uploads into larger chunks.
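The two ideas in Crystal Dynamics' explanation can be illustrated with a short sketch. The first is to split oversized tasks so more cores can help and to merge tiny tasks so the scheduler handles fewer jobs. The second is to pack many small uploads into larger chunks. This is a hypothetical illustration of those general techniques, not the game's code. The developer did not publish its implementation, and the names `rebalance` and `pack_uploads`, the abstract cost units, and the `target` size are all assumptions.

```python
import math

def rebalance(task_costs: list[float], target: float) -> list[float]:
    """Hypothetical sketch: split oversized tasks and merge undersized ones.

    Tasks costlier than `target` are cut into equal pieces no larger than
    `target` so more cores can share the work. Runs of small tasks are
    combined until they reach `target`, so the scheduler handles fewer jobs.
    """
    batches: list[float] = []
    pending = 0.0
    for cost in task_costs:
        if cost > target:
            if pending:  # flush small work queued so far
                batches.append(pending)
                pending = 0.0
            pieces = math.ceil(cost / target)
            batches.extend([cost / pieces] * pieces)
        else:
            pending += cost
            if pending >= target:
                batches.append(pending)
                pending = 0.0
    if pending:
        batches.append(pending)
    return batches

def pack_uploads(descriptors: list, chunk_size: int) -> list[list]:
    """Group individual descriptor uploads into chunks of up to `chunk_size`."""
    return [descriptors[i:i + chunk_size] for i in range(0, len(descriptors), chunk_size)]

print(rebalance([4.0, 0.5, 0.5, 1.0, 0.2], target=1.0))
print(pack_uploads(list(range(10)), chunk_size=4))  # 3 uploads instead of 10
```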

There you have it: a bit more detail on the software changes made to adapt the game engine to AMD's Ryzen architecture. It also confirms our information that there was slightly MORE to address in Ryzen+GeForce combinations.

END UPDATE

Despite a couple of growing pains out of the gate, the Ryzen processor launch appears to have been a success for AMD. Both the Ryzen 7 and the Ryzen 5 releases proved very competitive with Intel’s dominant CPUs and took significant leads in heavily multi-threaded workloads and performance per dollar. One area where AMD has struggled, though, is 1080p gaming. In those cases, both Ryzen 7 and Ryzen 5 processors fell behind comparable Intel parts by (sometimes) significant margins.

Our team continues to follow the story to see how AMD and game developers work through the issue. Most recently, I posted a look at the memory latency differences between Ryzen and Intel Core processors. As it turns out, memory latency is a significant part of AMD's initial problem:

Because of this, I think it is fair to claim that some, if not most, of the 1080p gaming performance deficits we have seen with AMD Ryzen processors are a result of this particular memory system intricacy. You can combine memory latency with the thread-to-thread communication issue we discussed previously into one overall system level complication: the Zen memory system behaves differently than anything we have seen prior and it currently suffers in a couple of specific areas because of it.

In that story I detailed our coverage of the Ryzen processor and its gaming performance succinctly:

Our team has done quite a bit of research and testing on this topic. This included a detailed look at the first asserted reason for the performance gap, the Windows 10 scheduler. Our conclusion there was that the scheduler was working as expected and that switching between power modes made minimal difference. We also talked directly with AMD to find out its then-current stance on the results, which backed up our claims on the scheduler and presented a better outlook for gaming going forward. When AMD wanted to test a new custom Windows 10 power profile to help improve performance in some cases, we took part in that too. In late March we saw the first gaming performance update, courtesy of Ashes of the Singularity: Escalation, where an engine update to utilize more threads resulted in as much as a 31% increase in average frame rate.

Hot on the heels of the Ryzen 7 release, AMD worked with developer Oxide on the Ashes of the Singularity: Escalation engine. Through tweaks and optimizations, the game showed as much as a 30% increase in average frame rate in the integrated benchmark. While this was only a single use case, it does prove that AMD can improve Ryzen's 1080p gaming position against Intel by working with developers.

rotr-screen4-small.jpg

Fast forward to today: I was surprised to find a new patch (Patch #12, v1.0.770.1) for Rise of the Tomb Raider, a game that was actually one of the worst-case scenarios for AMD with Ryzen. The patch notes mention the following:

The following changes are included in this patch

- Fix certain DX12 crashes reported by users on the forums.

- Improve DX12 performance across a variety of hardware, in CPU bound situations. Especially performance on AMD Ryzen CPUs can be significantly improved.

While we expect this patch to be an improvement for everyone, if you do have trouble with this patch and prefer to stay on the old version we made a Beta available on Steam, build 767.2, which can be used to switch back to the previous version.

We will keep monitoring for feedback and will release further patches as it seems required. We always welcome your feedback!

Obviously the data point that stood out for me was the improved DX12 performance “in CPU bound situations. Especially on AMD Ryzen CPUs…”

Remember how the situation appeared in April?

rotr.png

The Ryzen 7 1800X was 24% slower than the Intel Core i7-7700K, a dramatic difference for a processor that should only have been ~8-10% slower in single-threaded workloads.

How does this new patch to RoTR affect performance? We tested it on the same Ryzen 7 1800X test platform we used previously, including the ASUS Crosshair VI Hero motherboard, 16GB of DDR4-2400 memory and a GeForce GTX 1080 Founders Edition using the 378.78 driver. All testing was done under the DX12 code path.

tr-1.png

tr-2.png

The Ryzen 7 1800X score jumps from 107 FPS to 126.44 FPS, an increase of 17%! That is a significant performance boost at 1080p, still at the Very High image quality preset, and it indicates that the developer (and likely AMD) found substantial inefficiencies in the engine. For comparison, the 8-core / 16-thread Intel Core i7-6900K sees only a 2.4% increase from this new game revision. This tells us that the changes to the game were specific to Ryzen processors and their design, and that the Intel platforms lost no performance.

Continue reading our look at the new Rise of the Tomb Raider patch for Ryzen!

Author:
Manufacturer: Intel

An abundance of new processors

During its press conference at Computex 2017, Intel officially announced an entire new family of HEDT (high-end desktop) processors, along with a new chipset and platform to power it. It has only been a year since Intel launched the Core i7-6950X, a Broadwell-E processor with 10 cores and 20 threads, but it feels much longer. At the time, Intel was accused of “sitting” on the market, offering only slight performance upgrades and raising prices on the segment with a $1700 flagship CPU. After what can only be described as a scathing press circuit, and facing a revived and aggressive competitor in AMD and its Ryzen product line, Intel and its executive teams have decided it’s time to take the enthusiast and high-end prosumer markets seriously once again.

slides-3.jpg

Though the company doesn’t want to admit it publicly, Intel clearly feels threatened by the Ryzen 7 product line. The Ryzen 7 1800X launched at $499 with 8 cores and 16 threads, competing well in most tests against the likes of the Intel Core i7-6900K, which sold for over $1000. Adding to the pressure was the announcement at AMD’s Financial Analyst Day that a new brand of processors called Threadripper would be coming this summer, offering up to 16 cores and 32 threads for that same high-end consumer market. Even without pricing, clocks or availability timeframes, it was clear that AMD was going to come after the HEDT market with a rebrand of its EPYC server processors, just as Intel does with Xeon.

The New Processors

Normally I would start with the new platform, technologies and features before giving you the goods on the CPU specifications, but that’s not the mood we are in. Instead, let’s start with the table of nine (9!!) new products and work backwards.

Model | Core i9-7980XE | Core i9-7960X | Core i9-7940X | Core i9-7920X | Core i9-7900X | Core i7-7820X | Core i7-7800X | Core i7-7740X | Core i5-7640X
Architecture | Skylake-X | Skylake-X | Skylake-X | Skylake-X | Skylake-X | Skylake-X | Skylake-X | Kaby Lake-X | Kaby Lake-X
Process Tech | 14nm+ | 14nm+ | 14nm+ | 14nm+ | 14nm+ | 14nm+ | 14nm+ | 14nm+ | 14nm+
Cores/Threads | 18/36 | 16/32 | 14/28 | 12/24 | 10/20 | 8/16 | 6/12 | 4/8 | 4/4
Base Clock | ? | ? | ? | ? | 3.3 GHz | 3.6 GHz | 3.5 GHz | 4.3 GHz | 4.0 GHz
Turbo Boost 2.0 | ? | ? | ? | ? | 4.3 GHz | 4.3 GHz | 4.0 GHz | 4.5 GHz | 4.2 GHz
Turbo Boost Max 3.0 | ? | ? | ? | ? | 4.5 GHz | 4.5 GHz | N/A | N/A | N/A
Cache | 16.5MB (?) | 16.5MB (?) | 16.5MB (?) | 16.5MB (?) | 13.75MB | 11MB | 8.25MB | 8MB | 6MB
Memory Support | ? | ? | ? | ? | DDR4-2666 Quad Channel | DDR4-2666 Quad Channel | DDR4-2666 Quad Channel | DDR4-2666 Dual Channel | DDR4-2666 Dual Channel
PCIe Lanes | ? | ? | ? | ? | 44 | 28 | 28 | 16 | 16
TDP | 165 watts (?) | 165 watts (?) | 165 watts (?) | 165 watts (?) | 140 watts | 140 watts | 140 watts | 112 watts | 112 watts
Socket | 2066 | 2066 | 2066 | 2066 | 2066 | 2066 | 2066 | 2066 | 2066
Price | $1999 | $1699 | $1399 | $1199 | $999 | $599 | $389 | $339 | $242

There is a lot to take in here. Most interesting, Intel plans to one-up AMD Threadripper with an 18-core processor, while also changing the perception of the X299-class platform with lower-priced, lower core count CPUs like the quad-core, non-HyperThreaded Core i5-7640X. We also see the first-ever Core i9 branding.

Intel only provided detailed specifications up to the Core i9-7900X, a 10-core / 20-thread processor with a base clock of 3.3 GHz and a Turbo peak of 4.5 GHz using the new Turbo Boost Max Technology 3.0. It sports 13.75MB of cache thanks to an updated cache configuration. It also includes 44 lanes of PCIe 3.0 (4 more than Broadwell-E), supports quad-channel DDR4 memory up to 2666 MHz, and has a 140 watt TDP. It uses the new LGA2066 socket. Pricing for this CPU is set at $999, which is interesting for a couple of reasons. First, it is $700 less than the launch MSRP of the 10c/20t Core i7-6950X from one year ago, which is obviously a big plus. However, there is still a long way UP the stack, with the 18c/36t Core i9-7980XE coming in at a cool $1999.

intel1.jpg

The next CPU down the stack is compelling as well. The Core i7-7820X is Intel's new 8-core / 16-thread HEDT option, with clock speeds similar to the 10-core above it, save the higher base clock. It has 11MB of L3 cache and 28 lanes of PCI Express (12 fewer than the 40 lanes on the Core i7-6900K), but it carries a $599 price tag. That is ~$400 lower than the 8-core 6900K, while the new Skylake-X part brings a 700 MHz clock speed advantage. That’s huge, and it is a direct attack on the AMD Ryzen 7 1800X, which sells for $499 today and cut Intel off at the knees this March. In fact, the base clock of the Core i7-7820X is only 100 MHz lower than the maximum Turbo Boost clock of the Core i7-6900K!

Continue reading about the Intel Core i9 series announcement!

Author:
Manufacturer: ARM

ARM Refreshes All the Things

This past April, ARM invited us to Cambridge, England to discuss their plans for the next year. Quite a bit has changed for the company since our last ARM Tech Day in 2016. They were acquired by SoftBank, but they essentially continue to operate as their own company. They now have access to more funds, are less risk averse, and have a greater ability to expand in the ever-growing mobile and IoT marketplaces.

dynamiq_01.png

The ARM of today is certainly quite different from the ARM we knew 10 years ago, when their technology appeared in the first iPhone. Back then the company had good technology but a relatively small head count. They kept pace with the industry, but were not nearly as aggressive as other chip companies in some areas. Over the past 10 years they have grown not only in head count, but also in technologies that they have constantly expanded on. The company became more PR savvy and communicated more effectively with the press and, in the end, with their primary users. ARM once announced new products that would not ship for upwards of 3 years. Now the company is much more aggressive with its designs, getting them to partners so that production happens in months rather than years.

Several days of meetings and presentations left us a bit overwhelmed by what ARM is bringing to market towards the end of 2017 and, most likely, the beginning of 2018. On the surface, ARM appears to have only refreshed its CPU and GPU products. But once we look at these products in the greater scheme and at how they interact with DynamIQ, we see that ARM has changed the mobile computing landscape dramatically. This new computing concept allows greater performance, flexibility, and efficiency in designs. Compared to years past, partners will have far more control over these licensed products to create more value and differentiation.

dynamiq_02.png

We previously covered DynamIQ at PCPer this past March. ARM wanted to seed that concept before jumping into more discussion of their latest CPUs and GPUs. Previous Cortex products cannot be used with DynamIQ; leveraging that technology requires new CPU designs. In this article we cover the Cortex-A55 and Cortex-A75. On the surface these two new CPUs look like a refresh, but when we dig in we find massive changes throughout. ARM has taken the concepts of the previous A53 and A73 and expanded on them dramatically, both to work with DynamIQ and to remove significant bottlenecks that had impeded theoretical performance.

Continue reading our overview of the new family of ARM CPUs and GPU!

Author:
Subject: Processors
Manufacturer: Various

Application Profiling Tells the Story

It should come as no surprise to anyone who has been paying attention over the last two months that the latest AMD Ryzen processors and architecture are getting a lot of attention. Ryzen 7 launched with a $499 part that bested Intel's $1000 CPU in heavily threaded applications, and Ryzen 5 launched with great value as well, positioning a 6-core/12-thread CPU against quad-core parts from the competition. But one thread that ran through both the Ryzen 7 and Ryzen 5 launches was gaming performance, in particular at 1080p, and the surprising delta we see in some games.

Our team has done quite a bit of research and testing on this topic. This included a detailed look at the first asserted reason for the performance gap, the Windows 10 scheduler. Our conclusion there was that the scheduler was working as expected and that switching between power modes made minimal difference. We also talked directly with AMD to find out its then-current stance on the results, which backed up our claims on the scheduler and presented a better outlook for gaming going forward. When AMD wanted to test a new custom Windows 10 power profile to help improve performance in some cases, we took part in that too. In late March we saw the first gaming performance update, courtesy of Ashes of the Singularity: Escalation, where an engine update to utilize more threads resulted in as much as a 31% increase in average frame rate.

ping-amd.png

As part of that dissection of the Windows 10 scheduler story, we also discovered interesting data about the CCX construction and how the two CCX modules on the 1800X communicate. The result was thread-to-thread latencies significantly longer than we had seen on any platform before, a consequence of the fabric implementation AMD integrated with the Zen architecture.

This has recently led me down another rabbit hole: could we further compartmentalize the gaming performance of the Ryzen processors using memory latency? As I showed in my Ryzen 5 review, memory frequency and throughput correlate directly with gaming performance improvements, on the order of 14% in some cases. But what about memory latency alone?
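Memory latency tests generally walk a chain of dependent loads in random order. Each access must wait for the previous one, and hardware prefetchers cannot predict the next address. The sketch below only builds and walks such a chain to show the structure, using Sattolo's algorithm to produce a single random cycle that touches every slot. It is not necessarily the tool used for the measurements in this article. In Python, interpreter overhead would swamp DRAM latency, so a real measurement needs native code over a buffer much larger than the caches. The names `sattolo_cycle` and `chase` are hypothetical.

```python
import random

def sattolo_cycle(n: int, rng: random.Random) -> list[int]:
    """Build nxt[] so that following nxt[i] from any slot visits all n slots in one cycle.

    Sattolo's algorithm: like Fisher-Yates, but j is drawn from [0, i) instead
    of [0, i], which guarantees a single cycle covering every slot.
    """
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt

def chase(nxt: list[int], steps: int, start: int = 0) -> int:
    """Follow the chain; each step depends on the result of the previous load."""
    p = start
    for _ in range(steps):
        p = nxt[p]
    return p

chain = sattolo_cycle(1 << 16, random.Random(0))
# After exactly n steps the walk returns to its starting slot.
print(chase(chain, len(chain)) == 0)
```

In a native implementation, the timed loop is the equivalent of `chase`, and total elapsed time divided by `steps` approximates the average load-to-use latency.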

Continue reading our analysis of memory latency, 1080p gaming, and how it impacts Ryzen!!

Manufacturer: EKWB

Introduction and Technical Specifications

Introduction

02-block-with-mount-profile.jpg

Courtesy of EKWB

EK's Supremacy line of CPU waterblocks is well known for its performance and style. The latest version in this line, the Supremacy MX, advances the design in the hope of delivering optimized performance from a less costly version of the award-winning block series. The base Supremacy MX CPU waterblock is a copper and plexi construction using the same jet-impingement and micro-channel design as previous versions of the block. The block comes fully assembled from the factory with a single CPU mounting bracket type (in this case, the Intel version); additional CPU mounting kits are available for purchase. With an MSRP of $54.99, the Supremacy MX waterblock is a compelling purchase in light of its performance potential.

03-block-closeup.jpg

Courtesy of EKWB

04-block-flyapart.jpg

06-block-mounted-lit.jpg

Courtesy of EKWB

The block is assembled with hex-head screws going through the copper base plate, with rubber grommets ensuring the integrity of the block internals. The top aluminum cover plate is held to the plexi top using short hex-head screws that thread directly into the plexi top plate. The center inlet feeds the micro-channels embedded in the copper base plate through the jet-impingement assembly. The mounting bracket sits between the top plexi plate and the copper base plate, making it an interesting upgrade if you want to switch out the CPU mounting plate to use the block on a different CPU family (going from Intel to AMD Ryzen, for example). The aluminum top plate gives the block a sleek appearance and redirects illumination from the side-mounted LEDs (if you choose to use LEDs with the block, that is).

Continue reading our review of the EK Supremacy MX CPU waterblock!

Author:
Subject: Processors
Manufacturer: AMD

The real battle begins

When AMD launched the Ryzen 7 processors last month to a substantial amount of fanfare and pent-up excitement, we already knew that the Ryzen 5 launch would follow close behind. While the Ryzen 7 lineup was meant to compete, with varying levels of success, with the Intel Core i7 Kaby Lake and Broadwell-E products, the Ryzen 5 parts are priced to go head to head with Intel's Core i5 product line.

AMD already told us the details of the new product line including clock speeds, core counts and pricing, so there is little more to talk about other than the performance and capabilities we found from our testing of the new Ryzen 5 parts. Starting with the Ryzen 5 1600X, with 6 cores, 12 threads and a $249 price point, and going down to the Ryzen 5 1400 with 4 cores, 8 threads and a $169 price point, this is easily AMD's most aggressive move to date. The Ryzen 7 1800X at $499 was meant to choke off purchases of Intel's $1000+ parts; Ryzen 5 is attempting to offer significant value and advantage for users on a budget.

Today we have the Ryzen 5 1600X and Ryzen 5 1500X in our hands. The 1600X is a 6C/12T processor with a 50% core count advantage over the Core i5-7600K it is priced against, and a 3x advantage in thread count because Intel disables HyperThreading on Core i5 desktop processors. The Ryzen 5 1500X has the same number of cores as the Core i5-7500 it will be pitted against, but 2x the thread count.
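The ratios quoted above follow directly from the core and thread counts. The trivial sketch below reproduces them from (cores, threads) pairs. A 1.5x core ratio is the 50% advantage; the thread ratios are the 3x and 2x figures.

```python
def advantage(ryzen: tuple[int, int], intel: tuple[int, int]) -> tuple[float, float]:
    """Return (core-count ratio, thread-count ratio) of a Ryzen part vs. an Intel part.

    Each argument is a (cores, threads) pair.
    """
    return ryzen[0] / intel[0], ryzen[1] / intel[1]


# (cores, threads) as described in the text; these Core i5 parts lack HyperThreading.
pairs = {
    "1600X vs. i5-7600K": ((6, 12), (4, 4)),
    "1500X vs. i5-7500": ((4, 8), (4, 4)),
}
for label, (ryzen, intel) in pairs.items():
    cores, threads = advantage(ryzen, intel)
    print(f"{label}: {cores:.1f}x cores, {threads:.1f}x threads")
```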

01.jpg

How does this fare for AMD? Will budget consumers finally find an AMD solution that comes with no caveats?

Continue reading our review of the AMD Ryzen 5 1600X and 1500X processors!!

Author:
Subject: Processors
Manufacturer: AMD

Tweaks for days

It seems like it’s been months since AMD launched Ryzen, its first new processor architecture in about a decade, when in fact we are only four weeks removed. One of the few concerns about the Ryzen processors centered on gaming performance, particularly at common resolutions like 1080p. While I was far from the only person to notice these concerns, our gaming tests clearly showed a gap between the Ryzen 7 1800X and the Intel Core i7-7700K and 6900K processors in Civilization 6, Hitman and Rise of the Tomb Raider.

hitman.png

A graph from our Ryzen launch coverage...

We had been working with AMD for a couple of weeks on the Ryzen launch and fed back our results, with questions, in the week before launch. On March 2nd, AMD’s CVP of Marketing John Taylor gave us a prepared statement that acknowledged the issue but promised changes would come in the form of game engine updates. These software updates would need to be implemented by the game developers themselves in order to take advantage of the unique and more complex core designs of the Zen architecture. We had quotes from the developers of Ashes of the Singularity as well as the Total War series to back it up.

And while statements promising change are nice, it really takes some proof to get the often skeptical tech media and tech enthusiasts to believe that change can actually happen. Today AMD is showing its first result.

Today the Nitrous Engine powering Ashes of the Singularity received an update to version 26118. The result of 400 developer hours of work, it updates threading to better balance performance across Ryzen 7’s 8 cores and 16 threads. I was able to do some early testing on the new revision, as well as on the previous retail shipping version (25624), to see what kind of improvements the patch brings with it.

Stardock / Oxide CEO Brad Wardell had this to say in a press release:

“I’ve always been vocal about taking advantage of every ounce of performance the PC has to offer. That’s why I’m a strong proponent of DirectX 12 and Vulkan® because of the way these APIs allow us to access multiple CPU cores, and that’s why the AMD Ryzen processor has so much potential,” said Stardock and Oxide CEO Brad Wardell. “As good as AMD Ryzen is right now – and it’s remarkably fast – we’ve already seen that we can tweak games like Ashes of the Singularity to take even more advantage of its impressive core count and processing power. AMD Ryzen brings resources to the table that will change what people will come to expect from a PC gaming experience.”

Our testing setup is in line with our previous CPU performance stories.

Test System Setup
CPU AMD Ryzen 7 1800X
Intel Core i7-6900K
Motherboard ASUS Crosshair VI Hero (Ryzen)
ASUS X99-Deluxe II (Broadwell-E)
Memory 16GB DDR4-2400
Storage Corsair Force GS 240 SSD
Sound Card On-board
Graphics Card NVIDIA GeForce GTX 1080 8GB
Graphics Drivers NVIDIA 378.49
Power Supply Corsair HX1000
Operating System Windows 10 Pro x64

I was using the latest BIOS for our ASUS Crosshair VI Hero motherboard (1002) and upgraded to some GeIL RGB (!!) memory capable of running at 3200 MHz on this board with a single BIOS setting adjustment. All of my tests were done at 1080p in order to return to the pain point that AMD was dealing with on launch day.

Let’s see the results.

ashes-1.png

ashes-2.png

These are substantial performance improvements from the new engine code! At both 2400 MHz and 3200 MHz memory speeds, and at both the High and Extreme presets in the game (all running in DX12, for what that’s worth), gaming performance on the Ryzen 7 1800X improves. At the High preset (the setting AMD used in the performance data for its press release), we see a 31% jump in performance when running at the higher memory speed and a 22% improvement with the lower speed memory. Even in the more GPU-bottlenecked state of the Extreme preset, the performance improvement for Ryzen with the latest Ashes patch is 17-20%!
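Two details matter when reproducing comparisons like these. First, average frame rate should be total frames over total time, not the mean of per-frame FPS values. Second, the percentage gain is relative to the old result. The minimal sketch below shows both; the frame times and FPS values are illustrative only, not our Ashes data.

```python
def average_fps(frame_times_ms: list[float]) -> float:
    """Average frame rate over a run: total frames divided by total elapsed time."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


def percent_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


# Illustrative frame times only: one fast frame and one slow frame.
frames = [10.0, 30.0]
naive = sum(1000.0 / t for t in frames) / len(frames)  # mean of per-frame FPS
print(f"true average: {average_fps(frames):.1f} FPS, naive mean: {naive:.1f} FPS")
print(f"gain from 50 to 65.5 FPS: {percent_gain(50.0, 65.5):.1f}%")
```

The naive mean overweights fast frames, so it overstates the frame rate whenever frame times vary.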

DSC02636.jpg

It’s also important to note that Intel performance is unaffected, for better or worse. Whatever work Oxide did to improve the engine for AMD’s Ryzen processors had NO impact on the Core processors, which is interesting to say the least. The cynic in me would expect truly architecture-agnostic code changes to raise Intel’s multi-core performance at least a little bit.

So what exactly is happening to the engine with v26118? I haven’t had a chance to have an in-depth conversation with anyone at AMD or Oxide yet on the subject, but at a high level, I was told that this is what happens when instructions and sequences are analyzed for an architecture specifically. “For basically 5 years”, I was told, Oxide and other developers have dedicated their time to “instruction traces and analysis to maximize Intel performance” which helps to eliminate poor instruction setup. After spending some time with Ryzen and the necessary debug tools (and some AMD engineers), they were able to improve performance on Ryzen without adversely affecting Intel parts.

ping-amd.png

Core to core latency testing on Ryzen 7 1800X

I am hoping to get more specific detail in the coming days, but it seems very likely that Oxide was able to properly handle the more complex core-to-core communication on Ryzen and its CCX implementation. We demonstrated earlier this month how thread-to-thread communication across core complexes causes substantial latency penalties, and that a developer who intelligently keeps threads with dependencies on the same core complex can improve overall performance. I would expect this is at least part of the solution Oxide was able to integrate (and it would also explain why Intel parts are unaffected).
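Below is a hypothetical illustration of CCX-aware thread placement; it is not Oxide's code, which we have not seen. The sketch assumes a Ryzen 7 with two 4-core CCXs and 2-way SMT. It also assumes logical CPUs are numbered so that 0-7 sit on the first CCX and 8-15 on the second, which you should verify on your own system. Groups of interdependent threads stay on one CCX so their communication never crosses the fabric, and a greedy rule spreads the groups to balance load. The thread names are made up.

```python
SMT_WIDTH = 2       # logical CPUs per physical core (assumption: 2-way SMT)
CORES_PER_CCX = 4   # assumption: Ryzen 7 with two 4-core CCXs


def ccx_of(logical_cpu: int) -> int:
    """CCX index for a logical CPU, assuming CCX-contiguous numbering."""
    return logical_cpu // (SMT_WIDTH * CORES_PER_CCX)


def place_groups(groups: list[list[str]], num_ccx: int = 2) -> dict[str, int]:
    """Map each thread name to a CCX so that every group shares one CCX.

    Greedy balancing: the largest remaining group goes to the CCX
    that currently holds the fewest threads.
    """
    load = [0] * num_ccx
    placement: dict[str, int] = {}
    for group in sorted(groups, key=len, reverse=True):
        target = min(range(num_ccx), key=lambda c: load[c])
        for name in group:
            placement[name] = target
        load[target] += len(group)
    return placement


# Hypothetical engine threads, grouped by which threads exchange data.
groups = [["render", "submit"], ["audio"], ["ai", "pathing", "physics"]]
print(place_groups(groups))
```

Applying a CCX index as a real affinity mask is OS-specific, for example through SetThreadAffinityMask on Windows or os.sched_setaffinity on Linux.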

What is important now is that AMD takes this momentum with Ashes of the Singularity and actually does something with it. Many of you will recognize Ashes as the flagship title for Mantle, when AMD set out to change the programming habits and models of developers, and though Mantle would eventually become Vulkan and drive DX12 development, it did not bring about the overall shift AMD had hoped for. Can AMD and its developer relations team continue to make the case that spending time and money (which is what 400 developer hours equates to) on specific performance enhancements for Ryzen processors is in the best interest of everyone? We’ll soon find out.

Author:
Subject: Processors, Mobile
Manufacturer: Qualcomm

A new start

Qualcomm is finally ready to show the world how the Snapdragon 835 Mobile Platform performs. After months of teases and previews, including the reveal that it was the first processor built on Samsung’s 10nm process technology and a fairly in-depth look at the architectural changes to the CPU and GPU portions of the SoC, the company let a handful of media get some hands-on time with the development reference platform and run some numbers.

To frame the discussion as best I can, I am going to include some sections from my technology overview. This should give some idea of what to expect from Snapdragon 835 and which areas Qualcomm sees as providing the widest variation from the previous SD 820/821 products.

Qualcomm frames the story around the Snapdragon 835 processor with what they call the “five pillars” – five different aspects of mobile processor design that they have addressed with updates and technologies. Qualcomm lists them as battery life (efficiency), immersion (performance), capture, connectivity, and security.

slides1-6.jpg

Starting where Qualcomm starts, with battery life and efficiency, the SD 835 has a unique focus that might surprise many. Rather than talking up the performance improvements of the new processor cores or the power of the new Adreno GPU, Qualcomm is firmly focused on viewing Snapdragon through the lens of battery life. Snapdragon 835 uses half the power of Snapdragon 801.

slides2-11.jpg

Since we already knew that the Snapdragon 835 was going to be built on the 10nm process from Samsung, the first such high performance part to do so, I was surprised to learn that Qualcomm doesn’t attribute much of the power efficiency improvements to the move from 14nm to 10nm. It makes sense – most in the industry see this transition as modest in comparison to what we’ll see at 7nm. Unlike the move from 28nm to 14/16nm for discrete GPUs, where the process technology was a huge reason for the dramatic power drop we saw, the Snapdragon 835 changes come from a combination of advancements in the power management system and offloading of work from the primary CPU cores to other processors like the GPU and DSP. The more a workload takes advantage of heterogeneous computing systems, the more it benefits from Qualcomm technology as opposed to process technology.

slides2-22.jpg

Continue reading our preview of Qualcomm Snapdragon 835 performance!

Author:
Subject: Processors
Manufacturer: AMD

Here Comes the Midrange!

Today AMD is announcing the upcoming Ryzen 5 CPUs.  A little bit was known about them from several weeks ago when AMD talked about their upcoming 6 core processors, but official specifications were lacking.  Today we get to see what Ryzen 5 is mostly about.

ryzen5_01.png

There are four initial SKUs that AMD is talking about this evening.  These encompass quad core and six core products.  There are two “enthusiast” level SKUs with the X designation while the other two are aimed at a less edgy crowd.

The two six core CPUs are the 1600 and 1600X.  The X version features the higher extended frequency range when combined with performance cooling.  That unit is clocked at a base 3.6 GHz and achieves a boost of 4 GHz.  This compares well to the top end R7 1800X, but it is short 2 cores and four threads.  The price of the R5 1600X is a very reasonable $249.  The 1600 does not feature the extended range, but it does come in at a 3.2 GHz base and 3.6 GHz boost.  The R5 1600 has an MSRP of $219.

ryzen5_04.png

When we get to the four core, eight thread units we see much the same stratification.  The top end 1500X comes in at $189 and features a base clock of 3.5 GHz and a boost of 3.7 GHz.  What is interesting about this model is that the XFR is raised by 100 MHz vs. other XFR CPUs.  So instead of an extra 100 MHz boost when high end cooling is present we can expect to see 200 MHz.  In theory this could run at 3.9 GHz in the extended state.  The lowest priced R5 is the 1400 which comes in at a very modest $169.  This features a 3.2 GHz base clock and a 3.4 GHz boost.
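The XFR arithmetic is simple enough to check. The small sketch below assumes, as the text describes, that XFR adds its headroom on top of the rated boost clock when cooling allows.

```python
def xfr_ceiling_ghz(boost_ghz: float, xfr_mhz: int) -> float:
    """Highest clock an XFR-capable part may reach: rated boost plus XFR headroom."""
    return round(boost_ghz + xfr_mhz / 1000.0, 2)


print(xfr_ceiling_ghz(3.7, 200))  # 3.9: Ryzen 5 1500X with its doubled 200 MHz XFR
print(xfr_ceiling_ghz(4.0, 100))  # 4.1: Ryzen 5 1600X with the usual 100 MHz XFR
```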

The 1400, 1500X, and 1600 CPUs come with Wraith cooling solutions.  The 1600X comes bare as it is assumed that users want to use something a bit more robust.  The R5 1400 comes with the lower end Wraith Stealth cooler while the R5 1500X and R5 1600 come with the bigger Wraith Spire.  The bottom 3 SKUs are all rated at 65 watts TDP.  The 1600X comes in at the higher 95 watt rating.  Each of the CPUs is unlocked for overclocking.

ryzen5_03.png

These chips will provide a more fleshed out pricing structure for the Ryzen processors and provide users and enthusiasts with lower cost options for those wanting to invest in AMD again.  These chips all run on the new AM4 platform, which is pretty strong in terms of features and I/O performance.

ryzen5_02.png

AMD is not shipping these parts today, but rather announcing them.  Review samples are not in hand yet and AMD expects world-wide availability by April 11.  This is likely a very necessary step for AMD as current AM4 motherboard availability is not at the level we were expecting to see.  We also are seeing some pretty quick firmware updates from motherboard partners to address issues with these first AM4 boards.  By April 11 I would expect to see most of the issues solved and a healthy supply of motherboards on the shelves to handle the influx of consumers waiting to buy these more midrange priced CPUs from AMD.

What AMD did not cover or answer was how the four core products would be configured.  Would each be a single CCX with only 8 MB of L3 cache, or would AMD disable two cores in each CCX and present 16 MB of L3?  We currently do not have the answer to this.  Considering the latency of communicating between different CCX units, we can surely hope they keep only one CCX active.

ryzen5_05.png

Ryzen has certainly been a success for AMD and I have no doubt that their quarter will be pretty healthy with the estimated sales of around 1 million Ryzen CPUs since launch.  Announcing these new chips will give the mainstream and budget enthusiasts something to look forward to and plan their purchases around.  AMD is not announcing the Ryzen 3 products at this time.

Update: AMD got back to me this morning about a question I asked them about the makeup of cores, CCX units, and L3 cache.  Here is their response.

1600X: 3+3 with 16MB L3 cache. 1600: 3+3 with 16MB L3 cache. 1500X: 2+2 with 16MB L3 cache. 1400: 2+2 with 8MB L3 cache. As with Ryzen 7, each core still has 512KB local L2 cache.
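Writing AMD's answer down as data makes two consequences easy to see. First, every Ryzen 5 part keeps both CCXs active, so cross-CCX latency still applies. Second, the 1500X gets twice the L3 per core of the 1400. The class and field names in this small sketch are ours, not AMD's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ryzen5Layout:
    """Core/cache layout per AMD's response above."""
    name: str
    cores_per_ccx: tuple[int, int]  # active cores on CCX 0 and CCX 1
    l3_mb: int
    l2_kb_per_core: int = 512

    @property
    def cores(self) -> int:
        return sum(self.cores_per_ccx)

    @property
    def threads(self) -> int:
        return 2 * self.cores  # every Ryzen 5 part listed here has SMT enabled

    @property
    def l3_mb_per_core(self) -> float:
        return self.l3_mb / self.cores


LAYOUTS = [
    Ryzen5Layout("1600X", (3, 3), 16),
    Ryzen5Layout("1600", (3, 3), 16),
    Ryzen5Layout("1500X", (2, 2), 16),
    Ryzen5Layout("1400", (2, 2), 8),
]

for part in LAYOUTS:
    spans_both = all(n > 0 for n in part.cores_per_ccx)
    print(f"{part.name}: {part.cores}C/{part.threads}T, "
          f"{part.l3_mb_per_core:.2f} MB L3 per core, both CCXs active: {spans_both}")
```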

Manufacturer: RockIt Cool

Introduction

Introduction

With the introduction of the Intel Kaby Lake processors and Intel Z270 chipset, unprecedented overclocking became the norm. The new processors easily hit a core speed of 5.0GHz with little more than CPU core voltage tweaking. This overclocking performance came with a price: Kaby Lake processors run significantly hotter than previous generation Intel CPUs, a seeming reversal of the temperature trend. In our testing at stock settings, individual CPU cores hit up to 65C, and that's with a high performance water loop cooling the processor. Per reports from various enthusiast sites, Intel used inferior TIM (thermal interface material) between the CPU die and the underside of the heat spreader, leading to higher temperatures than previous CPU generations (in particular Skylake). The temperature increase did not affect overclocking much, since the CPU will still hit 5.0GHz easily, but it does affect what it takes to reach those performance levels.

As with the previous generation Haswell CPUs, a few of the more adventurous enthusiasts turned to a known method to address the heat concerns of the Kaby Lake processor: delidding. Unlike in the early days of Haswell, the delidding process is much more streamlined thanks to delidding kits from several vendors. The process still involves physically removing the heat spreader from the CPU and exposing the CPU die. However, instead of cooling the die directly, the "safer" approach is to clean the die and the underside of the heat spreader, apply new TIM (thermal interface material), and re-affix the heat spreader to the CPU. This route is considered safer than direct-die cooling because no additional or exotic support mechanisms are needed to keep the CPU cooler from crushing your precious die. Calling it safe is still a bit of an overstatement: you are physically separating the heat spreader from the CPU and voiding your CPU warranty at the same time. Then again, if that were a concern, you probably wouldn't be reading this article in the first place.

Continue reading our Kaby Lake Relidding article!

Subject: Processors
Manufacturer: AMD

** UPDATE 3/13 5 PM **

AMD has posted a follow-up statement that officially clears up much of the conjecture this article was attempting to clarify. Relevant points from their post that relate to this article as well as many of the requests for additional testing we have seen since its posting (emphasis mine):

  • "We have investigated reports alleging incorrect thread scheduling on the AMD Ryzen™ processor. Based on our findings, AMD believes that the Windows® 10 thread scheduler is operating properly for “Zen,” and we do not presently believe there is an issue with the scheduler adversely utilizing the logical and physical configurations of the architecture."

  • "Finally, we have reviewed the limited available evidence concerning performance deltas between Windows® 7 and Windows® 10 on the AMD Ryzen™ CPU. We do not believe there is an issue with scheduling differences between the two versions of Windows.  Any differences in performance can be more likely attributed to software architecture differences between these OSes."

So there you have it, straight from the horse's mouth. AMD does not believe the problem lies within the Windows thread scheduler. SMT performance in gaming workloads was also addressed:

  • "Finally, we have investigated reports of instances where SMT is producing reduced performance in a handful of games. Based on our characterization of game workloads, it is our expectation that gaming applications should generally see a neutral/positive benefit from SMT. We see this neutral/positive behavior in a wide range of titles, including: Arma® 3, Battlefield™ 1, Mafia™ III, Watch Dogs™ 2, Sid Meier’s Civilization® VI, For Honor™, Hitman™, Mirror’s Edge™ Catalyst and The Division™. Independent 3rd-party analyses have corroborated these findings.

    For the remaining outliers, AMD again sees multiple opportunities within the codebases of specific applications to improve how this software addresses the “Zen” architecture. We have already identified some simple changes that can improve a game’s understanding of the "Zen" core/cache topology, and we intend to provide a status update to the community when they are ready."

We are still digging into the observed differences of toggling SMT compared with disabling the second CCX, but it is good to see AMD issue a clarifying statement here for all of those out there observing and reporting on SMT-related performance deltas.

** END UPDATE **

Editor's Note: The testing you see here was a response to many days of comments and questions to our team on how and why AMD Ryzen processors are seeing performance gaps in 1080p gaming (and other scenarios) in comparison to Intel Core processors. Several outlets have posted that the culprit is the Windows 10 scheduler and its inability to properly allocate work across the logical vs. physical cores of the Zen architecture. As it turns out, we can prove that isn't the case at all. -Ryan Shrout

Initial reviews of AMD’s Ryzen CPU revealed a few inefficiencies in some situations, particularly in gaming workloads running at the more common resolutions like 1080p, where the CPU becomes more of a bottleneck when coupled with modern GPUs. Lots of folks have theorized about what could be causing these issues, and most recent attention appears to have been directed at the Windows 10 scheduler and its supposed inability to properly place threads on the Ryzen cores for the most efficient processing.

I typically have Task Manager open while running storage tests (they are boring to watch otherwise), and I naturally had it open during Ryzen platform storage testing. I’m accustomed to how the IO workers are distributed across reported threads, and in the case of SMT capable CPUs, distributed across cores. There is a clear difference when viewing our custom storage workloads with SMT on vs. off, and it was dead obvious to me that core loading was working as expected while I was testing Ryzen. I went back and pulled the actual thread/core loading data from my testing results to confirm:

SMT on usage.png

The Windows scheduler has a habit of bouncing processes across available processor threads. This naturally happens as other processes share time with a particular core, with the heavier process not necessarily switching back to the same core. As you can see above, the single IO handler thread was spread across the first four cores during its run, but the Windows scheduler was always hitting just one of the two available SMT threads on any single core at one time.

My testing for Ryan’s Ryzen review consisted of only single threaded workloads, but we can make things a bit clearer by loading down half of the CPU while toggling SMT off. We do this by setting the worker count to 4, half of the 8 threads available on the Ryzen processor with SMT disabled in the motherboard BIOS.

smtoff4workers.png

SMT OFF, 8 cores, 4 workers

With SMT off, the scheduler is clearly not giving priority to any particular core and the work is spread throughout the physical cores in a fairly even fashion.

Now let’s try with SMT turned back on and doubling the number of IO workers to 8 to keep the CPU half loaded:

smton8workers.png

SMT ON, 16 (logical) cores, 8 workers

With SMT on, we see a very different result. The scheduler is clearly loading only one thread per core. This could only be possible if Windows was aware of the 2-way SMT (two threads per core) configuration of the Ryzen processor. Do note that sometimes the workload will toggle around every few seconds, but the total loading on each physical core will still remain at ~50%. I chose a workload that saturated its thread just enough for Windows to not shift it around as it ran, making the above result even clearer.

Synthetic Testing Procedure

While the storage testing methods above provide a real-world example of the Windows 10 scheduler working as expected, we do have another workload that can help demonstrate core balancing with Intel Core and AMD Ryzen processors. A quick and simple custom-built C++ application can be used to generate generic worker threads and monitor for core collisions and resolutions.

This test app has a very straightforward workflow. Every few seconds it generates a new thread, capping at N/2 threads total, where N is equal to the reported number of logical cores. If the OS scheduler is working as expected, it should load 8 threads across 8 physical cores, though which logical core of each physical core gets used will depend on very minute conditions in the OS background.

By monitoring the APIC_ID through the CPUID instruction, the first application thread monitors all threads and detects and reports on collisions - when a thread from our app is running on the same core as another thread from our app. That thread also reports when those collisions have been cleared. In an ideal and expected environment where Windows 10 knows the boundaries of physical and logical cores, you should never see more than one thread of a core loaded at the same time.
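The test app itself is C++ and reads APIC IDs via CPUID. What follows is only a hypothetical Python model of its bookkeeping. Instead of executing CPUID, it takes snapshots mapping each app thread to the APIC ID of the logical core it is on. It assumes 2-way SMT, with sibling logical cores differing only in the lowest APIC ID bit; verify that per platform. All function names are ours.

```python
from collections import Counter
from typing import Iterable, Iterator


def worker_cap(logical_cores: int) -> int:
    """The app stops spawning threads at N/2, N = reported logical cores."""
    return logical_cores // 2


def physical_core(apic_id: int, smt_width: int = 2) -> int:
    """Assumption: SMT siblings share every APIC ID bit except the lowest one(s)."""
    return apic_id // smt_width


def collisions(snapshot: dict[str, int]) -> set[int]:
    """Physical cores currently running more than one of our threads."""
    per_core = Counter(physical_core(apic) for apic in snapshot.values())
    return {core for core, count in per_core.items() if count > 1}


def monitor(snapshots: Iterable[dict[str, int]]) -> Iterator[tuple[str, int]]:
    """Report a collision when it appears and again when it clears."""
    active: set[int] = set()
    for snap in snapshots:
        current = collisions(snap)
        for core in sorted(current - active):
            yield ("collision", core)
        for core in sorted(active - current):
            yield ("cleared", core)
        active = current


# Hypothetical snapshots: thread -> APIC ID of the logical core it is on.
snapshots = [
    {"t0": 0, "t1": 2},  # physical cores 0 and 1: fine
    {"t0": 1, "t1": 2},  # t0 hopped to its SMT sibling: still fine
    {"t0": 1, "t1": 0},  # both on physical core 0: collision
    {"t0": 1, "t1": 3},  # t1 moved to core 1: collision cleared
]
print(worker_cap(16), list(monitor(snapshots)))
```

If the scheduler keeps each thread on its own physical core, `monitor` never yields anything. A hop between SMT siblings, like t0's move above, is not a collision.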

app01.png

This screenshot shows our app working on the left and the Windows Task Manager on the right with logical cores labeled. While it may look like all logical cores are being utilized at the same time, in fact they are not. At any given point, only LCore 0 or LCore 1 is actively processing a thread. Need proof? Check out the modified view of the Task Manager where I copy the graph of LCore 1/5/9/13 over the graph of LCore 0/4/8/12 with inverted colors to aid viewability.

app02-2.png

If you look closely, by overlapping the graphs in this way, you can see that the threads migrate from LCore 0 to LCore 1, LCore 4 to LCore 5, and so on. The graphs intersect and fill in to consume ~100% of the physical core. This pattern is repeated for the other 8 logical cores on the right two columns as well. 
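Overlaying the graphs effectively sums each pair of sibling logical cores. The minimal sketch below does the same in numbers. It assumes LCore N and LCore N+1 (N even) are SMT siblings, as the pairs in the screenshots suggest. The utilization values are made up to mimic a thread migrating within a core.

```python
def physical_utilization(logical_pct: list[float]) -> list[float]:
    """Combine sibling logical cores (0+1, 2+3, ...) into per-physical-core utilization.

    Values are percentages of one logical core, so 100 means the physical
    core is running one full thread split across its two logical cores.
    """
    if len(logical_pct) % 2:
        raise ValueError("expected an even number of logical cores")
    return [logical_pct[i] + logical_pct[i + 1] for i in range(0, len(logical_pct), 2)]


# Made-up sample: a thread split 60/40 between LCore 0 and LCore 1 during one
# sampling interval, and another thread that stayed on LCore 2.
print(physical_utilization([60.0, 40.0, 100.0, 0.0]))  # [100.0, 100.0]
```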

Running the same application on a Core i7-5960X Haswell-E 8-core processor shows a very similar behavior.

app03.png

Each pair of logical cores shares a single thread, and when thread transitions occur away from LCore N, they migrate perfectly to LCore N+1. It does appear that in this scenario the Intel system is showing a more stable thread distribution than the Ryzen system. While that may in fact confer some performance advantage on the 5960X configuration, the penalty for intra-core thread migration is expected to be very minute.

The fact that Windows 10 is balancing the 8 thread load specifically between matching logical core pairs indicates that the operating system is perfectly aware of the processor topology and is selecting distinct cores first to complete the work.

Information from this custom application, along with the storage performance tool example above, clearly show that Windows 10 is attempting to balance work on Ryzen between cores in the same manner that we have experienced with Intel and its HyperThreaded processors for many years.

Continue reading our look at AMD Ryzen and Windows 10 scheduling!

Author:
Subject: Processors
Manufacturer: AMD

The right angle

While many in the media and enthusiast communities are still trying to fully grasp the importance and impact of the recent AMD Ryzen 7 processor release, I have been trying to complete my review of the 1700X and 1700 processors, in between testing the upcoming GeForce GTX 1080 Ti and preparing for more hardware to show up at the offices very soon. There is still much to learn and understand about the first new architecture from AMD in nearly a decade, including analysis of the memory hierarchy, power consumption, overclocking, gaming performance, etc.

During my Ryzen 7 1700 testing, I went through some overclocking evaluation and thought the results might be worth sharing sooner rather than later. This quick article is just a preview of what we are working on, so don’t expect to find the answers to Ryzen power management here, only a recounting of how I was able to get stellar performance from the lowest priced Ryzen part on the market today.

The system specifications for this overclocking test were identical to our original Ryzen 7 processor review.

Test System Setup
CPU AMD Ryzen 7 1800X
AMD Ryzen 7 1700X
AMD Ryzen 7 1700
Intel Core i7-7700K
Intel Core i5-7600K
Intel Core i7-6700K
Intel Core i7-6950X
Intel Core i7-6900K
Intel Core i7-6800K
Motherboard ASUS Crosshair VI Hero (Ryzen)
ASUS Prime Z270-A (Kaby Lake, Skylake)
ASUS X99-Deluxe II (Broadwell-E)
Memory 16GB DDR4-2400
Storage Corsair Force GS 240 SSD
Sound Card On-board
Graphics Card NVIDIA GeForce GTX 1080 8GB
Graphics Drivers NVIDIA 378.49
Power Supply Corsair HX1000
Operating System Windows 10 Pro x64

Of note is that I am still using the Noctua NH-U12S cooler that AMD provided for our initial testing; all of the overclocking and temperature reporting in this story is air cooled.

DSC02643.jpg

First, let’s start with the motherboard. All of this testing was done on the ASUS Crosshair VI Hero with the latest 5704 BIOS installed. As I began to discover the different overclocking capabilities (BCLK adjustment, multipliers, voltage), I came across one of the ASUS presets. These presets offer pre-defined collections of settings that ASUS feels will offer simple overclocking capabilities. An option for higher BCLK existed, but the one that caught my eye was straightforward: 4.0 GHz.

asusbios.jpg

With the Ryzen 7 1700 installed, I thought I would give it a shot. Keep in mind that this processor has a base clock of 3.0 GHz, a rated maximum boost clock of 3.7 GHz, and is the only 65-watt TDP variant of the three Ryzen 7 processors released last week. Because of that, I didn’t expect its overclocking capability to match what the 1700X and 1800X could offer. In my previous processor experience, when a chip is binned at a lower power draw than the rest of a family, it often has properties that make it disadvantageous for running at HIGHER power. Based on my results here, that doesn’t seem to be the case.

4.0.PNG

By simply enabling that option in the ASUS UEFI and rebooting, our Ryzen 1700 processor was running at 4.0 GHz on all cores! For this piece, I won’t be going into the drudge and debate on what settings ASUS changed to get to this setting or if the voltages are overly aggressive – the point is that it just works out of the box.
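Core clock is the product of the base clock (BCLK, 100 MHz by default) and the CPU multiplier. We did not dig into which combination the ASUS preset uses, so the 100 MHz x 40 pairing below is just one way to reach 4.0 GHz. The sketch also puts the result in context against the 1700's rated clocks from the paragraph above.

```python
def core_clock_ghz(bclk_mhz: float, multiplier: float) -> float:
    """Effective core clock = base clock x multiplier."""
    return bclk_mhz * multiplier / 1000.0


def gain_pct(new_ghz: float, reference_ghz: float) -> float:
    """Percentage by which new_ghz exceeds reference_ghz."""
    return (new_ghz / reference_ghz - 1.0) * 100.0


oc = core_clock_ghz(100.0, 40)  # one possible way to reach the preset's 4.0 GHz
print(f"{oc:.1f} GHz all-core")
print(f"+{gain_pct(oc, 3.0):.1f}% over the 3.0 GHz base clock")
print(f"+{gain_pct(oc, 3.7):.1f}% over the 3.7 GHz rated maximum boost")
```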

Continue reading our look at overclocking the new Ryzen 7 1700 processor!

Author:
Subject: Processors
Manufacturer: AMD

AMD Ryzen 7 Processor Specifications

It’s finally here, and it’s finally time to talk about it. The AMD Ryzen processor is being released onto the world, and based on the buildup of excitement over the last week or so since pre-orders began, details on just how Ryzen performs relative to Intel’s mainstream and enthusiast processors are a hot commodity. While leaks have been surfacing for months and details seem to be streaming out from those not bound to the same restrictions we have been, I think you are going to find our analysis of the Ryzen 7 1800X processor to be quite interesting and maybe a little different as well.

Honestly, there isn’t much that has been left to the imagination about Ryzen, its chipsets, pricing, etc. with the slow trickle of information that AMD has been sending out since before CES in January. We know about the specifications, we know about the architecture, we know about the positioning; and while I will definitely recap most of that information here, the real focus is going to be on raw numbers. Benchmarks are what we are targeting with today’s story.

Let’s dive right in.

The Zen Architecture – Foundation for Ryzen

Actually, as it turns out, in typical Josh Walrath fashion, he wrote too much about the AMD Zen architecture to fit into this page. So, instead, you'll find his complete analysis of AMD's new baby right here: AMD Zen Architecture Overview: Focus on Ryzen

AMD Ryzen 7 Processor Specifications

Though we already detailed the most important specifications for the new AMD Ryzen processors when the pre-orders went live, it’s worth touching on them again and re-emphasizing the important ones.

Specification | Ryzen 7 1800X | Ryzen 7 1700X | Ryzen 7 1700 | Core i7-6900K | Core i7-6800K | Core i7-7700K | Core i5-7600K | Core i7-6700K
Architecture | Zen | Zen | Zen | Broadwell-E | Broadwell-E | Kaby Lake | Kaby Lake | Skylake
Process Tech | 14nm | 14nm | 14nm | 14nm | 14nm | 14nm+ | 14nm+ | 14nm
Cores/Threads | 8/16 | 8/16 | 8/16 | 8/16 | 6/12 | 4/8 | 4/4 | 4/8
Base Clock | 3.6 GHz | 3.4 GHz | 3.0 GHz | 3.2 GHz | 3.4 GHz | 4.2 GHz | 3.8 GHz | 4.0 GHz
Turbo/Boost Clock | 4.0 GHz | 3.8 GHz | 3.7 GHz | 3.7 GHz | 3.6 GHz | 4.5 GHz | 4.2 GHz | 4.2 GHz
Cache | 20MB | 20MB | 20MB | 20MB | 15MB | 8MB | 6MB | 8MB
Memory Support | DDR4-2400 Dual Channel | DDR4-2400 Dual Channel | DDR4-2400 Dual Channel | DDR4-2400 Quad Channel | DDR4-2400 Quad Channel | DDR4-2400 Dual Channel | DDR4-2400 Dual Channel | DDR4-2133 Dual Channel
TDP | 95 watts | 95 watts | 65 watts | 140 watts | 140 watts | 91 watts | 91 watts | 91 watts
Price | $499 | $399 | $329 | $1050 | $450 | $350 | $239 | $309

All three of the currently announced Ryzen processors are 8-core, 16-thread designs, matching the Core i7-6900K from Intel in that regard. Though Intel does have a 10-core part branded for consumers, it comes in at a significantly higher price point (still over $1500). The clock speeds of Ryzen are competitive with the Broadwell-E platform options, though they are clearly behind the curve when it comes to the clock capabilities of Kaby Lake and Skylake. With admittedly lower IPC than Kaby Lake, Zen will struggle in any purely single-threaded workload, facing as much as a 500 MHz clock deficit (4.0 GHz boost on the 1800X versus 4.5 GHz on the Core i7-7700K).
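
One rough way to reason about the single-threaded gap is to treat performance as IPC multiplied by clock speed. The sketch below takes the boost clocks from the specification table; the 0.95 IPC ratio is a hypothetical placeholder, not a measured figure, and real workloads rarely scale this cleanly.

```python
# Single-threaded performance modeled as IPC x clock: a deliberate simplification.
BOOST_GHZ = {"Ryzen 7 1800X": 4.0, "Core i7-7700K": 4.5}  # from the specification table


def relative_perf(ipc_ratio: float, clock_a_ghz: float, clock_b_ghz: float) -> float:
    """Single-threaded performance of chip A relative to chip B, assuming perf = IPC * clock."""
    return ipc_ratio * (clock_a_ghz / clock_b_ghz)


ryzen = BOOST_GHZ["Ryzen 7 1800X"]
kaby = BOOST_GHZ["Core i7-7700K"]
deficit_mhz = round((kaby - ryzen) * 1000)
print(f"Boost clock deficit: {deficit_mhz} MHz")  # Boost clock deficit: 500 MHz

# At identical IPC, the clock gap alone leaves the 1800X at about 89% of the 7700K.
print(f"Equal IPC: {relative_perf(1.0, ryzen, kaby):.3f}")
# Hypothetical: a 5% IPC shortfall compounds with the clock gap.
print(f"IPC ratio 0.95: {relative_perf(0.95, ryzen, kaby):.3f}")
```

Because the two factors multiply, any IPC shortfall widens the gap that the clock deficit already creates.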

One interesting deviation from Intel's designs is Ryzen's more granular boost capability. AMD Ryzen CPUs will be able to move between processor states in 25 MHz increments, while Intel is currently limited to 100 MHz. If implemented correctly and effectively through SenseMI, this allows Ryzen to gain 25-75 MHz of additional clock speed in a scenario where it is too thermally constrained to hit the next 100 MHz step.
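
To make the step-size argument concrete, here is a small sketch that rounds a hypothetical thermally limited ceiling down to the nearest reachable clock under each granularity. It assumes clocks sit on exact multiples of the step size and that the limit can be expressed as a single frequency; neither is a documented detail of SenseMI or of Intel's implementation, and the limits are invented for illustration.

```python
def highest_step(limit_mhz: int, step_mhz: int) -> int:
    """Highest clock at or below limit_mhz, assuming clocks sit on multiples of step_mhz."""
    return (limit_mhz // step_mhz) * step_mhz


# Hypothetical ceilings where heat or power stops the chip short of the next 100 MHz step.
for limit in (3620, 3650, 3699):
    coarse = highest_step(limit, 100)  # Intel-style 100 MHz granularity
    fine = highest_step(limit, 25)     # Ryzen-style 25 MHz granularity
    print(f"limit {limit} MHz: 100 MHz steps -> {coarse}, "
          f"25 MHz steps -> {fine}, gain {fine - coarse} MHz")
```

Depending on where the ceiling falls, the finer granularity recovers anywhere from 0 to 75 MHz, which is where the 25-75 MHz range above comes from.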

XFR (Extended Frequency Range), supported on the Ryzen 7 1800X and 1700X (hence the "X"), "lifts the maximum Precision Boost frequency beyond ordinary limits in the presence of premium systems and processor cooling." The story goes that if you have better-than-average cooling, the 1800X will be able to scale up to 4.1 GHz in some instances for some undetermined amount of time. The better the cooling, the longer it can operate in XFR. While this was originally pitched to us as a game-changing feature that would bring extreme advantages to water cooling enthusiasts, it seems it was scaled back for the initial release. A 100 MHz increase in the best case seems more like technology for technology's sake than a new capability for consumers.

Ryzen integrates a dual-channel DDR4 memory controller with speeds up to DDR4-2400, matching what Intel can do on Kaby Lake. Broadwell-E has the advantage of a quad-channel controller, but how useful that ends up being will be interesting to see as we step through our performance testing.
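
For a sense of scale, theoretical peak bandwidth is simply transfer rate × 8 bytes per 64-bit channel × channel count. The sketch below assumes standard 64-bit DDR4 channels and treats DDR4-2400 as 2400 MT/s; real-world throughput will land below these peaks.

```python
def peak_bandwidth_gbs(transfer_rate_mts: int, channels: int, bus_width_bits: int = 64) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (10^9 bytes per second)."""
    bytes_per_transfer = bus_width_bits // 8  # 8 bytes per 64-bit channel
    return transfer_rate_mts * 1e6 * bytes_per_transfer * channels / 1e9


print(peak_bandwidth_gbs(2400, 2))  # Ryzen / Kaby Lake, dual channel: 38.4
print(peak_bandwidth_gbs(2400, 4))  # Broadwell-E, quad channel: 76.8
```

On paper, the quad-channel platform doubles peak bandwidth; whether workloads can use it is what the benchmarks have to show.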

One area of interest is the TDP ratings. AMD and Intel have very different views on how this is calculated. Intel has made this the maximum power draw of the processor, while AMD sees it as a target for thermal dissipation over time. This means that under stock settings the Core i7-7700K will not draw more than 91 watts and the Core i7-6900K will not draw more than 140 watts. And in our testing, they are well under those ratings most of the time, whenever AVX code is not running. AMD’s 95-watt rating on the Ryzen 7 1800X, though, will very often be exceeded, and our power testing proves that out. The logic is that the thermal mass of the cooler and the time it takes heat to propagate give a 95-watt-rated cooling system time to catch up. (Interestingly, this is the philosophy Intel has taken with its Kaby Lake mobile processors.)
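
The "target over time" idea can be illustrated with a toy model in which an exponential moving average stands in for the thermal inertia of the cooler. This is only a hypothetical sketch: AMD has not described its algorithm here, and the smoothing factor `alpha` and the power trace are made-up values.

```python
def ema(samples: list[float], alpha: float) -> list[float]:
    """Exponential moving average: a crude stand-in for a heatsink's thermal inertia."""
    avg = samples[0]
    out = []
    for s in samples:
        avg = alpha * s + (1 - alpha) * avg
        out.append(avg)
    return out


TDP_W = 95
# Hypothetical one-second power samples: moderate load, a 5 s burst, then back down.
power = [60] * 10 + [120] * 5 + [60] * 10
smoothed = ema(power, alpha=0.1)

print(max(power))               # 120: instantaneous draw exceeds the TDP figure
print(round(max(smoothed), 1))  # 84.6: the smoothed "thermal" load stays under 95
```

In this trace the chip briefly draws 120 W, yet the smoothed value never reaches the 95 W rating, which is the sense in which the cooler can catch up with short excursions.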

Obviously the most important line here for many of you is the price. The Core i7-6900K is the lowest-priced 8C/16T option from Intel for consumers, at $1050. The Ryzen 7 1800X has a sticker price of less than half of that, at $499. The R7 1700X vs. Core i7-6800K matchup is interesting as well, with the AMD CPU selling for $399 versus $450 for the 6800K. However, the 6800K has only 6 cores and 12 threads, giving the Ryzen part 33% more cores and threads to throw at multi-threaded workloads. The 7700K vs. R7 1700 battle will be interesting too, with a 4-core difference in capability and a roughly $20 price advantage for AMD.
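
The per-core math behind these comparisons is easy to reproduce. The sketch below uses only the list prices and core/thread counts from the specification table; street prices will differ.

```python
# (list price in USD, cores, threads) as given in the specification table
CHIPS = {
    "Ryzen 7 1800X": (499, 8, 16),
    "Ryzen 7 1700X": (399, 8, 16),
    "Ryzen 7 1700": (329, 8, 16),
    "Core i7-6900K": (1050, 8, 16),
    "Core i7-6800K": (450, 6, 12),
    "Core i7-7700K": (350, 4, 8),
}


def price_per_core(name: str) -> float:
    price, cores, _threads = CHIPS[name]
    return price / cores


def price_per_thread(name: str) -> float:
    price, _cores, threads = CHIPS[name]
    return price / threads


# Cheapest per core first.
for name in sorted(CHIPS, key=price_per_core):
    print(f"{name:15} ${price_per_core(name):7.2f}/core  ${price_per_thread(name):6.2f}/thread")

# Core-count advantage of the R7 1700X (8 cores) over the i7-6800K (6 cores).
advantage = CHIPS["Ryzen 7 1700X"][1] / CHIPS["Core i7-6800K"][1] - 1
print(f"{advantage:.0%}")  # 33%
```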

What Makes Ryzen Tick

We have been exposed to details about the Zen architecture at the past several Hot Chips conferences, as well as through other information directly from AMD.  Zen was a clean-sheet design that borrowed some of the best features from the Bulldozer and Jaguar architectures while integrating many new ideas that had not been executed in AMD processors before.  The fusion of ideas from higher-performance cores, lower-power cores, and experience gained in APU/GPU design has come together in a very impressive package: the Ryzen CPU.

It is well known that AMD brought back Jim Keller to head the CPU group after the slow downward spiral AMD entered in CPU design.  While the Athlon 64 was a tremendous part for its time, the CPUs the company offered afterward did not retain that leadership position.  The original Phenom had problems right off the bat and could not compete well with Intel’s latest dual and quad cores.  The Phenom II shored up AMD's position a bit, but in the end it could not keep pace with the products Intel continued to introduce on its newly minted “tick-tock” cycle.  Bulldozer had issues out of the gate and did not deliver performance significantly greater than the previous-generation “Thuban” 6-core Phenom II, much less the Intel Sandy Bridge and Ivy Bridge products it would compete with.

AMD attempted to stop the bleeding by iterating on and evolving the Bulldozer architecture with Piledriver, Steamroller, and Excavator.  The final products based on this design arc seemed to do fine in the markets they were aimed at, but they certainly did not win back any of AMD’s shrinking desktop market share.  No matter what AMD did, the base architecture just could not overcome some of the basic properties that impeded strong IPC performance.

The primary goal of this new architecture is to raise IPC to a level consistent with what Intel has to offer.  AMD aimed to increase IPC by at least 40% over the previous Excavator core.  This is a pretty aggressive goal considering where AMD was with the Bulldozer architecture, which was focused on good multi-threaded performance and high clock speeds.  AMD claims that it has in fact increased IPC by an impressive 52% over the previous Excavator-based core.  Not only has AMD seemingly hit its performance goal, it has exceeded it.  AMD also plans to use the Zen architecture to power everything from mobile products to its highest-TDP parts.

The Zen Core

The basis for Ryzen is the CCX module.  Each CCX contains four Zen cores along with 8 MB of shared L3 cache.  Each core has 64 KB of L1 instruction cache, 32 KB of L1 data cache, and 512 KB of private L2 cache.  The L2 is inclusive of the L1 caches, while the L3 acts as a victim cache for the L2s and also holds copies of the L2 tags, so it knows what each core is holding.  AMD has improved the performance of its caches to a very large degree compared to previous architectures.  This arrangement allows the individual cores to quickly snoop for changes in the caches of the others in shared workloads.  So if a cache line is changed on one core, other cores requiring that data can quickly find it through the shared L3 and read it.  Doing this allows the core doing the actual work to not be interrupted by cache read requests from other cores.
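
As a sanity check, summing these per-core and per-CCX sizes reproduces the 20MB cache figure in the specification table, which appears to count L2 plus L3. The sketch assumes an 8-core Ryzen 7 die is built from two four-core CCXs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CCX:
    """Per-CCX cache sizes as described above (one CCX = four Zen cores)."""
    cores: int = 4
    l1i_kb_per_core: int = 64
    l1d_kb_per_core: int = 32
    l2_kb_per_core: int = 512
    l3_mb_shared: int = 8


def cache_totals(ccx: CCX, ccx_count: int) -> dict:
    """Sum each cache level across all CCXs on the die."""
    cores = ccx.cores * ccx_count
    return {
        "L1 (KB)": cores * (ccx.l1i_kb_per_core + ccx.l1d_kb_per_core),
        "L2 (MB)": cores * ccx.l2_kb_per_core / 1024,
        "L3 (MB)": ccx.l3_mb_shared * ccx_count,
    }


# Assumption: an 8-core Ryzen 7 die is built from two four-core CCXs.
totals = cache_totals(CCX(), ccx_count=2)
print(totals)  # {'L1 (KB)': 768, 'L2 (MB)': 4.0, 'L3 (MB)': 16}
print(totals["L2 (MB)"] + totals["L3 (MB)"])  # 20.0
```

The 768 KB of L1 is not part of the 20MB figure; 4 MB of L2 plus 16 MB of L3 accounts for all of it.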

Each core can handle two threads, but unlike a Bulldozer module it has a single integer cluster.  Bulldozer modules featured two integer units and a shared FPU/SIMD unit.  Zen gets rid of CMT for good: each core has its own integer and FPU units.  The core handles two threads using AMD’s implementation of SMT (simultaneous multi-threading).  There is a primary thread that gets higher priority, while the second thread has to wait until resources are freed up.  This works far better in the real world than my explanation suggests, as resources are constantly being shuffled about and the primary thread will not monopolize all resources within the core.

Get your brains ready

Just before the weekend, Josh and I got a chance to speak with David Kanter about the AMD Zen architecture and what it might mean for the Ryzen processor due out in less than a month. For those of you not familiar with David and his work, he is an analyst and consultant on processor architecture and design through Real World Tech, while also serving as a writer and analyst for the Microprocessor Report as part of the Linley Group. If you want to see a discussion forum that focuses on architecture at an incredibly detailed level, the Real World Tech forum will have you covered; it's an impressive place to learn.

David was kind enough to spend an hour with us to talk about a recently published report he wrote on Zen. It's definitely a discussion that dives into details most articles and stories on Zen don't broach, so be prepared to do some pausing and Googling of phrases and technologies you may not be familiar with. Still, for any technology enthusiast who wants an expert's opinion on how Zen compares to Intel Skylake and how Ryzen might fare when it's released this year, you won't want to miss it.

High Bandwidth Cache

Apart from AMD’s other new architecture due out in 2017, its Zen CPU design, no other product has had as much build-up and excitement surrounding it as its Vega GPU architecture. After the world learned that Polaris would be a mainstream-only design, released as the Radeon RX 480, enthusiasts turned their focus straight to Vega. It’s been on the public-facing roadmaps for years and signifies the company’s return to the world of high-end GPUs, something it has been missing since the release of the Fury X in mid-2015.

Let’s be clear: today does not mark the release of the Vega GPU or products based on Vega. In reality, we don’t even know enough to make highly educated guesses about performance without more details on the specific implementations. That being said, the information released by AMD today is interesting and shows that Vega will be much more than simply an increase in shader count over Polaris. It reminds me a lot of the build-up to the Fiji GPU release, when information and speculation about how HBM would affect power consumption, form factor and performance flourished. What we can hope for, and what AMD’s goal needs to be, is a cleaner and more consistent product release than the Fury X turned out to be.

The Design Goals

AMD began its discussion about Vega last month by talking about the changes in the world of GPUs and how data sets and workloads have evolved over the last decade. No longer are GPUs only concerned with games; they must also address professional, enterprise, and scientific workloads. Even more interestingly, just as we have discussed the growing gap between CPU performance and memory bandwidth, AMD posits that the gap between GPU memory capacity and GPU performance is a significant hurdle and limiter to performance and expansion. Game installs, professional graphics data sets, and compute data sets continue to skyrocket: game installs are now regularly over 50GB, and compute workloads can exceed petabytes. Even as GPU memory capacities have grown from megabytes to gigabytes, reaching as high as 12GB in high-end consumer products, AMD thinks there should be more.

Coming from a company that chose to release a high-end product limited to 4GB of memory in 2015, it’s a noteworthy statement.

The High Bandwidth Cache

Boldly renaming what has traditionally been called graphics memory, Vega 10 will feature an HBM2-based high bandwidth cache (HBC) along with a new memory hierarchy to make use of it. This HBC will be a collection of memory on the GPU package, just like we saw on Fiji with the first HBM implementation, and will be measured in gigabytes. Why the move to calling it a cache will be covered below. (But can’t we all get behind the removal of the term “frame buffer”?) Interestingly, this HBC doesn’t have to be HBM2; in fact, I was told to expect other memory systems on lower-cost products going forward, so cards that integrate this new memory topology with GDDR5X or some equivalent seem assured.
