
AMD EPYC 7000 Series Data Center Processor Launch - Gunning for Xeon

Author: Ryan Shrout
Subject: Processors
Manufacturer: AMD

Architectural Outlook

The basic Zen architecture found in the Ryzen processor, which we have discussed and debated many times on PC Perspective, remains unchanged for EPYC. According to the AMD briefing I took part in yesterday, the core was designed with server and data center deployment in mind. Outsiders have accused the EPYC processor of being simply a “glued together” Ryzen CPU, painting the platform as a desktop part repurposed for the server environment. The pedigree of the architecture really doesn’t matter, only how it performs in the relevant workloads and environments, but AMD did give us more detail on the die-to-die and socket-to-socket communications to counter that narrative.

What AMD calls Infinity Fabric is actually a collection of interfaces whose uses range from intra-die to inter-die to inter-socket communication. While AMD has not yet detailed how these interfaces are similar or different, we do now know the performance specifications for each, which helps us judge the capability they offer.

AMD EPYC, and the upcoming Threadripper consumer HEDT parts, are multi-die packages. EPYC will have four dies on each CPU package, regardless of the number of enabled cores. AMD is disabling cores in a symmetrical pattern: a 32-core part will have four dies with all 8 cores enabled on each. A 24-core processor will have 1 core of each CCX, and thus 2 cores per die, disabled. A 16-core part will have 2 cores of each CCX disabled, 4 per die. And the 8-core part will have 3 of the 4 cores per CCX disabled, leaving only 1 core per CCX and 2 per die.
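
The symmetric disabling pattern reduces to simple arithmetic. Here is a minimal sketch, assuming the layout described above (4 dies per package, 2 CCXes per die, 4 cores per CCX); the constants and function names are mine, not AMD's:

```python
# Sketch of EPYC's symmetric core disabling, assuming the layout
# described above: 4 dies per package, 2 CCXes per die, 4 cores per CCX.
DIES, CCX_PER_DIE, CORES_PER_CCX = 4, 2, 4

def cores_per_ccx(total_cores):
    """Cores left enabled in each CCX for a given SKU core count."""
    assert total_cores % (DIES * CCX_PER_DIE) == 0, "must divide evenly"
    return total_cores // (DIES * CCX_PER_DIE)

for sku in (32, 24, 16, 8):
    per_ccx = cores_per_ccx(sku)
    per_die = per_ccx * CCX_PER_DIE
    print(f"{sku}-core part: {per_ccx} core(s) per CCX, {per_die} per die")
```

Running this reproduces the breakdown above: the 24-core SKU keeps 3 cores per CCX (6 per die), the 16-core keeps 2 (4 per die), and the 8-core keeps just 1 per CCX (2 per die).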

I am still waiting for input from AMD on this, but it does raise concerns about the L3 / thread-to-thread latencies of the architecture that we have discussed in the past. In the worst-case scenario, the 8-core design, there would be only one core per CCX, requiring all inter-core communication to happen through the L3 and maximizing latency. Couple that with the unknown quantity that is die-to-die (or even socket-to-socket) latency and you have an interesting comparison point between platforms to dive into. I am working to get more information on this, as well as hardware to test and compare.

What AMD has shared so far is some impressive bandwidth numbers meant to alleviate any concern about the multi-die package implementation.

AMD has built low power, low latency links between the dies that offer as much as 42 GB/s of bi-directional bandwidth. Every die is connected to every other die, enabling single-hop data travel between any two. The links can drop into very low power states when cross traffic is minimal, keeping processor TDPs down.

When we look at the socket-to-socket bandwidth and connection diagram, each die is connected to the matching peer die on the other socket with a 38 GB/s bi-directional link. That gives EPYC at most two hops of latency between any two cores in the system, with a total aggregate socket-to-socket bandwidth of 152 GB/s.
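
The two-hop worst case follows from the topology as described: a full mesh of four dies within each socket, plus one link from each die to its peer on the other socket. The model below is my own sketch of that description, not AMD's actual routing logic:

```python
# Assumption-based model of the EPYC link topology described above:
# 4 dies per socket in a full mesh, plus one link from each die to
# its matching peer die on the other socket.
SOCKET_LINK_GBS = 38  # bi-directional, per die-to-die socket link

def hops(src, dst):
    """(socket, die) -> (socket, die): minimum link traversals."""
    (s1, d1), (s2, d2) = src, dst
    if s1 == s2:
        return 0 if d1 == d2 else 1   # same die, or one mesh hop
    return 1 if d1 == d2 else 2       # peer link, or mesh hop + peer link

# Worst case between any two dies in a 2P system:
worst = max(hops((a, i), (b, j))
            for a in (0, 1) for b in (0, 1)
            for i in range(4) for j in range(4))
print(worst)                          # never more than 2 hops
print(4 * SOCKET_LINK_GBS)            # 152 GB/s aggregate socket-to-socket
```

The 152 GB/s aggregate figure is simply the four 38 GB/s peer links added together.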

The IO and connectivity portion of the processor supports eight x16 PCIe links, getting us to the magic 128-lane number. Each die supports two of them, though one per die is used for the socket-to-socket interface in a 2P system, leaving each CPU with 64 lanes of PCIe, a total of 128 for the system. Each link supports 32 GB/s of bandwidth, for 256 GB/s per socket – a substantial amount of potential throughput. Those connections can be divided into as many as 8 PCIe devices per x16 link, totaling as many as 64 separate PCIe devices. How about that for a coin mining server?
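
As a back-of-the-envelope check, the lane and bandwidth figures above all fall out of a few multiplications. The constants here simply restate the article's description; nothing is measured:

```python
# Sanity check of the PCIe lane and bandwidth figures described above.
DIES_PER_SOCKET = 4
X16_LINKS_PER_DIE = 2
LANES_PER_LINK = 16
GBS_PER_X16_LINK = 32  # bi-directional, per AMD's slide

# Single socket: all eight x16 links are available for PCIe.
lanes_1p = DIES_PER_SOCKET * X16_LINKS_PER_DIE * LANES_PER_LINK
print(lanes_1p)  # 128 lanes on a 1P system

# Dual socket: one x16 per die is repurposed as the socket-to-socket link.
lanes_per_cpu_2p = DIES_PER_SOCKET * (X16_LINKS_PER_DIE - 1) * LANES_PER_LINK
print(lanes_per_cpu_2p, 2 * lanes_per_cpu_2p)  # 64 per CPU, 128 system-wide

# Aggregate PCIe bandwidth per socket with all links in PCIe mode.
gbs_per_socket = DIES_PER_SOCKET * X16_LINKS_PER_DIE * GBS_PER_X16_LINK
print(gbs_per_socket)  # 256 GB/s
```

Either way you configure it, the platform exposes 128 usable PCIe lanes, which is the headline number AMD keeps returning to.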

When you put it all together, it might look a bit messy, but the bandwidth and connectivity are there to make EPYC a powerful server processor and platform. The only question that remains unanswered for me is the “low latency” part of this slide, which hasn’t been quantified…and which we haven’t tested yet.

AMD also claims additional benefits from its multi-die design. For one, the combined die area of the four chips is larger than the reticle limit of today’s lithography technology allows for a single die. Essentially, AMD claims this product could not have been built as a monolithic die in its current configuration. The multi-die method also helps AMD increase yields and thus better attack the high end of the server / workstation market with more products at the top of the stack. The flexibility to configure dies is clearly an advantage over a single, monolithic die.

A Performance Example

In the build-up to today's release, AMD brought media into a room full of EPYC demos, one of which particularly stood out to me. Using a single-socket system with access to the full 128 lanes of PCI Express from the EPYC 7601 processor, AMD was able to break IOPS world records using the FIO storage benchmark. Paired with the EPYC processor were 24 Samsung NVMe SSDs, PM1725A models in this case, each running with a full x4 of PCIe 3.0 bandwidth. At 3.2TB each, the total capacity of this software-defined array was 76.8TB!
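
For readers who want to approximate this kind of test themselves, a fio run along these lines would exercise the 4K random-read portion of the workload. This is a hypothetical invocation I've sketched from the description above; the device path, job count, and queue depth are illustrative guesses, not AMD's actual configuration:

```shell
# Hypothetical fio 4K random-read run against one NVMe drive.
# Drive path, numjobs, and iodepth are guesses, not AMD's settings;
# the demo system ran 24 drives, one job set per drive.
fio --name=epyc-4k-randread \
    --ioengine=libaio --direct=1 \
    --rw=randread --bs=4k \
    --iodepth=32 --numjobs=8 \
    --runtime=60 --time_based --group_reporting \
    --filename=/dev/nvme0n1
```

Scaling a job like this across all 24 drives (and switching `--bs` to 128k for the bandwidth test) is presumably how numbers of this magnitude were produced.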

These numbers are incredibly impressive - 9.1M IOPS read, 7.1M IOPS write (both running at full random 4K), and 53.3 GB/s of storage bandwidth when run at 128K random! Even more impressive for the EPYC platform is that the server still has 32 lanes of PCI Express remaining for networking hardware, compute resources like accelerators, or more storage controllers. This is a clear example of how the massive amount of IO connectivity AMD has brought can change the data center TCO landscape.

Closing Thoughts, for now

AMD is in a significantly different place today than it was only 4-5 months ago in CPUs. It has gone from a lingering memory in the minds of gamers and DIY builders to a prominent player in the field. It has rekindled interest among enthusiasts and OEMs like Dell for high-end gaming PCs and mainstream desktop builds. And today it prepares to make the same shift in the server and enterprise markets, launching the EPYC data center platform and returning competition to a market that has had a single major player for more than half a decade.

Based on the data I have seen, the products as they have been described to me, and the current state of the ecosystem, it’s hard to imagine AMD not making significant headway in this field. The definition of “significant” will vary depending on whom you ask. Those wishing for a return to the Opteron peak will target 20%+ market share as the necessary milestone. Intel might view even a couple of percentage points of its highly profitable Xeon market as significant. To me, AMD management should probably be looking at a double-digit goal by 2020. That would shift outsiders’ views and opinions of AMD, help stabilize a financially strained corporation, and open the gates for more customers to feel comfortable with the product line.

But let’s be clear: though gaining market share should be an easily attainable goal when you start with almost none, there are roadblocks. AMD needs to prove the product can perform, and in more than just SPECint benchmarks. Intel does considerable work every year to optimize its hardware stack to meet the needs of the major platform players (think the Super 7). AMD needs to make a performance and a cost argument to these groups that will turn heads.

AMD must also avoid the kind of platform pitfalls that plagued the Ryzen consumer launch. Data center customers have zero tolerance for such issues; mission-critical systems and data need to be running at 100%.

Can they do it? Absolutely. Will they? Hopefully we’ll see more in the coming days and weeks to prove they are on the right track.


June 20, 2017 | 05:52 PM - Posted by willmore

Wow. I sort of wish I was still doing datacenter work.

June 21, 2017 | 12:50 AM - Posted by StephanS

32 core on 1s. 48 core next year.. maybe 64 core in 2020

June 21, 2017 | 03:58 AM - Posted by Hakuren

I know that EPYC is strictly server platform (SoC), but it seems that AMD are back and if TR really delivers the goods then I'm on course for my first AMD build since dinosaurs died out, which is quite a long time. LOL

Go AMD, force Intel to do innovation for a change instead slowing down progress, monopolizing and milking the market.

June 21, 2017 | 04:09 AM - Posted by malakudi

I am a bit disappointed, I was expecting higher clocks for 8 core parts, something to compete with Xeon E5-1680 v4 for example. Not everyone needs 16 or 32 cores in datacenter. Xeon E5-1660 v4 and E5-1680 v4 is very popular with our customers and AMD does not deliver anything comparable with their EPYC lineup. AMD seems to target E5-26XX and upper Xeon products.

June 22, 2017 | 12:31 AM - Posted by James

Really a Ryzen processor with ECC enabled would be suitable competition for a Xeon E5-1680 v4. They are both 8 core/16 thread, but Ryzen has dual channel memory instead of quad channel memory. There would be some market for dual die parts (like ThreadRipper), but I don't know if there has been any info about those from AMD. I don't know why they wouldn't sell some of them as workstation parts also, although it is unclear how they would be branded. AMD may want to create a wide separation between consumer level parts and Epyc.

June 26, 2017 | 02:31 PM - Posted by Paul A. Mitchell, B.A., M.S. (not verified)

Supermicro are planning a "DP Tower" workstation:

https://mma.prnewswire.com/media/525719/Super_Micro_Computer_AMD.jpg?w=1600

June 26, 2017 | 02:33 PM - Posted by Paul A. Mitchell, B.A., M.S. (not verified)

https://www.supermicro.nl/Aplus/motherboard/EPYC7000/H11DSi.cfm

June 21, 2017 | 06:45 AM - Posted by carlot

Hi Ryan,
because of the uniformity of i/o features (DRAM,PCIe,socket),
do you think it is possible for AMD to make EPYC "software upgradable" with a simple bios flash?
I mean , with the security processor embedded, you can conceive the ability do let the cores count be "software defined" instead using blown up fuses...

Thanks and all the best
Carlo

June 21, 2017 | 09:16 AM - Posted by msroadkill612

wow, i had the same thought minutes ago. pay per core~, & activate more for a fee as an upgrade path.

better than just zapping ok cores.

June 21, 2017 | 09:47 AM - Posted by StephanS

Yes. this is my #1 concern... AMD disabling 75% of the core for the entry $480 4 die EPYC seem like AMD its DESTROYING its die.

they pay Globalfoundry full price , and ruin the die to sell at at a deep discount... makes no sense.

June 21, 2017 | 03:02 PM - Posted by serpretetsky

I don't know the prices, but I wouldn't be surprised if the actual costs to produce a chip (after you've already paid R&D and FAB upgrade costs) are WAY below what they sell the chips for. If that is the case it makes sense to try to sell as many chips at highest prices possible. Sometimes that means you can only sell them for 200$ a pop for a lower end processor, sometimes that means you get to sell them for 2000$ a pop for higher end processors. Turning off certain cores is just a way to differentiate the prices more than anything (from a business point of view).

June 22, 2017 | 12:21 AM - Posted by James

All companies salvage dies with parts that are not fully functional. Intel used to only make 2 or 3 different Xeon dies and all others where salvaged with cores disabled. I would doubt that AMD are really getting 80% that are fully functuonal. There can be defects in cores, caches, and in the pci-e/interprocessor links. AMD's architecture should allow them to sell just about everything they make. The die with defects in the infinity fabric can be sold in the consumer market as Ryzen processors. Ryzen processors have a tiny number of HSIO links compared to the fully functional die. If it has some defective links, it may still be usable for a ThreadRipper part.

They obviously can sell parts with the full range of core counts. The parts that they would sell as an 8-core Epyc processor almost certainly have multiple defective cores, but fully functional interprocessor links. I am surprised that they are even going to be selling an 8 core Epyc processor since Ryzen is already 8-cores. The only reason to buy an 8-core Epyc processor would be if you need massive memory bandwidth or massive IO, but not much compute performance. There are some applications that might perform very well on it though. It will have a huge amount of cache per core and a huge amount of memory bandwidth, but thread to thread communication will have to be coarse grained to perform well. Stuff where the main system is ust sending data to the GPUs could be a candidate also. It would be very wasteful to sell a die with multiple bad cores, but fully functional HSIO links as a low end Ryzen part when it could be a low end Epyc part.

July 23, 2017 | 04:27 PM - Posted by Anonymouse (not verified)

I doubt they are getting an 80% yield too - the original Article stated that the RUMOR is that they got an 80% yield of 6 fully functional Cores.

There's a means to 'cheat' (work smart) by using decades old technology, sub-field Stitching; where similar pieces are joined to make a larger Die than what could normally be produced.

This is (to the limited extent that it is used) prevalent in Image Sensors, where the Equipment available may be small but a Full Frame or much larger Sensor is required.

Stitching Image: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2572992/bin/nihms29502f9.jpg

So you could visually examine the finished Wafer and see which Dies looked good and Stitch adjoining ones together, cut them apart wire them up with Infinity Fabric and then put four together to make one Epyc CPU - when that didn't work you could burn a Fuse and have Ryzen Pro, failing that Ryzen or recyclables.

That would possibly give you 80% yield of 'something' that "works" at some Frequency; probably not a yield of 80% 6 perfect Epyc Cores.

A 50% yield would almost double the cost (not double because you can SEE errors before too much expensive additional work is done) so the yield ought to be over 50% or it's going to be expensive.

June 21, 2017 | 09:08 AM - Posted by msroadkill612

"AMD is disabling cores in a symmetrical pattern, so a 32-core part will have four dies with all 8 cores enabled on all four. A 24-core processor will have 1 core of each CCX, and thus 2-cores per die, disabled. A 16-core part will have 2 cores of each CCX disabled, 4 per core. And the 8-core part will have 3 of the 4 cores per CCX disabled,"

hmm, if amd are getting 80% yields on their cores, it sounds they will be trashing a lot of ok cores.

June 21, 2017 | 09:56 AM - Posted by StephanS

If you look at the die shot of Zen, the 8 core dont even make 50% of the die.
Yet all EPYC have all the very sensitive un core fully functional.

Seem like AMD is paying globalfoundry full price, then go on to destroy the die, so they can sell it at a deep discount.

June 22, 2017 | 12:37 AM - Posted by James

It is possible to have a die with multiple defective cores, but fully functional interprocessor links. Without selling such die as low core count Epyc processors, they would have to be sold as very low end Ryzen parts due to the low number of functional cores.

