Dissecting AMD Zen Architecture - Interview with David Kanter

Subject: Processors
Manufacturer: AMD

Get your brains ready

Just before the weekend, Josh and I got a chance to speak with David Kanter about the AMD Zen architecture and what it might mean for the Ryzen processor due out in less than a month. For those of you not familiar with David and his work, he is an analyst and consultant on processor architecture and design through Real World Tech while also serving as a writer and analyst for the Microprocessor Report as part of the Linley Group. If you want to see a discussion forum that focuses on architecture at an incredibly detailed level, the Real World Tech forum will have you covered - it's an impressive place to learn.

David was kind enough to spend an hour with us to talk about a recently-made-public report he wrote on Zen. It's definitely a discussion that dives into details most articles and stories on Zen don't broach, so be prepared to do some pausing and Googling of phrases and technologies you may not be familiar with. Still, for any technology enthusiast who wants to get an expert's opinion on how Zen compares to Intel Skylake and how Ryzen might fare when it's released this year, you won't want to miss it.

February 13, 2017 | 01:22 PM - Posted by Anonymous (not verified)

Would be awesome if you guys threw the audio for these kinds of videos into your podcast feed. I don't know when I'll be able to dedicate an hour to this if I can't listen while driving. Maybe even a "PcPer Extras" podcast feed; you guys do a lot of great interviews that I intend to listen to and then never find time for.

February 13, 2017 | 01:30 PM - Posted by Ryan Shrout

Not a bad idea!! We'll see what we can work out.

February 15, 2017 | 10:17 PM - Posted by Anonymous (not verified)

With YouTube being blocked on many schools' firewalls, a podcast would be great.

February 13, 2017 | 02:06 PM - Posted by Tim Verry

That's a good idea!

February 13, 2017 | 05:03 PM - Posted by Anon (not verified)

Simply use a YouTube audio extractor

February 13, 2017 | 08:22 PM - Posted by Daniel Evans (not verified)

Second the audio-only stream for these rundowns.

February 13, 2017 | 01:58 PM - Posted by TeraFloppy64 (not verified)


February 13, 2017 | 02:08 PM - Posted by Anonymous (not verified)

Food for thought for BENCHMARKING: try Crysis 3 with a Titan X Pascal at 1080p. According to Digital Foundry, only an i7 can stay above 60 fps; not even an i5-7600K at 4.8 GHz can run Crysis 3...
Another idea: try data compression software like 7z or WinRAR (LZMA2 for 7z and RAR5 for WinRAR). It is multithreaded and uses the CPU to the max...
Great content...
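The compression suggestion above can be approximated without installing 7z. The following is only a rough sketch, not a substitute for the real tools: it uses Python's standard `lzma` module (whose default `.xz` output uses the LZMA2 filter) and compresses independent chunks on a thread pool. The chunk sizes, the synthetic data, and the helper name `compress_chunks` are all made up for illustration, and the scaling you see depends on CPython's `lzma` releasing the interpreter lock while compressing, which is my understanding but worth verifying on your build.

```python
import lzma
import time
from concurrent.futures import ThreadPoolExecutor


def compress_chunks(chunks, workers):
    """Compress each chunk with LZMA2 on `workers` threads; return (blobs, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so blobs[i] corresponds to chunks[i].
        blobs = list(pool.map(lzma.compress, chunks))
    return blobs, time.perf_counter() - start


# Hypothetical workload: 8 chunks of 1 MiB of synthetic, repetitive data.
chunks = [bytes((i * 7 + j) % 251 for j in range(4096)) * 256 for i in range(8)]

if __name__ == "__main__":
    for workers in (1, 2, 4, 8):
        _, seconds = compress_chunks(chunks, workers)
        print(f"{workers} thread(s): {seconds:.3f} s")
```

Real archivers split and schedule work differently, so treat the timings as a relative indicator of how well a CPU scales with threads rather than as a 7z score.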

February 13, 2017 | 02:31 PM - Posted by Master Chen (not verified)

Wake me up when it's going to be an interview with Jim Keller.

February 13, 2017 | 03:03 PM - Posted by Anonymous (not verified)

Yes, depending on when all the NDAs lift, and even then Keller may still have limitations concerning any of AMD's future plans that he may know of after the Zen/Vega NDAs fully lift.

Kanter needs to do more towards sussing out AMD's Infinity Fabric IP that will be made use of across AMD's GPU and CPU/other processor offerings. We know about Intel's ring bus CPU core to CPU core interconnect that has been around since Sandy Bridge. What we still do not know about is the Zen/Ryzen CCX unit to CCX unit connection fabric topology IP, and how that affects coherency/cache sharing between cores on different CCX units.

If AMD's Infinity Fabric is going to be used for GPUs/other processors as well as CPUs, then will a future AMD CPU core at some point be able to dispatch FP/Int/other work directly to a GPU (integrated or discrete) via that Infinity Fabric (IF)? And will the IF manage coherency and other connection fabric duties CPU core to CPU core, or CPU core to GPU/other processor cores, on AMD’s systems?

Charlie D. over at Semiaccurate made some interesting comments about AMD’s Infinity Fabric in his article on that subject. And I’d love to see what Kanter’s assessment of AMD’s Infinity Fabric IP may entail.

February 14, 2017 | 04:10 AM - Posted by Master Chen (not verified)

I bet my friggin' ass that Jim Keller's face is a very epitome of the trollface right at this very moment. The dude's sitting at his home and LITERALLY be like "MUEHEHEHEHEHEHEHEH" while rubbing his hands in an evil genius-like manner...GENIUSSSS!

February 14, 2017 | 12:53 PM - Posted by Anonymous (not verified)

No, Keller was working at lots of places and he is currently working at Tesla (Vice President of Autopilot Hardware Engineering). (1) He is most likely not at home with any sort of trollface and is probably very involved in working to create a CPU/system with fail-safe features that rival those used in SpaceX rockets. Car autopilot hardware will be running billions more hours than any rocket autopilot systems do on average. So a car's autopilot hardware will have to be that much more error tolerant.

"Jim Keller was working at DEC until 1998, where he was involved in designing the Alpha 21164 and 21264 processors.[2][3] In 1998 he moved to AMD, where he worked to launch the AMD Athlon (K7) processor and was the lead architect of the AMD K8 microarchitecture,[16] which also included designing the x86-64 instruction set and HyperTransport interconnect mainly used for multiprocessor communications.[2]

In 1999, he left AMD to work at SiByte to design MIPS-based processors for 1 Gbit/s network interfaces and other devices.[3][11][17] In November 2000, SiByte was acquired by Broadcom,[18] where he continued as chief architect[8] until 2004.[2]

In 2004 he moved to serve as the Vice President of Engineering at P.A. Semi,[2][10] a company specializing in low-power mobile processors.[3] P.A. Semi was acquired by Apple in 2008, and Keller followed,[5][16] becoming part of a team to design the Apple A4 and A5 system-on-a-chip mobile processors. These processors were used in several Apple products, including iPhone 4, 4S, iPad and iPad 2.

In August 2012, Jim Keller returned to AMD, where his primary task was to design a new generation microarchitecture[4][10][14] called Zen.[13] After years of being unable to compete with Intel in the high-end CPU market, the new generation of Zen processors is hoped to restore AMD's position in the high-end x86-64 processor market.[2][12] On September 18, 2015, Keller departed from AMD to pursue other opportunities, ending his three-year employment at AMD.[19]

In January 2016, Keller joined Tesla Motors as Vice President of Autopilot Hardware Engineering.[1]" (1)


Jim Keller (engineer)

February 14, 2017 | 03:51 PM - Posted by Master Chen (not verified)

You really don't get it.

February 14, 2017 | 05:15 PM - Posted by Anonymous (not verified)

No! I get that you don't have any brain cell remaining after huffing all that Toluene! And that single brain cell that you were born with has shuffled off to buffalo after swimming so much C6H5CH3!

February 16, 2017 | 05:14 AM - Posted by Master Chen (not verified)

"It's easy! I just utterly destroy their shit. They're going to feel that for the next five years." (c) Jim Keller

February 13, 2017 | 02:37 PM - Posted by Anonymous (not verified)

It's too early to be giving Intel's crystalwell IP an advantage for integrated graphics with its eDRAM(Limited Usage) until the Vega IP arrives and more information is available as to just exactly what that HBCC(High Bandwidth Cache Controller) IP entails. AMD's integrated graphics at 28nm will not be the same as any Vega based integrated graphics at 14nm with respect to the increased CU/NCU(Next Compute Unit/Vega), or total shader/other, counts that AMD can support on an APU at 14nm.

AMD's Ryzen/Vega based APUs will be very different, in addition to the obvious advantages that AMD’s integrated graphics will have under a 14nm process node (more CUs/NCUs/shader counts). We still do not know at this time if Vega will have any eDRAM types of technology, or even how having an HBM2 stack or two available to an interposer based APU’s GPU/graphics will affect any new AMD APU SKUs.

AMD’s new APU designs may make use of HBM2 as a last cache level to a larger pool of regular DIMM based DRAM. AMD has announced that it will be making a line of workstation/server/HPC APUs on interposer designs for the professional markets, and it’s safe to assume that there will be consumer variants.

Even if AMD still made a monolithic Zen/Vega APU die, that die could still be paired with a stack of HBM2 and would not require such a large interposer to join up a monolithic CPU/GPU die to a single stack of HBM2. And any new APU designs with HBM2 (even a single stack of HBM2) will see Vega’s HBCC controller IP merged with AMD’s APU controller IP to make a very new APU design all around. Such a design would not have to depend only on the limited nature of the DIMM based DRAM/DRAM channel, as the HBM2 included with the APU would guarantee that the integrated graphics will never be starved of the necessary memory bandwidth by any OEM's decision to only use a single channel to DIMM based DRAM, or slower DIMM based DRAM for that matter.

It’s also of note that Intel’s crystalwell graphics are not widely in use across all of Intel’s SOC offerings and that when crystalwell is made use of it’s only on the most expensive Intel SOC SKUs. AMD’s graphics on its APUs at 28nm have always been a Price/Performance leader while Intel’s top end graphics, that Intel likes to make reference to for performance/publicity reasons, is mostly limited in actual usage on many OEM products because of its cost.

AMD’s APUs of any design (monolithic single die APU design or interposer based APU design) are going to be more robust simply by virtue of the 14nm node that will allow for integrated graphics with higher CU/NCU/shader resources than at 28nm. AMD is now using 14nm for all Zen/Vega production, and Polaris before Vega on the GPU side.

February 13, 2017 | 03:41 PM - Posted by Shane O Laake (not verified)

Wonderful video

February 13, 2017 | 05:36 PM - Posted by ppi (not verified)

While this may be off-the-mark ... if I have AVX-heavy code, how well would it run on GPU?

February 13, 2017 | 07:54 PM - Posted by Anonymous (not verified)

Talking about the one time jump of 40% IPC. Recent speculation is starting to confirm the licencing of AMD GPU tech to Intel. I think this is part of a longer plan to give heterogeneous computing some traction. We are coming down to the brick wall nanometer-wise and they are looking for new ways to get more processing power. In my mind this is a solution to what happens when you can't throw any more cores at something. Do groups of cores evolve into something else entirely as a compute unit? Rambling a little. Theory on where to go next.

February 13, 2017 | 09:31 PM - Posted by Anonymous (not verified)

Intel is not getting any licensing for any of, or a look at any of, AMD's actual on the market working complete GPU designs! And that GPU/graphics design that Intel developed by itself for itself needs to have some of its IP licensed as Intel could not build a modern GPU without INFRINGING on an AMD or Nvidia patent.

Intel designed, and will continue to design, its own graphics but it is impossible to design a modern GPU without licensing basic GPU IP from AMD or Nvidia for such things as unified shaders and other basic GPU IP functionality that both AMD and Nvidia have the IP rights to.

So Intel in designing its own GPUs needs to license parts of the Basic GPU IP held by either AMD or Nvidia. Intel is not getting any look see at any of Nvidia’s or AMD’s GPU engineering plans. Intel is only getting the rights to design its own GPU/Graphics that make use of some of the basic PATENTED GPU design outlines that both AMD and Nvidia have the patent portfolios for!

One can design their own GPU from scratch, but then that design has to be vetted by the patent engineers and patent lawyers to see if the design infringes on any patents that others may have. Most likely Intel found out very early in the process that it had to have some form of IP license from either Nvidia or AMD(ATI at the time of Intel’s GPU project), as those two companies represent the largest block of the basic GPU/IP patents in existence. Intel will never get to look at any Nvidia or AMD engineering blueprints for actual working GPUs.

Intel's current/past designs have to have some of their features licensed from either AMD or Nvidia OR Intel will not be able to sell even the GPUs that Intel has designed, both current and past designs, as Intel has always made use of features/patents in its GPU designs that both AMD and Nvidia have the rights for.

So if Intel stops licensing the basic GPU IP from Nvidia it has to go to AMD and get a license in order to continue to have Intel's GPU/Graphics designs legal. Intel's current and past in house GPU/graphics designs have some parts that require licensing and either Nvidia or AMD have the overlapping rights to the basic/similar GPU patents that Intel needs/uses.

Nvidia and AMD would never let Intel license any of Nvidia's or AMD's actual working most up to date full GPU designs! It's just some basic limited GPU IP that Nvidia currently licenses to Intel and it will be the same for AMD with some limited basic GPU IP licensed to Intel. Intel's GPU designs would not be legal for sale without some very basic GPU IP licensing from EITHER Nvidia OR AMD.

February 13, 2017 | 09:21 PM - Posted by Anonymous (not verified)

Iris Pro's Crystal Well is huge. Makes for expensive APUs.

But guess what? HBM2 APU = huge bandwidth, huge performance. Certainly for a niche, it will be awesome.

February 13, 2017 | 10:20 PM - Posted by Anonymous (not verified)

A single stack of HBM2 at 4GB, or even better the 8GB variant of HBM2, would be enough for a laptop APU SKU. The rest of the memory could be a larger amount of regular, single or dual channel, DIMM based DDR4. One stack of HBM2 has a 1024 bit interface and that's way wider than any dual channels to regular DIMM based DRAM. So a single stack of HBM2 provides up to 256 GB/s of available bandwidth! That’s enough effective bandwidth to feed a sizable APU based integrated GPU with probably 16 CUs/NCUs (Vega) and maybe a little larger.
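For anyone wanting to check the per-stack number, peak bandwidth is just interface width times transfer rate. The sketch below assumes the top HBM2 pin speed of 2 Gbit/s and, purely as an illustrative comparison, DDR4-2400 on 64 bit channels; the function name and the DDR4 speed grade are my own choices, not figures from the interview.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_per_sec: float) -> float:
    """Peak bandwidth in GB/s (1 GB = 1e9 bytes): bytes per transfer x transfers per second."""
    return bus_width_bits / 8 * transfers_per_sec / 1e9


# One HBM2 stack: 1024 bit interface, assumed 2 Gbit/s per pin.
hbm2_stack = peak_bandwidth_gb_s(1024, 2.0e9)

# Assumed laptop memory for comparison: DDR4-2400, 64 bits per channel.
ddr4_dual = peak_bandwidth_gb_s(2 * 64, 2400e6)
ddr4_single = peak_bandwidth_gb_s(64, 2400e6)

if __name__ == "__main__":
    print(f"HBM2, one stack:        {hbm2_stack:.1f} GB/s")
    print(f"DDR4-2400, dual chan.:  {ddr4_dual:.1f} GB/s")
    print(f"DDR4-2400, single chan.: {ddr4_single:.1f} GB/s")
```

These are theoretical peaks; sustained bandwidth is lower on both memory types, and HBM2 parts clocked below 2 Gbit/s per pin scale down proportionally.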

So before HBM’s/HBM2’s introduction, AMD's APUs at 28nm had at most a GPU with 8 CUs, and the 8 CUs were still starved for bandwidth running from DIMM based DRAM. So at 14nm a single stack of HBM2, maybe acting like an HBM2 cache to a larger amount of slower DDR4 DIMM based DRAM, would be enough to provide the bandwidth needs of double the amount of CUs/NCUs (Vega) and allow the integrated graphics to operate mostly from the HBM2 and not the slower DIMM based DDR4. And it would not even matter if a laptop OEM only provided a single channel to regular DDR4 based DRAM, as the integrated graphics would be running from HBM2 memory and not from DIMM based DDR4 or slower memory.

The HBCC (High Bandwidth Cache Controller) IP on Vega appears to be ready made for leveraging HBM2 like a last level of cache while also hiding any latency/bandwidth deficiencies of slower DIMM based DDR4 DRAM/slower tier memory accesses from the APU's integrated graphics. There is even Vega’s HBCC/memory controller IP for handling its own virtual memory paging of texture memory to disk (paging swap file).

The complete information on VEGA's HBCC has yet to be revealed by AMD/RTG so maybe on Feb 28 at the Ryzen and GDC 2017 Capsaicin & Cream Event there will be more info.

February 15, 2017 | 04:24 AM - Posted by Anonymous (not verified)

HBM based APUs would be really expensive also. They may make their way into high end mobile devices. I could see Apple being interested in them. Cheaper devices may have to wait for stacked memory that can just be placed on the same package as the APU, rather than needing an expensive silicon interposer. Either company could use stacked memory with a slower interface though. It would be similar to intel's crystal well parts, they just could put a lot more memory with a stacked device. Even if it isn't as fast as HBM, it would probably still outperform system memory by a large margin. I doubt intel's GPUs will be that competitive regardless of what memory is integrated. We have been comparing AMD 28 nm parts with intel 14 nm parts for a while. Both will be at ~ 14 nm soon.

I could see a mobile APU with on-package stacked memory connected to an external DRAM channel for swap space. An external NVDIMM (preferably x-point) would be a very interesting solution also. Also, HBM is a JEDEC standard. I believe Intel already makes an HBM device from one of the FPGA companies they bought, so HBM isn't limited to AMD. Intel still doesn't have a powerful GPU, but they could be working on something though. They have the Xeon Phi which could be a powerful GPU with a bit of additional fixed function hardware. It probably would not be low power at all though. I don't think intel wants to be in the GPU market. It is lower margin. Intel gets thousands of dollars for server chips the size of a high end GPU. The high end GPUs, with a lot of high speed memory sell for less than many server CPUs. It will be interesting to see what direction they go in.

February 15, 2017 | 11:17 AM - Posted by Anonymous (not verified)

A consumer APU would only need a single stack of HBM2, with the remaining memory being regular DDR4 DRAM/DIMMs out on the motherboard. And AMD really would not need to go much higher than a single stack of HBM2 at 8GB, as for most laptop SKUs a single stack of HBM2 at 4GB would suffice to ensure the integrated graphics are never starved for bandwidth no matter what the laptop OEMs do, like only providing a single channel to DIMM based DRAM. With AMD, Nvidia, and Intel (new AI processor) all using HBM2, the price will come down at some point, and an APU with a single stack of HBM2 would be less expensive than any other HBM2 based configuration.

Also, with HBM2/HBM being a JEDEC standard, there is plenty of room for AMD and others to maybe get an amended standard that, for example, allows a wider single stack of HBM2. That could have more than a 1024 bit interface, in 256/128 bit increments, with say 1024 to 2048 bits of interface/traces per stack, either for more bandwidth for any integrated APU graphics that may need it, or to provide the same effective bandwidth with more traces run at lower clocks to save power and thermal headroom for thinner laptop APU SKUs. HBM2 stacks have a larger footprint (X and Y) than HBM did, so a single stack of HBM2 has more area to host more micro-bumps for a wider than 1024 bit interface if needed. It all really depends on the needs of the APU’s integrated graphics, that and the thermal constraints of some laptop form factors.

At 14nm, regardless of whether HBM2 or DIMM based DRAM is used, AMD is going to be able to fit more CUs/NCUs (Vega) on its Zen/Vega APU SKUs. So that process node shrink alone will allow for the use of more CUs/NCUs on AMD’s integrated graphics. A single stack of HBM2 becomes all the more necessary if that integrated APU graphics is not to be starved for bandwidth. That lack of bandwidth would be a problem for higher CU/NCU count APUs that have to rely on DIMM based DRAM and/or any laptop OEM’s gimping to a single DRAM DIMM channel on their laptop SKUs.

February 14, 2017 | 11:38 AM - Posted by Pholostan

Good one, I love it when you do stuff like this :-)

February 15, 2017 | 05:12 AM - Posted by Anonymous (not verified)

With process technology hitting a wall, I would expect everyone else to be catching up to Intel. The front runner will hit the wall first. I also think that there will be quite a few companies capable of designing a high performance CPU going forward. AMD's Excavator CPUs were actually quite good if you consider that they were stuck on 28 nm. They often still offered good performance for the price. They have just been way too far behind for a while though. There is no way a 28 nm processor is going to compete with a 14 nm processor.

The only thing that worried me about Zen was that they are using a more GPU optimized process technology / design libraries. This is skewed more towards density than clock speed. At this point, the clocks seem to be quite high though. It may be the case that current CPUs are heat limited before they hit other clock scaling issues. In that case, using GPU optimized, high density design libraries may be a more optimal methodology. It allows more transistors to be thrown at the problem, so IPC can be higher and clocks lower. It may be an excellent fit for mobile also. I am not surprised at all that they could achieve 40% higher IPC. If you consider the massive difference in transistor budgets between 28 nm and 14 nm, it would be surprising if they didn't get a massive jump in IPC.
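To put rough numbers on the "higher IPC, lower clocks" trade-off, single-thread performance can be modelled as IPC times clock frequency. This is only a first-order sketch: the 40% IPC gain is the figure discussed here, while the 4.0 GHz and 3.4 GHz clocks and both function names are hypothetical values picked for illustration.

```python
def relative_performance(ipc_gain: float, old_clock_ghz: float, new_clock_ghz: float) -> float:
    """Single-thread speedup, modelling performance as IPC x clock frequency."""
    return (1.0 + ipc_gain) * new_clock_ghz / old_clock_ghz


def break_even_clock_ghz(ipc_gain: float, old_clock_ghz: float) -> float:
    """Clock at which the higher-IPC design merely matches the old design."""
    return old_clock_ghz / (1.0 + ipc_gain)


# Hypothetical: old core at 4.0 GHz, new core with +40% IPC at 3.4 GHz.
speedup = relative_performance(0.40, 4.0, 3.4)
floor_clock = break_even_clock_ghz(0.40, 4.0)

if __name__ == "__main__":
    print(f"speedup at 3.4 GHz: {speedup:.2f}x")
    print(f"break-even clock:   {floor_clock:.2f} GHz")
```

Under this model a 40% IPC gain leaves a lot of clock headroom to give away before performance regresses, which is why a density-oriented design is not automatically a losing trade.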

I doubt that AMD's caches will be as fast as Intel's though, but they may make up for this by providing larger L2 caches. Modern CPUs are memory bound most of the time. A huge amount of die area is taken up by the memory hierarchy. You are almost buying more of a memory chip than a processing chip these days. Some of the old Core 2 Quad (45 nm) processors can still perform very well since some of them were two dual core dies with 6 MB L2 each for a total of 12 MB in the package. It will be interesting to see how Zen caches compare. The larger L2 may make a big difference with SMT enabled. Intel processors have very small caches for running two threads per core.

February 15, 2017 | 12:06 PM - Posted by Anonymous (not verified)

That and a larger number of execution ports. Just look at the POWER8's design with lots of execution ports for more FP/INT and load and load/store units, VMX units that can also do regular FP workloads when needed, more eDRAM, etc. The POWER8 can issue 10 instructions per clock with decoding for 8 instructions per clock. The POWER8 services 8 processor threads per core.

So larger L2 caches and more execution resources to service the processor threads. Larger L2 caches increase the hit rate, and larger L3 caches store more hot data/code, which means fewer latency-inducing accesses to regular memory.
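The hit-rate argument can be made concrete with the usual average memory access time (AMAT) calculation. The sketch below is illustrative only: every latency and hit rate in it is a made-up placeholder, not a measured Zen, Skylake or POWER8 figure, and the `amat` helper is my own.

```python
def amat(levels, memory_latency):
    """Average memory access time in cycles.

    levels: list of (hit_latency_cycles, hit_rate) ordered from L1 outward.
    Every access that reaches a level pays its latency; misses move on.
    """
    total = 0.0
    reach = 1.0  # fraction of accesses that get as far as this level
    for latency, hit_rate in levels:
        total += reach * latency
        reach *= 1.0 - hit_rate
    return total + reach * memory_latency


# Hypothetical hierarchies: (latency in cycles, hit rate) for L1, L2, L3.
small_l2 = [(4, 0.95), (12, 0.60), (40, 0.70)]
large_l2 = [(4, 0.95), (14, 0.75), (40, 0.70)]  # slower but bigger L2

if __name__ == "__main__":
    print(f"small L2: {amat(small_l2, 200):.2f} cycles")
    print(f"large L2: {amat(large_l2, 200):.2f} cycles")
```

With these placeholder numbers the larger L2 wins despite its extra latency, because fewer accesses fall through to L3 and DRAM; with a different workload or a much slower L2 the balance can tip the other way.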

There is no evidence that any Zen/Naples server or Zen/Ryzen desktop SKUs are made using any high density design libraries, but AMD's laptop APU SKUs could make use of them for higher density and lower power usage at the cost of lower top end clock speeds. That said, why use any standard low density CPU design libraries if the APU SKUs can't be clocked higher anyway because of the thermal envelopes of laptop form factors? Use the high density design libraries for APUs, get 30% extra die space savings on top of the space savings at 14nm, and use that saved space for more CUs/NCUs (Vega) on the APU’s integrated graphics.

February 16, 2017 | 06:18 PM - Posted by Anonymous (not verified)

Video discussion was excellent. Please do a follow up after the Zen launch.

February 17, 2017 | 10:39 AM - Posted by drbaltazar (not verified)

Anyone know if these will use the trick s and Google use?
The way I think it works is they use 16 bit 360p to generate the initial image, then they reload the image (or whatever it is they do) in the full resolution.
My Lumia 640 does this and it kind of unblurs. It's a bit slow, but then this phone only has 1 GB of RAM.
