AMD Details Zen at ISSCC

Subject: Processors | February 8, 2017 - 09:38 PM |
Tagged: Zen, Skylake, Samsung, ryzen, kaby lake, ISSCC, Intel, GLOBALFOUNDRIES, amd, AM4, 14 nm FinFET

Yesterday EE Times posted some interesting information gleaned at ISSCC.  AMD released a paper describing the design process and the advances they were able to achieve with the Zen architecture, manufactured on Samsung’s/GF’s 14nm FinFET process.  AMD went over some of the basic measurements at the transistor scale and how they compare to what Intel currently has on its latest 14nm process.

The first thing that jumps out is that AMD claims that their 4 core/8 thread x86 core complex is about 10% smaller than what Intel has with one of their latest CPUs.  We assume it is either Kaby Lake or Skylake.  AMD did not go over exactly what they were counting when looking at the cores, and there are some significant differences between the two architectures.  We are not sure if that 44 mm sq. figure includes the L3 cache or the L2 caches.  My guess is that it probably includes L2 cache but not L3.  I could easily be wrong here.
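As a quick sanity check on the "about 10% smaller" claim, here is a minimal sketch of the arithmetic.  The 44 mm sq. figure is the one quoted above; the 49 mm sq. Intel figure is an assumption taken from a reader comment referring to the slide, not something confirmed in the text.

```python
# Rough check of the "about 10% smaller" claim.
# amd_area_mm2 comes from the article; intel_area_mm2 is an assumed value
# (a reader comment cites 49 mm^2 as the figure listed on the slide).
amd_area_mm2 = 44.0
intel_area_mm2 = 49.0


def percent_smaller(small: float, large: float) -> float:
    """Return how much smaller `small` is than `large`, in percent."""
    return (large - small) / large * 100.0


saving = percent_smaller(amd_area_mm2, intel_area_mm2)
print(f"AMD's 4-core complex is {saving:.1f}% smaller")  # 10.2% with these inputs
```

With those two inputs the result lands right around the claimed 10%, which suggests the comparison is against a 4-core Intel block of roughly that size.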

Going down the table we see that AMD and Samsung/GF are able to get their SRAM sizes down smaller than what Intel is able to do.  AMD has double the amount of L2 cache per core, but it takes up only about 60% more area than Intel’s 256 KB L2.  AMD's L3 cache is also smaller than Intel's.  Both are 8 MB units, but AMD comes in at 16 mm sq. while Intel is at 19.1 mm sq.  There will be differences in how AMD and Intel set up these caches, and until we see L3 performance comparisons we cannot assume too much.
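To put the cache numbers on a common footing, the sketch below turns them into a relative density (capacity per unit area, AMD versus Intel).  It uses only the ratios and areas quoted above; the function name is just an illustrative helper, and the result says nothing about latency or bandwidth.

```python
# Relative cache density, AMD vs. Intel, from the figures quoted in the article.
# relative_density() is a hypothetical helper for illustration only.

def relative_density(capacity_ratio: float, area_ratio: float) -> float:
    """Capacity-per-area of A relative to B, given A/B capacity and area ratios."""
    return capacity_ratio / area_ratio


# L2: AMD has 2x the capacity in about 1.6x the area.
l2_density = relative_density(capacity_ratio=2.0, area_ratio=1.6)

# L3: same 8 MB capacity, 16 mm^2 (AMD) vs. 19.1 mm^2 (Intel).
l3_density = relative_density(capacity_ratio=1.0, area_ratio=16.0 / 19.1)

print(f"L2 density advantage: {l2_density:.2f}x")  # 1.25x
print(f"L3 density advantage: {l3_density:.2f}x")  # 1.19x
```

So on these numbers AMD packs roughly 20 to 25% more cache into a given area, control logic included.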

(Image courtesy of ISSCC)

In some of the basic measurements of the different processes we see that Intel has advantages throughout.  This is not surprising as Intel has been well known to push process technology beyond what others are able to do.  In theory their products will have denser logic throughout, including the SRAM cells.  When looking at this information we wonder how AMD has been able to make their cores and caches smaller.  Part of that is due to the likely setup of cache control and access.

One of the most likely reasons for this smaller size is the less advanced FPU/SSE/AVX units that AMD has in Zen.  They support AVX-256, but it has to be done in double the cycles.  They can do single cycle AVX-128, but Intel’s throughput is much higher than what AMD can achieve.  AVX is not the end-all, be-all but it is gaining in importance in high performance computing and editing applications.  David Kanter in his article covering the architecture explicitly said that AMD made this decision to keep die size and power down for this product.
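The "double the cycles" point can be illustrated with a deliberately simple toy model: an operation wider than the execution unit is split into as many passes as it takes to cover its width.  This is only a sketch under that single assumption; it ignores pipelining, the number of vector units, clock speed, and memory bandwidth, so it is not a performance prediction for either vendor.

```python
# Toy model: issue cycles for vector operations on a unit of a given width.
# Assumes one pass per cycle and that a wide op is split into equal passes.
# Real cores have multiple pipelined units, so treat this as illustration only.

def issue_cycles(num_ops: int, op_bits: int, unit_bits: int) -> int:
    """Cycles needed to issue `num_ops` operations of `op_bits` width."""
    passes_per_op = -(-op_bits // unit_bits)  # ceiling division
    return num_ops * passes_per_op


ops = 1000
print(issue_cycles(ops, op_bits=128, unit_bits=128))  # 1000: native width
print(issue_cycles(ops, op_bits=256, unit_bits=128))  # 2000: split in two passes
print(issue_cycles(ops, op_bits=256, unit_bits=256))  # 1000: native 256-bit unit
```

In this model a 128-bit unit needs twice the cycles for 256-bit work, which is the trade AMD appears to have made in exchange for a smaller, lower power FPU.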

Ryzen will undoubtedly be a pretty large chip overall once both core complexes and 16 MB of L3 cache are put together.  My guess would be in the 220 mm sq. range once all is said and done (northbridge, southbridge, PCI-E controllers, etc.), but again that is only a guess.  What is perhaps most interesting of all is that AMD has a part that on the surface is very close to the Broadwell-E based Intel i7 chips.  The i7-6900K runs at 3.2 to 3.7 GHz, features 8 cores and 16 threads, and has around 20 MB of L2/L3 cache.  AMD’s top end looks to run at 3.6 GHz, features the same number of cores and threads, and has 20 MB of L2/L3 cache.  The Intel part is rated at 140 watts TDP while the AMD part will have a max of 95 watts TDP.
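For a rough feel of the TDP gap, the sketch below divides the two rated TDPs by the core count.  The inputs are the figures quoted above; keep in mind that AMD and Intel do not necessarily define TDP the same way, so this is a paper comparison rather than a measurement.

```python
# Paper comparison of rated TDP per core, using the figures in the article.
# TDP definitions differ between vendors, so this is only a rough indicator.

def watts_per_core(tdp_watts: float, cores: int) -> float:
    """Rated TDP divided evenly across cores."""
    return tdp_watts / cores


intel_w = watts_per_core(tdp_watts=140.0, cores=8)  # i7-6900K
amd_w = watts_per_core(tdp_watts=95.0, cores=8)     # top-end Ryzen, per AMD
reduction = (1.0 - 95.0 / 140.0) * 100.0

print(f"Intel: {intel_w:.1f} W/core, AMD: {amd_w:.1f} W/core")
print(f"AMD's rated TDP is {reduction:.0f}% lower")  # 32% lower
```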

If Ryzen is truly competitive in this top end space (with a price to undercut Intel, yet not destroy their own margins) then AMD is going to be in a good position for the rest of this year.  We will find out exactly what is coming our way next month, but all indications point to Ryzen being competitive in overall performance while being able to undercut Intel in TDPs for comparable cores/threads.  We are counting down the days...

Source: AMD

February 8, 2017 | 11:00 PM - Posted by Anonymous (not verified)

Some are saying that Zen will not be good for the HPC market but when you combine Zen with Vega then that’s the combination that will be great for HPC. Look for the future workstation/Server/HPC APUs on an interposer, where AMD will be sending any FP workloads to the Professional APU's GPU. David Kanter is too focused on the traditional CPU way of HPC compute and is ignoring AMD's APUs on an Interposer technology that will be coming online in 2H 2017 and 2018.

One need only look at AMD’s exascale grant proposal for a 32 Zen core APU joined to a big Vega die and HBM2 on an interposer to see the direction that AMD is going.

February 9, 2017 | 12:08 AM - Posted by Anonymous (not verified)

Very underwhelming specs. I was thinking they would implement MORE actual cores, more than the 8350. This has FEWER cores. They're copying Intel for some reason. Needs more cores, faster clock speed and way more cache memory. That would be the winner. Forget any integrated video. Just go for the most important factors.

February 9, 2017 | 12:33 AM - Posted by changeofspace

This is a comparison showing that AMD can produce cores and caches similar to Intel's but smaller. The initial Ryzen SKUs will go up to 8 cores/16 threads.

February 9, 2017 | 01:29 AM - Posted by Pixy Misa (not verified)

Zen cores are arranged in groups, with four cores + 4 x L2 caches + 8MB shared L3 cache per group. That's not like Bulldozer - the cores are standalone, this is just the chip layout.

Summit Ridge will have two CPU groups, so up to 8 cores (16 threads) and 16MB L3 cache. It will be a lot faster than the 8350.

Raven Ridge will have one CPU group (four cores/eight threads and 8MB L3), plus 1024 Vega cores, plus probably 2GB of HBM2 cache.

February 9, 2017 | 09:56 AM - Posted by Anonymous (not verified)

Each CCX unit comprises 4 full fat Zen cores (wide order superscalar design with SMT capabilities) that share only the L3 cache between the 4 cores, with each core in the 4 core CCX unit able to access every cache in the CCX unit with the same average latency. Each Zen core has its own dedicated L2 cache of 512 KB, and four Zen cores share 8 MB of L3 cache.
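The CCX layout described here can be written down as a small data model to check the totals. This is a minimal sketch: the class and field names are made up for illustration, and the only inputs are the numbers given in this thread (4 cores per CCX, 2 threads per core, 512 KB L2 per core, 8 MB L3 per CCX, two CCX units for the 8-core part).

```python
# Minimal model of the CCX layout described in the comments.
# Class and field names are hypothetical; the numbers come from the thread.
from dataclasses import dataclass


@dataclass(frozen=True)
class CCX:
    cores: int = 4
    threads_per_core: int = 2   # SMT
    l2_kb_per_core: int = 512
    l3_mb: int = 8


@dataclass(frozen=True)
class Chip:
    name: str
    ccx: CCX
    ccx_count: int

    @property
    def cores(self) -> int:
        return self.ccx.cores * self.ccx_count

    @property
    def threads(self) -> int:
        return self.cores * self.ccx.threads_per_core

    @property
    def l2_mb(self) -> float:
        return self.cores * self.ccx.l2_kb_per_core / 1024

    @property
    def l3_mb(self) -> int:
        return self.ccx.l3_mb * self.ccx_count


summit_ridge = Chip(name="Summit Ridge", ccx=CCX(), ccx_count=2)
total_cache_mb = summit_ridge.l2_mb + summit_ridge.l3_mb
print(summit_ridge.cores, summit_ridge.threads, total_cache_mb)  # 8 16 20.0
```

Two CCX units give 8 cores, 16 threads, 4 MB of L2 and 16 MB of L3, which matches the 20 MB of L2/L3 quoted in the article.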

What is missing from any full Zen Review is any information about the CCX to CCX connection fabric IP that AMD is using with some tentative information pointing to the AMD Infinity fabric IP but without any complete information on the interconnect’s topology.

February 9, 2017 | 01:54 PM - Posted by Tim Verry

Yeah AMD has so far managed to keep that under wraps heh 

February 9, 2017 | 01:40 AM - Posted by Anonymous (not verified)

LMFAO underwhelming, wow you are a putz.

Ryzen does NOT have fewer cores, where do you get that from? The AMD FX-8350 has 4 modules with 2 cores in each module, and with CMT can dispatch a total of 8 threads (TOTAL if all "cores" and all "modules" were to be loaded) but NOT efficiently either as the cores in many cases are "starved" from L2/L3 (slow as well as very high latency cause they prioritized "speed" not efficiency) and having to dispatch those threads.

(FX8 and 9 series) are not "native" 2-4-8 core. not to mention, much higher power use on FX8 and 9 with 125w-225w+ compared to 95w MAX for Ryzen (power and TDP which is the cooling required)

Ryzen on the other hand is a TRUE X amount of cores, so a Ryzen 8 will be 8 cores that can handle 16 threads, not 4 cores that in best case scenario handles 8 threads MAX, also Ryzen Cache is much more efficiently designed.

Ryzen is far far more robust and intelligent a design than ANY previous AMD design (it was them learning from everything they did WRONG with the bulldozer/piledriver/excavator/steamroller derived processors). They stuff much more under the hood, all while using less power and doing far more "work" for the clock speed they are rated at, much like say an old Pentium 4 running at 3.5Ghz vs a modern Core i chip running at 3.5 Ghz, there is NO comparison.

As far as total cache amount, Ryzen is far beefier, as far as clock for clock performance, Ryzen will win hands down over FX 8 and 9 series, as far as power draw, Ryzen once again wins.

So do a wee bit of research before spouting nonsense crap. CMT DID NOT WORK the way AMD wants/wanted it to, SMT in the way you put it "following in intel's footsteps" DOES WORK, as does AMD Hypertransport very very well in fact.

Anyways, my point being, considering most applications we use in a consumer desktop environment at this point only really can use "4 cores/4 thread", them doing 8 cores 16 threads is more than enough, as far as raw clock speed, if you read a bit, you would understand Ryzen will scale power usage as well as its turbo clocks according to how much is actually needed AND can downclock far far quicker and in smaller steps than Intel does, not to mention the IPC of Ryzen accomplishes more FASTER than any previous designs AMD has EVER made.

So yeh, read a bit more, then form a valid opinion, Copy what works and make it work better, AMD is not doing any different than any company that has ever existed, Integrated Video is a MASSIVE part of consumer cpu products, PERIOD, that is where AMD makes and is the leader in APU(cpu/gpu combined) for everyone else, there is Ryzen (which is JUST) the cpu no bloody different than it has been for years now, or were you asleep?

How massive do you want the chip, how much would you be willing to pay for something 2-4 times larger in cost, the motherboard, the power and cooling that would be needed?

you want more cores and a ton of cache you will likely never need, go workstation, 32x2 cores 64x2 threads, AMD will deliver that, prepare to pay for it, for consumer use, there is NO point in this many cores, nor can you expect them to run (at this point) 3.5Ghz+ and not require a crap ton of power and cooling and cost a crap ton.

Anyways am done now, use your head and do some research, the answers are there, but your post on the other hand, it just makes A WASTE OF SPACE.

February 9, 2017 | 10:31 AM - Posted by Anonymous (not verified)

An 8 core Ryzen with 16 processor threads is going to be great for gaming and running any OS/Services/spyware 10 CPU cycles stealing bloat on a few cores/threads while having maybe at least 6 cores that can be fully available for gaming. The more cores the better it will be for some that run more than one OS in a virtual machine(VM) environment/instance. So with 8 cores/16 processor threads some can have a Linux based hypervisor running both Linux and windows OSs in their own separate VM instances with the user able to game mostly on their Linux OS after 2020 with a windows 7 instance locked down from any internet access or limited internet access and windows 7 used for any legacy gaming not fully supported under Linux.

February 9, 2017 | 11:24 AM - Posted by Mike S. (not verified)

To defend AMD for adopting Intel design concepts: AMD has been struggling to be profitable and Intel has been raking in billions in profit for years. It only makes sense for AMD to borrow all of the best concepts from Intel designs and try to improve upon them. Maybe if they had billions more to invest, they could have found a way to make CMT competitive. But they didn't, and CMT as it exists in Bulldozer/Excavator was behind.

But mostly, all we can do is wait until the chips are released and watch the third party reviews. Intel takes advantage of its market position to price high. I'm hoping AMD either meets them head on and I can get 100% of Intel performance for 80-90% of the cost, or else AMD can't match them but I can get 90% of Intel performance for 60% of the cost. Any better price improvement over Intel would be a bonus.
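The two scenarios above can be compared as performance per dollar relative to Intel. This is just a sketch of the arithmetic using the hypothetical percentages from this comment; none of these are actual prices or benchmark results.

```python
# Performance-per-cost relative to Intel (Intel = 1.0 on both axes).
# The scenario numbers are the hypothetical ones from the comment above.

def relative_value(perf_fraction: float, cost_fraction: float) -> float:
    """Performance per unit cost, normalized so Intel scores 1.0."""
    return perf_fraction / cost_fraction


scenarios = {
    "100% perf at 90% cost": relative_value(1.00, 0.90),
    "100% perf at 80% cost": relative_value(1.00, 0.80),
    "90% perf at 60% cost": relative_value(0.90, 0.60),
}

for label, value in scenarios.items():
    print(f"{label}: {value:.2f}x the value of Intel")
```

On that measure the "can't quite match them but much cheaper" case comes out as the best value of the three, at about 1.5x.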

February 9, 2017 | 02:12 PM - Posted by Anonymous (not verified)

I really hope that you do not think that Intel invented SMT (what Intel's marketing has name obfuscated to HyperThreading to make the unwashed masses think that SMT technology was actually invented by Intel). It would be a good idea to also read up on the Modified Harvard Microprocessor Architecture that most microprocessor CPU cores today are based upon to see where the IP/Technology really comes from.

Simultaneous multithreading (SMT) was not invented at Intel as this Wikipedia entry states:

"While multithreading CPUs have been around since the 1950s, simultaneous multithreading was first researched by IBM in 1968 as part of the ACS-360 project.[1] The first major commercial microprocessor developed with SMT was the Alpha 21464 (EV8). This microprocessor was developed by DEC in coordination with Dean Tullsen of the University of California, San Diego, and Susan Eggers and Henry Levy of the University of Washington. The microprocessor was never released, since the Alpha line of microprocessors was discontinued shortly before HP acquired Compaq which had in turn acquired DEC. Dean Tullsen's work was also used to develop the Hyper-threading (Hyper-threading technology or HTT) versions of the Intel Pentium 4 microprocessors, such as the "Northwood" and "Prescott"." (1)

(1)[See: Historical implementations subheading]

https://en.wikipedia.org/wiki/Simultaneous_multithreading

February 11, 2017 | 07:17 PM - Posted by CNote

Didn't invent it but seems to be the first company to ever release anything a real person could use.

February 9, 2017 | 01:55 AM - Posted by John H (not verified)

Troll post.

February 9, 2017 | 05:45 AM - Posted by Anonymous (not verified)

TROLL POST???? Seriously???

A well reasoned and rational rebuttal of an obviously grossly misinformed post and you simply dismiss it as a Troll Post?

February 9, 2017 | 08:46 AM - Posted by John H (not verified)

I was replying to the original post not yours :)

February 9, 2017 | 01:56 AM - Posted by John H (not verified)

I am curious what the associativity of the Ryzen caches is vs. Intel's..

February 9, 2017 | 10:15 AM - Posted by Anonymous (not verified)

Go read David Kanter's Microprocessor Report article as it goes into great detail about Ryzen's cache subsystems and their revelent associativity.

I can not believe you do not read the Microprocessor Report as that has been one of the preeminent trade journals on the subject of Microprocessor technology for 40+ years.

There was a link to the article posted at PCPer (forum post) but it was removed, and it is rare to be able to read any of the Microprocessor Report's articles online as it is a pay-walled publication. There is a link provided at the TechReport's article on the same subject as this PCPer article; you can find it there, as the link is posted in one of the article's forum posts!

February 9, 2017 | 10:18 AM - Posted by Anonymous (not verified)

edit: revelent

to: relevant

February 9, 2017 | 11:36 AM - Posted by Anonymous (not verified)

That info is outdated. This analysis by Hiroshige Goto for PCWatch has the latest from ISSCC.

https://translate.google.com/translate?sl=auto&tl=en&js=y&prev=_t&hl=en&...

February 9, 2017 | 01:50 PM - Posted by Anonymous (not verified)

No the stuff on the Microprocessor Report is not outdated it's just that most of the non pay-walled hardware review websites are under NDAs or other things not spoken of if the review websites want to still get their review samples!

So the information that you think is recent has actually been out at the pay-walled publications for some time, if you care to look there in the first place.

Even the Microprocessor Report is not going to be privy to all the relevant information as any CPU producer always keeps certain information hidden but that will be sussed out later when actual production chip samples can be obtained and prepped and examined under the use of an electron microscope and other useful methods and imaging hardware/software regimens. There are companies that specialize in such work for a price that everybody and his dog in the processor industry makes use of, if for only to see if any IP/Patents have been infringed upon!

February 9, 2017 | 11:21 PM - Posted by Anonymous (not verified)

Yes it is outdated. Goto's article has information not seen in Kanter's article.

February 10, 2017 | 01:05 AM - Posted by Anonymous (not verified)

NO! Kanter's article is not outdated just because Goto's article has information not included in Kanter's. It's very likely that there are other Microprocessor Report articles that have the omitted information, but still, what Kanter reveals in one article does not represent all of the information that he has at his disposal. Your reasoning is daft at best, and you should know that the Zen microarchitecture design was frozen a good while before Kanter's article was published. The Microprocessor Report has been around for 30 years and is a respected Professional trade journal.

You do not appear to realize that most of the real information that you think is new to the public in a free online manner is actually not new! Most of the pay-walled publications have access to the industry connections to get that information long before there is any free online dissemination of information.

P.S. most colleges and universities with computing sciences departments have in their libraries bound copies of the Microprocessor Report or online access via their library’s subscription portal to the various professional and academic trade journals that have the relevant information long before it shows up online in a free to read format.

February 9, 2017 | 05:26 AM - Posted by dgrdsv (not verified)

i7-6950X is a 10C/20T CPU. Current Intel 8C/16T is 6900K which runs at 3,2GHz with 3,7GHz/4,0GHz boost/MAX boost.

February 9, 2017 | 10:30 AM - Posted by Josh Walrath

Fixed, thanks for pointing that out.  My editor sucks for missing that!

February 9, 2017 | 11:17 AM - Posted by Anonymous (not verified)

tit for tat who cares.. if you dont set your own clock speed you're an idiot.

February 9, 2017 | 02:07 PM - Posted by Jeremy Hellstrom

... said every IT department?

February 9, 2017 | 09:24 AM - Posted by Anonymous (not verified)

Looking good!

Take my money already!!!

February 9, 2017 | 12:18 PM - Posted by Anonymous (not verified)

The 44 square mm probably does include L3. I tried to figure the die area for an Intel core from a die photo and I came out around 12 square mm for a single core including L3 slice. For four cores that comes out pretty close to the 49 listed here. All of the other un-core stuff takes up quite a lot of space. There is the dual channel memory controller and all of the pci-e interfaces and such. It doesn't seem like the high density libraries would affect cache size that much, but there actually is quite a bit of control logic associated with the caches, especially with the L3 also being the communication point between 4 cores. In AMD's design, it also should include communication fabric to talk to other clusters. This isn't very important for the consumer space since consumer level devices will only be 1 or 2 groups of 4. It is a bit more important for 16 core chips, like the device they have talked about with two 16-core chips on one package.

February 9, 2017 | 12:20 PM - Posted by Anonymous (not verified)

The AVX throughput doesn't seem like that big of an issue. If you have code that can take advantage of 256 bit vector units, then you are probably better off writing it to run on the GPU anyway.

February 9, 2017 | 01:26 PM - Posted by Particle (not verified)

Something doesn't add up with that SRAM cell size. A giant 400 mm^2 die would barely be over half a kilobyte of SRAM cells at that density.

February 10, 2017 | 03:11 AM - Posted by Anonymous (not verified)

Looks like it should be um (micrometers) rather than mm. In um, that would be about 283 nm on a side, which sounds about right. 283 nm x 283 nm = 80,089 square nm. 80,000 square nm divided by 1,000,000 square nm per square um is 0.08 square um.
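The unit question in these two comments is easy to check numerically. The sketch below assumes a cell size of 0.08 (the rounded figure used above) and shows what it implies when read as square mm versus square um; the 400 mm^2 die is the hypothetical one from the earlier comment.

```python
# Sanity check of the SRAM cell size units, using the rounded 0.08 figure
# from the comments. The 400 mm^2 die is the hypothetical from the thread.
import math

cell_area = 0.08       # as printed on the slide, units in question
die_area_mm2 = 400.0

# Reading 1: the figure really is in mm^2.
bits_if_mm2 = die_area_mm2 / cell_area
bytes_if_mm2 = bits_if_mm2 / 8
print(f"If mm^2: {bits_if_mm2:.0f} cells = {bytes_if_mm2:.0f} bytes on the whole die")

# Reading 2: the figure is in um^2. Side of a square cell, converted to nm.
side_nm = math.sqrt(cell_area) * 1000.0
print(f"If um^2: each cell is about {side_nm:.0f} nm on a side")  # about 283 nm
```

Read as square mm, an entire 400 mm^2 die holds only about 5,000 bit cells, or a bit over half a kilobyte, which cannot be right for a 14nm process. Read as square um, the cell works out to roughly 283 nm on a side, which is plausible.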