The first for AMD

AMD was on stage today at the first annual Global Technology Conference, hosted by former AMD production arm GLOBALFOUNDRIES, talking about their move to 32nm process technology and their partnership with the foundry. Most importantly, though, we saw the first public showing of the AMD Llano Fusion desktop platform APU!
For the past several years AMD has been placing “Fusion” logos on all of their marketing presentations, and it all started after the announcement that AMD would be integrating ATI graphics technology onto their CPUs.  It was initially announced that we would have Fusion based parts in 2009, but it seems the laws of physics and die size got in the way.  Now as we are heading into 2011, we are actually going to see the fruits of the initially controversial merger of AMD and ATI.

Ontario is the codename for the Bobcat based Fusion part which integrates a dual core processor with a DX11 based graphics portion.  This part was presented for the first time at Computex in early June, and AMD has ramped production on these parts for a late 2010 release to manufacturers.  Ontario is a slightly different bird than we were expecting.  While it was well known to be based on the new Bobcat processor architecture, it was assumed to be produced by GLOBALFOUNDRIES on their 32 nm SOI process.  It was a bit of a shock when it was announced that this part would in fact be initially produced by TSMC on their 40 nm bulk process.

The second product in the Fusion family is known as Llano.  This is going to be the wagon that AMD really will hitch their desktop aspirations to.  It is a design that is based on compromise, but this was needed for this new and complex part.  It will be using GLOBALFOUNDRIES’ new 32 nm SOI/HKMG process, which is being ramped quite aggressively to meet 1H 2011 goals.

The processor core is based on the Phenom II architecture, but with a few twists.  It features no L3 cache, but it increases individual L2 caches to 1 MB apiece.  There are supposed to be a handful of tweaks to the processor as compared to the 45 nm variants.  AMD has yet to really detail these tweaks, but they probably concern cache coherency efficiency since there is no shared L3 for the four cores to lean upon.

GlobalFoundries’ Doug Grose holds the 32nm AMD Llano wafer

The memory controller and northbridge functionality has received a massive makeover due to the increased data needs of an onboard graphics component.  Again, the details of the changes will not be made public until much closer to the release of this product.  This upgrade in the memory controller will likely not have a large effect on CPU performance, but it was certainly necessary to feed the GPU.  It will still be a dual channel, DDR-3 implementation, but with official support for higher DDR-3 speeds.

The graphics portion is still a relative unknown.  Some guesses have been put forward as to its composition, and most of these are based on the somewhat blurry die shot that has been making the rounds for months.  What we do know is that it is a DX11 part.  That is the only official word.

What can be surmised due to the timing of the design phase is that it is based on the current HD 5000 series of graphics parts.  This means that it is most likely not based on the improved “Southern Isles” designs which are going to be released in the near future.  It looks to have around 480 stream units, which is one SIMD (80 stream units) larger than the current Redwood GPU that powers the HD 5500 and HD 5600 parts.  The chip is going to have multiple clock domains, so the CPU core will run into the 3 GHz range while the GPU portions could see anywhere from 750 MHz to 1 GHz (or potentially slightly higher due to the process it is based upon).

For comparative purposes, the current integrated graphics portions of AMD’s class leading chipsets are based upon a single SIMD design (80 stream units) running between 500 MHz and 750 MHz.  These parts are based on TSMC’s 55 nm process, so the new graphics portion of Llano is 1.5 nodes newer.  This has allowed AMD to fit a whole bunch more stream units into an area that was typically reserved for the L3 cache on the Phenom II.
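To put those unit counts and clocks side by side, here is a rough peak-throughput sketch. It assumes each stream unit can retire one multiply-add (2 FLOPs) per clock, which is how peak figures for this class of GPU are usually quoted. The function name is purely illustrative, and the 480 unit count is the die shot estimate discussed above rather than an official number.

```python
def peak_gflops(stream_units: int, clock_mhz: float,
                flops_per_unit_per_clock: int = 2) -> float:
    """Theoretical peak single-precision throughput in GFLOPS.

    Assumption: one multiply-add (2 FLOPs) per stream unit per clock.
    """
    return stream_units * flops_per_unit_per_clock * clock_mhz / 1000.0


# Current chipset graphics: one SIMD (80 stream units) at 500 to 750 MHz.
chipset_low = peak_gflops(80, 500)
chipset_high = peak_gflops(80, 750)

# Estimated Llano graphics: 480 stream units at 750 MHz to 1 GHz.
# The unit count and clock range are estimates, not confirmed specs.
llano_low = peak_gflops(480, 750)
llano_high = peak_gflops(480, 1000)

print(f"Chipset IGP: {chipset_low:.0f} to {chipset_high:.0f} GFLOPS")
print(f"Llano (est.): {llano_low:.0f} to {llano_high:.0f} GFLOPS")
print(f"Worst-case ratio: {llano_low / chipset_high:.1f}x")
```

Even pairing the slowest estimated Llano clock against the fastest chipset clock, the paper advantage is sixfold. These are theoretical peaks only and say nothing about whether the memory system can keep the units fed.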

AMD Llano APU Die Shot

The die size overall on the Llano part should be between 170 mm² and 210 mm², based on rough estimates comparing the die size of the 45 nm Phenom IIs with the full L3 cache.  It should be an improvement for AMD in terms of manufacturing, but it certainly is not as efficient as their native 45 nm dual core design with 1 MB of L2 cache per core or the Athlon II X4s.

With the recent leaks of Sandy Bridge performance, we know that Llano will not be all that competitive on the CPU side.  Certainly it will be an improvement over the current Phenom II parts (just barely), but it will get nowhere near Nehalem or Sandy Bridge performance.  AMD is not terribly worried about that right now, as they feel the graphics performance will be more than enough to overtake what Sandy Bridge brings to integrated GPUs.

This is not to say that Sandy Bridge is a disappointing improvement in integrated graphics.  Intel has really kicked it up a notch, and the low end integrated part (with what Intel calls 6 execution units) doubles performance over the previous Intel Extreme graphics as well as AMD’s 880/890GX integrated parts.  Intel has a 12 unit part that will be integrated into the higher end of the spectrum of Sandy Bridge chips.  It may not double the performance of the low end part, but it will certainly improve on it to a significant degree (it may be more memory bandwidth bound).

AMD is hoping that this robust 480 stream unit part will provide near HD 5650 performance.  This would be a very significant boost over what Intel offers, plus solid DX11 support due to the GPU being from a well known graphics architecture.  I do question whether this part will be more bandwidth bound in most situations, and if performance will be limited by that.  The HD 5670 has a very respectable 64 GB/sec of available memory bandwidth.  Compare this to a dual channel DDR-3 1600 controller in Llano that can theoretically do 25.6 GB/sec.  In the real world, AMD’s CPUs have been able to get 13 to 16 GB/sec on current processors.  The graphics portion may help to better utilize that bandwidth, but it will be a tough hill to climb.
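The bandwidth gap is easy to work through. The sketch below assumes standard 64-bit DDR3 channels and uses the figures quoted above for the HD 5670 and for measured CPU bandwidth. The function and variable names are illustrative only.

```python
def peak_bandwidth_gbs(transfers_mt_s: float, bus_width_bits: int,
                       channels: int = 1) -> float:
    """Theoretical peak memory bandwidth in GB/sec.

    transfers_mt_s: data rate in megatransfers per second (1600 for DDR3-1600)
    bus_width_bits: width of one channel in bits (64 for a DDR3 channel)
    """
    bytes_per_transfer = bus_width_bits / 8
    return transfers_mt_s * bytes_per_transfer * channels / 1000.0


# Dual channel DDR3-1600, as expected for Llano.
llano_peak = peak_bandwidth_gbs(1600, 64, channels=2)

# Figures quoted in the text, not computed here.
hd5670_peak = 64.0
measured_low, measured_high = 13.0, 16.0

print(f"Llano theoretical peak: {llano_peak:.1f} GB/sec")
print(f"Share of HD 5670 bandwidth at peak: {llano_peak / hd5670_peak:.0%}")
print(f"Share at measured rates: {measured_low / hd5670_peak:.0%} "
      f"to {measured_high / hd5670_peak:.0%}")
```

Even at the theoretical peak, Llano would have 40% of the discrete card's bandwidth, and at the rates current CPUs actually achieve it would be closer to a fifth or a quarter. That pool also has to be shared with the CPU cores.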

What the GPU does promise though is better graphics performance, better quality rendering, and access to stream computing resources once reserved for a standalone GPU.  AMD is betting that in most user related cases, the CPU will be of secondary importance to the GPU.  When dealing with standard desktop applications and gaming, a faster GPU will trump the CPU.  Plus with the growing GPGPU market, it will be a compelling sell for a lot of OEMs and consumers alike.  Also consider that the GPU portion will feature all the AVIVO HD functionality, as well as an assumed ability to bitstream HD audio.

It does appear as though Intel will have Sandy Bridge out well before AMD is able to ship Llano.  But AMD really is hoping that the more advanced graphics portion will offset the lead time, as well as the CPU performance advantage that Intel holds.