Bulldozer First Release and the State of 32nm AMD Parts
Bulldozer Ships for Revenue
Some months back we covered the news that AMD had released its first revenue shipments of Llano. This was a big deal back then, as it was the first 32 nm based product from AMD, and one which could help AMD achieve power and performance parity with Intel in a number of platforms. Llano has gone on to be a decent seller for AMD, and it has had a positive effect on AMD’s marketshare in laptops. Where once AMD was a distant second in overall power and performance in the mobile environment, Llano now allows them to get close to the CPU performance of the Intel processors, achieve much greater performance in graphics workloads, and match Intel in overall power consumption.
KY Wong and Marshall Kwait hand off the first box of Bulldozer based Interlagos processors to Cray's Joe Fitzgerald. Photo courtesy of AMD.
Some five months later we are now making the same type of announcement for AMD and their first revenue shipment of the Bulldozer core. The first chips off the line are actually “Interlagos” chips; basically server processors that feature up to 16 cores (8 modules, each module containing two integer units and a shared 256 bit FPU/SSE SIMD unit). The first customer is Cray, purveyor of fine supercomputers everywhere. They will be integrating these new chips into their Cray XE6 supercomputers, which have been purchased by a handful of government and educational entities around the world.
This summer has been ablaze with rumors about Bulldozer, its delays, and AMD’s seeming inability to actually get this very important part out the door. Bulldozer is the future of AMD, and they had better get it right in pretty short order. As we saw with the original Phenom, they can struggle through some teething issues and still come out intact (though battered and bruised). The rumors of lower than expected performance, issues with TDP and certain steppings, as well as GLOBALFOUNDRIES’ low yields have all cast a grim shadow over Bulldozer.
It looks as if AMD is shipping B2 stepping silicon, and from all indications that will be the stepping that ships for desktop parts. The first B0 engineering samples had some significant problems clocking above 2.8 GHz, and in benchmarks would only match the 3.3 GHz (3.7 GHz Turbo Core) Phenom II X6 1100T. This of course concerned AMD and their motherboard partners a great deal. B1 silicon helped that issue some, but still did not give enough headroom to make the push into production. B2 is apparently the stepping that is good enough to actually start production in good quantities. That said, AMD and GLOBALFOUNDRIES are likely still working on the design, and on another major stepping, to improve the chip.
The Jump to 32 nm
This has not been an easy transition for AMD, or the foundry industry in general, going from 40/45 nm down to 28/32 nm. Intel was able to accomplish this jump nearly 20 months ago. GF (GLOBALFOUNDRIES) only started shipping revenue producing silicon based on 32 nm in March of this year. So we figure they are at least 15 months behind Intel, but that might be stretching it. Intel did not seem to have the same yield issues that GF does with their current process, but we are also talking a bit of apples and oranges here when comparing/contrasting those two 32 nm processes. Intel does not use SOI, but they do utilize HKMG. Still, when a company has a few extra billion dollars floating around, they can afford to buy as much expertise as they want when it comes to process engineers.
The graphics portion of Llano takes up easily 1/3 of the die. It is not hard to speculate why AMD is shipping multiple SKUs of Llano with both CPU cores and GPU SIMD units disabled when yields and bins of this product are not where they should be.
Llano has had a bit of a bumpy ride. Neither AMD nor GF is very happy about the current yields, and there are grumblings from both parties about where exactly the problem is. AMD is said to be grousing about the overall state of GF’s 32 nm process, while GF has some negative things to say about the graphics portion of Llano somehow messing up the works. Likely the truth is somewhere in between, and I am guessing there is plenty of blame to go around for both parties. Llano is the first 32 nm part that GF has produced in mass quantities. Llano also contains the first Radeon based graphics core to utilize the more CPU specific (read “custom cell”) 32 nm HKMG/SiGe/SOI process node.
Yields are rumored to be around the 50% mark, but neither company will give an exact number. Some analysts say it is higher, some lower, but 50% is a good place to start. Demand for Llano is good. It is gaining a lot of traction in the mobile market, and is turning into a popular processor for budget and midrange systems. AMD has shipped over 10 million of those CPUs so far. That is not nearly enough to meet demand. While you can buy these processors fairly easily, I am hearing from motherboard partners that they were expecting far more processors to be on the market and in multiple SKUs from major OEMs. But when AMD cannot promise to fill a significant order from a major OEM month in and month out, those OEMs tend to cut back on their orders rather than risk shortages or having their lines idle while waiting for CPUs. This in turn has left motherboard manufacturers holding onto a lot of A75 and A55 based motherboards that are looking for homes.
The Bulldozer Factor
The issues mentioned above are likely all factors in why Bulldozer is as late as it is. GF only has so many 32 nm wafer starts a month at their fabs in Dresden. They also currently produce AMD’s 45 nm parts that are still being sold. When yields are low, more wafer starts are needed to meet demand. Demand for Llano still far exceeds actual supply, so it is a tough decision to take a significant portion of those wafer starts and apply them to another product that will not be available in good quantities for some months.
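To put the yield and wafer start relationship in rough numbers, here is a minimal Python sketch. The die count per wafer and the target volume are made-up placeholders, not figures from AMD or GF; only the roughly 50% yield comes from the rumors discussed above. The function name `wafer_starts_needed` is likewise just for illustration.

```python
import math

def wafer_starts_needed(good_dies_wanted, gross_dies_per_wafer, yield_fraction):
    """Wafer starts required to end up with a target number of working dies."""
    if not 0 < yield_fraction <= 1:
        raise ValueError("yield_fraction must be in (0, 1]")
    # Only a fraction of the dies printed on each wafer actually work.
    good_dies_per_wafer = gross_dies_per_wafer * yield_fraction
    return math.ceil(good_dies_wanted / good_dies_per_wafer)

# Hypothetical inputs: 200 candidate dies per wafer, 1,000,000 good chips wanted.
for y in (0.5, 0.7, 0.9):
    starts = wafer_starts_needed(1_000_000, 200, y)
    print(f"yield {y:.0%}: {starts:,} wafer starts")
```

Whatever the real inputs are, the shape of the result is the same: at 50% yield a fab burns twice the wafer starts that a perfect process would need for the same number of chips, which is why every start handed to Bulldozer is felt on the Llano side.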
The die shot of Bulldozer from last year. 16 MB of cache and 8 cores are waiting to greet you later this fall.
The general consensus is that the Bulldozer architecture is not as fast as the Intel Core series in terms of IPC. Where AMD makes up ground is that the designs can be clocked higher and still achieve the same TDP range as their counterparts from Intel. So for example, where Intel has a quad core i5/i7 at 125 watts TDP that runs upwards of 3.3 GHz, AMD will have a 125 watt TDP 8 core/4 module Bulldozer running at a 3.6 GHz base clock. Things get even muddier when we start considering turbo speeds and Hyper-Threading, but overall it looks like AMD at least has a competitive part.
There are many reasons why we always see new processors being released first on the server side. The first is that it is a higher margin market, so the first batches will offset their low yields with higher overall prices. If AMD were to try to introduce these onto the desktop first, they would be charging $200 to $300 per chip, while the same silicon on the server side would be fetching $450 and above (or thereabouts). The second is that the first production runs will likely have lower speed and thermal bins, and so they are more apt to be used in products that have a lower overall clockspeed, but can still be sold for a nice margin. Finally, what we are seeing here are multiple, lower clocked dies on a single substrate. It could very well be that these parts cannot clock very high and still maintain a decent TDP, but if they are clocked low enough, that TDP drops dramatically, and two can be placed on one chip. Again, these can be sold at a much higher price.
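The claim that TDP "drops dramatically" at lower clocks follows from the usual first-order rule that dynamic power scales with frequency times voltage squared, and lower clocks generally allow lower voltage. The sketch below applies that rule to a two-die package. Every number in it (clocks, voltages, the 125 watt starting point) is a hypothetical placeholder rather than an AMD spec, and the model ignores leakage power, so treat it as a rough illustration only.

```python
def relative_dynamic_power(freq_ratio, voltage_ratio):
    """First-order dynamic power scaling: P is proportional to f * V^2."""
    return freq_ratio * voltage_ratio ** 2

# Hypothetical single die at full speed (not real Bulldozer figures).
full_speed_watts = 125.0
full_speed_ghz, full_speed_volts = 3.6, 1.30

# Hypothetical server bin: lower clock, which permits a lower voltage.
server_ghz, server_volts = 2.3, 1.05

scale = relative_dynamic_power(server_ghz / full_speed_ghz,
                               server_volts / full_speed_volts)
per_die_watts = full_speed_watts * scale
two_die_package_watts = 2 * per_die_watts

print(f"per die: {per_die_watts:.0f} W, two dies: {two_die_package_watts:.0f} W")
```

With these made-up inputs, two down-clocked dies together draw less than one die at full speed, which is the basic idea behind putting two dies on one server package.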
AMD also showed off “Trinity” this summer, which consists of the Piledriver CPU core (based off Bulldozer, but optimized) and a redesigned GPU portion based off of the Radeon HD 6900 series parts, namely a VLIW4 architecture. The chip was running, and apparently running very well, when AMD showed it off. Initial roadmaps hinted that this part would be a late 1H 2012 CPU, or perhaps Q3 2012 at the latest. All signs point to this part being introduced much, much sooner. Think January/February 2012. Between a better understood design, improvements on the 32 nm process, and some good luck, AMD is trying to push this product out the door ASAP. This should give them a very compelling mobile product that may very well compete adequately with Intel’s 22 nm parts that should be introduced around Q2 of 2012. AMD has bragged that these chips should see TDPs in the sub 20 watt range and scale upwards to 65 watts and above.
The first revenue shipments of Bulldozer are a big step for AMD. This in turn should lead to further introductions down the road, most notably on the desktop sometime in the next two months. While Cray was the first to get hands on product, we do not expect general availability in the server realm until December of this year. Yields for this architecture *should* be better than what we are currently seeing for Llano, as these parts do not feature the very different GPU portion that has proven to be troublesome for this particular process. But Bulldozer is still a brand new architecture being produced on a brand new process. Do not expect miracles with yields, but as with any other product pushed to production, we will see improvement over time. Now we just have to see if AMD can reclaim some of their mojo and claw back some of their lost marketshare.