Bulldozer Impressions: That was... interesting

Subject: Editorial | October 12, 2011 - 05:45 AM |
Tagged: GLOBALFOUNDRIES, fx-8150, bulldozer, am3+, amd, 32 nm

Huh. 

I am pretty sure I am not the only person who has read these Bulldozer reviews (including Ryan's here at PC Perspective) and had that particular reaction.  Bulldozer was supposed to bulldoze the competition.  It turns out it barely outpaces its own predecessor, the Phenom II X6 1100T.  In fact, in terms of IPC, the older Thuban architecture gives it a sound thrashing when both are clocked at 3.3 GHz.  So why should I be impressed with this processor?

I guess the answer is… you shouldn’t.  At least not yet.  I distinctly remember being invited to Lake Tahoe back in November of 2007 to test and report on the first Phenom samples that were available for limited testing.  We were not allowed to take the samples home with their new AM2+ based motherboards.  When going over the results of the tests with Ryan (I was not part of PCPer at the time) we quickly saw that the 2.6 GHz Phenom was unable to keep up with the Core 2 Q6600 from Intel.  This was a little surprising, as we expected the original Phenom to clean house due to its very forward looking architecture (HT, IMC, beefier FP/SIMD units, etc.).  The original Phenom had its fair share of problems, to say the least.  TDPs were very high, there was the revision B2 bug that was solved in B3, and due to the 65 nm process it did not have nearly as much cache as was needed to make it a more efficient product.

Time passed and we were eventually introduced to the Phenom II products which fixed all of those issues.  AMD finally had a product that could match the high end Core 2 Quad CPUs of the time in nearly every aspect.  Unfortunately for AMD, Intel released the Nehalem/i7 based processors to the market.  Parity was not retained with the new architecture from Intel, and AMD has been scrambling to keep up ever since.

We see a few similarities with the Bulldozer launch, but it does not seem quite so dire.  There are no major bugs like the B2 problem with Phenom.  TDPs are not out of control (though they are not all that great).  Overall performance falls around that of the i5 2500 and they are offered at around the same price point.  There are a lot of interesting aspects to the architecture, and it is quite forward looking.  Unfortunately for AMD, there is a lot more tuning that needs to be done to achieve the potential of this architecture.

One big highlight of this release is the large L2 and L3 caches.  In previous generations AMD and GLOBALFOUNDRIES could not shrink the SRAM cell as effectively as Intel could with their process.  With the 32 nm HKMG/SOI process from GF, this is no longer an issue.  In fact, the geometry of the SRAM cells is overall slightly smaller than what Intel can currently achieve with their 32 nm process.  This is why we see a total of 16 MB of cache onboard a fully functional Bulldozer chip.  The L2 caches are clocked at core processor speeds while the L3 cache is clocked at the same speed as the Northbridge (2.2 GHz in this case).  This is a big boost from the previous quad core Phenom II (8 MB total cache with 512 KB of L2 per core).  This doubling of onboard cache should be more than adequate to keep the four modules fed and the execution units from starving for data.  AMD also did a lot of work on the memory controller, which now supports memory speeds up to DDR3-1866.  Bandwidth to main memory should not be an issue with this processor.  The downside is that there is less L1 cache for each module, and each integer unit has a pretty paltry 16 KB of L1D cache (Phenom II had 64 KB of L1D per core).  Also add into the equation that per clock latencies for these caches were increased.  While this is somewhat offset by higher core clock speeds, the differences do have a major impact on IPC.
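
As a rough check on the cache figures above, the totals and the clock-versus-latency trade-off can be sketched in a few lines of Python.  The splits used here (2 MB of L2 per module plus 8 MB of L3 for Bulldozer, 512 KB of L2 per core plus 6 MB of L3 for the quad core Phenom II) are assumptions chosen to match the 16 MB and 8 MB totals quoted in this article, and the cycle counts and the 3.6 GHz clock are illustrative placeholders rather than measurements.

```python
def total_cache_mb(l2_mb_per_unit, units, l3_mb):
    """Sum the L2 slices and the shared L3 (L1 is ignored here)."""
    return l2_mb_per_unit * units + l3_mb


def latency_ns(cycles, clock_ghz):
    """Convert a latency in core cycles to nanoseconds at a given clock."""
    return cycles / clock_ghz


# Assumed splits that reproduce the totals quoted in the article.
bulldozer_mb = total_cache_mb(2.0, 4, 8.0)   # four modules
phenom_ii_mb = total_cache_mb(0.5, 4, 6.0)   # four cores

# Hypothetical L1D latencies: one extra cycle on the newer, faster-clocked part.
old_ns = latency_ns(3, 3.3)
new_ns = latency_ns(4, 3.6)

print(f"Bulldozer: {bulldozer_mb} MB, Phenom II: {phenom_ii_mb} MB")
print(f"L1D latency: {old_ns:.2f} ns -> {new_ns:.2f} ns")
```

With these placeholder numbers the extra cycle costs more time than the higher clock wins back, which is the kind of effect that shows up as lower IPC.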

So why exactly is it not performing up to spec?  We are somewhat baffled by it, as the previous “new” product that AMD released garnered rave reviews.  The “Bobcat” core which powers the quick and energy efficient Ontario and Zacate products did everything that was expected of it.  Low power consumption, high performance as compared to competing devices, and a very competent graphics portion all wrapped up into one outstanding product.  Why did Bulldozer fail to impress?

I think there are several reasons for the disappointing performance of this part.  The design is very forward looking and complex.  Data management is likely the overarching reason for the results.  The integer and FP/SIMD units are simply not being utilized to their full potential.  The front end, namely the prefetch, predict, and decode units, is simply not optimized for the workloads we have tested.  There are some corner cases where Bulldozer simply blows away the competition, but these are far from common.  The smaller L1D cache and the much higher latencies throughout the cache system also will have a deleterious effect on overall processor performance at the clockspeeds we are currently seeing.

Happily AMD is not done with the architecture.  We have now been hearing that AMD is aggressively moving up the “Piledriver” refresh which promises to improve general x86 performance by 10% to 15% per core per clock.  Such a boost in IPC should allow this next product to more adequately compete with Intel and their latest Sandy Bridge parts.  Unfortunately for AMD, it will be Ivy Bridge that will be on the market by the time the desktop Piledriver CPUs hit.
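
To put that 10% to 15% figure in perspective, a first-order model treats per-core throughput as IPC multiplied by clock speed.  The sketch below uses that simplification (it ignores memory, turbo, and threading effects); the function names are made up for illustration and the 3.6 GHz base clock is an assumed example.

```python
def relative_throughput(ipc_gain, clock_ratio=1.0):
    """Throughput relative to a baseline, modeled as IPC x clock."""
    return (1.0 + ipc_gain) * clock_ratio


def clock_needed_ghz(base_clock_ghz, ipc_gain):
    """Clock the baseline design would need to match the gain with no IPC change."""
    return base_clock_ghz * (1.0 + ipc_gain)


for gain in (0.10, 0.15):
    speedup = relative_throughput(gain)
    equiv = clock_needed_ghz(3.6, gain)  # 3.6 GHz base clock is an assumed example
    print(f"IPC +{gain:.0%}: {speedup:.2f}x at the same clock, "
          f"or about {equiv:.2f} GHz on the current core")
```

In this simple model an IPC gain is equivalent to a clock bump of the same percentage, without the power cost that higher clocks usually bring.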

Bulldozer is not a bad product, and it certainly is a big step up from where the original Phenom was at the time of its release.  It just is not a world beater.  The architecture has a lot of promise, and these performance kinks will be worked out.  Unfortunately, it is going to be a while before we see AMD in a position to leapfrog Intel for the performance crown.  It will not be until GLOBALFOUNDRIES reaches 22 nm in a few years that AMD has another window of opportunity to release a product that could overshadow what Intel has to offer.  Then again, when the 22 nm shift occurs for AMD, Intel will be introducing their 2nd generation 22 nm parts (Haswell).

Some competition in the marketplace is better than none.  This will not sound the death knell of AMD, but it is not going to give AMD a significant boost.  No, the next boost will hopefully come from Trinity, the next generation APU that AMD plans to release in Q1 2012.  Hopefully that particular design will more adequately deliver on the promises of this architecture.
 

Source: AMD
October 12, 2011 | 09:27 AM - Posted by lima (not verified)

I hope you are right, that this architecture is fixable. I hope GF will master their process. I hope that AMD knew about Bulldozer's shortcomings before they finished Piledriver. Even with this 10-15% IPC increase it won't be enough to surpass Sandy Bridge, let alone Ivy Bridge. Maybe we will even see higher frequencies with GF's more mature 32nm process.

October 12, 2011 | 09:43 AM - Posted by Josh Walrath

In processor design, the devil is always in the details. The overall workflow looks good, and the changes they made make sense. I think Trinity does fix a lot of the problems we see, and it could very well be that they are stepping away from overarching clock speed increases and working on latencies to improve IPC. I would seriously consider looking at doubling L1D cache per module, halving L2 cache per module, and then improving all latencies to those units. Cutting out a big portion of L2 which runs at full clock speed is going to help overall clocking and TDP. I think such a move would be a positive net gain for the architecture, even though there is less cache overall.

October 12, 2011 | 12:50 PM - Posted by lima (not verified)

Yes, it seems like general agreement is that cache is the one to point at. Just like you said, bigger L1 cache, smaller L2 cache and all 3 with smaller latencies and faster.

October 12, 2011 | 10:20 AM - Posted by codedivine

I think this is like AMD's R600. Big, hot, underperformer but interesting ideas in the architecture. Over time, hopefully we will see tuning of various parameters to give something like the RV670 and RV770.

October 12, 2011 | 10:32 AM - Posted by Josh Walrath

I think the better comparison is Intel's switch from the Northwood core to Prescott.

Northwood went to 3.2 GHz, didn't have terrible thermals, and was a popular and fast chip. Prescott did not add any performance, nearly doubled the transistor count, and had pretty horrific thermals as compared. Sure, Prescott added in 64 bit support and other bells and whistles, but it was never all that demonstrably faster than the previous generation of product on an older process node.

October 12, 2011 | 11:11 AM - Posted by James (not verified)

Everyone is harping on the 8150 chip, but I think there are a lot of reviewers leaving something out: that $100 4 core Bulldozer, which is testing at 50% - 80% of the performance of a 2500K at half the price.

legionhardware.com is the only one who really tested the other offerings. A comparison of the 2100 and 2300 to the 4170 and 6100 may be in order.

October 12, 2011 | 12:53 PM - Posted by lima (not verified)

They just disabled 2 modules/4 cores on the 8-core chip; I think they didn't have the actual CPU. Even so, the Phenom II X4 980 did better on almost all tests for roughly the same or a little higher price.

October 12, 2011 | 11:58 AM - Posted by nabokovfan87

As a consumer and fan of AMD I am extremely saddened by how poorly the architecture has performed. I thought for sure having the first 8 core machine and 128-bit floating point abilities would be a very big deal. Perhaps with future steppings and revisions it can lead to something.

I think this ultimately has to be written down as a learning experience. They swung for the fences, and I have to assume that preliminary testing indicated these issues. The only place to go from here is to remove the shared FP units or have more of them such that the cores are no longer limited by the wait times.

As a fan of AMD I am a fan because they take chances and actually attempt to do something interesting. They could always use more testing to get some proof behind their visions, but I think I will eventually upgrade some time next year once another revision has been released. Also, the fact that the socket will be the same indicates to me as a customer that I should be able to buy a board now and upgrade later.

October 12, 2011 | 01:38 PM - Posted by AParsh335i (not verified)

This is really reminding me of when Nvidia switched to Fermi. They spent a lot of time and money to do something completely new, rather than just updating something old, and their first line of finished goods was not exactly what we wanted. Take the GTX 465 - it ran hot, and seemed a little expensive for the numbers it put out. Then they came out with the GTX 460 at a great price, which matched the 465's performance with much less heat/energy consumption. Hopefully AMD will be able to recover with a few stars from this new architecture. I am thinking the FX-4170 @ 4.2 GHz is going to be a real winner when it comes out, as well as the rumored 4 core CPU that has a full blown gaming GPU built in (not a 6450 type, but like a 6770). LCD prices are down, memory prices are down; think if you could get a 24" all-in-one quad core 3 GHz+ w/ built in 6770, 8 GB RAM, 1 TB, for about $500. I think that would sell.

October 12, 2011 | 05:11 PM - Posted by drbaltazar (not verified)

The way this processor is made is way ahead of Windows' thinking. AMD can't fix this by themselves, since part of it is on MS's end. For once, AMD needs to stop for a week, find the person at MS responsible for branching, caching, etc., and tell MS how Windows needs to react in situations x, y, and z. Don't sweat it: unless AMD doesn't know what they're saying (lol, I doubt it), MS will patch Windows so that it doesn't throttle the computer to safeguard it from imagined threats that are in fact new features Windows doesn't understand or can't process. Look at SSDs: when the new features came out, it took a while for MS to support them. It is the same here.

October 13, 2011 | 04:05 PM - Posted by John Doe (not verified)

Yes, blame windows, leave bulldy alone :emo-cry:

I can agree that there are some simple hacks that need to be done at the thread scheduler level (e.g. place 2 threads on a module only when all modules are executing at least one thread). But that's it - simple hacks, nothing more. Don't expect Windows to somehow magically inspect every thread to determine its nature (is it FP, or maybe int, or memory oriented?) and help lazy Bulldozer do its work. There is no doubt that AMD was thinking about how to protect the int cores from resource starvation, but it looks like they failed. And Windows can't affect poor IPC, that's for sure.
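
The scheduler rule described above (only double up on a module once every module already has a thread) could look roughly like this. It is a toy sketch, not how Windows actually schedules threads; `place_threads` and its parameters are hypothetical names.

```python
def place_threads(num_threads, num_modules=4, cores_per_module=2):
    """Return per-module thread counts, spreading across modules first."""
    capacity = num_modules * cores_per_module
    if num_threads > capacity:
        raise ValueError("more threads than integer cores")

    load = [0] * num_modules
    for _ in range(num_threads):
        # Pick the least-loaded module; ties go to the lowest index.
        # A second thread lands on a module only after all modules have one.
        target = min(range(num_modules), key=lambda m: load[m])
        load[target] += 1
    return load


for n in (2, 4, 5, 8):
    print(n, "threads ->", place_threads(n))
```

Up to four threads each get a module (and its shared front end and FP unit) to themselves; only the fifth thread starts sharing.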

October 13, 2011 | 06:28 AM - Posted by Prodeous (not verified)

An additional failure of the architecture is marketing. They made a mistake in calling a 4 core with "dual threading capability" a true 8 core architecture. And now everyone compares AMD's "8" core vs 4 core (with HT) on Intel.

Since there are a lot of components shared in the module, it is better to call AMD's 8 core a 4 core with AHT (AMD/Advanced Hyper-Threading). This does not of course change performance in any way, but it does put things into perspective.

Still, as the above article mentions, first version of Athlon/Phenom/R600/Fermi/other... have their birthing issues. When FX issues are addressed, AMD will have something more potent to fight against Intel.

But marketing this as an "8" core was a major mistake by AMD. They bit themselves in the rear with this one.

October 13, 2011 | 03:20 PM - Posted by Anonymous (not verified)

You hit the nail on the head. Many agree that it's not a true 8 core CPU, so why leave people wondering how a year old 4 core design can beat your brand spankin' new 8 core?

October 13, 2011 | 07:44 AM - Posted by Anonymous (not verified)

If this is an 8 core CPU then the 990X is a 12 core CPU :P

Marketing....

October 14, 2011 | 01:21 AM - Posted by Anonymous (not verified)

Congratulations Intel. Finally your underhanded tactics are paying off. Thank you for preventing AMD from selling their much too fast Athlons and Opterons. Your lawyers also did a good job, so that you only got a slap on the wrist for doing so. The small fine you paid off within a month or two.

Now at last we can sit back and enjoy paying a lot of money for your CPU's. This is great, I think?

So maybe crime does pay? For Intel it did! AMD is such a small company compared to Intel, it is a miracle that they still exist and even come up with a product that can in some way be compared to Intel's. AMD's APUs are at least the champions!

Maybe one day Intel will reap what they have sown!

October 14, 2011 | 10:10 PM - Posted by Shahrear Russel (not verified)

Yes, I agree. This is true. Actually, in every comparison I still cannot think of Intel as the best; AMD is always the one I hope for. AMD introduced a whole new architecture which is more scalable for future revisions & development.
A big hand for AMD.

October 14, 2011 | 07:50 AM - Posted by drbaltazar (not verified)

I agree, 4 dual-core modules kind of confuses users about whether it is a true 8 core or not!

October 16, 2011 | 07:07 PM - Posted by drbaltazar (not verified)

As someone posted on YouTube, the harder this CPU gets pushed, the better the result. I wonder how long till we get parallel benchmarking: 2 benchmarks running at the same time on the CPU. So far none have done it, I believe.
