AMD Unveils Steamroller Improvements

Author:
Subject: Processors
Manufacturer: AMD

Hot Chips 2012

 

Ah, the end of August.  School is about to start.  American college football is about to get underway.  Hot Chips is now in full swing.  I guess the end of August caters to all sorts of people.  For those most interested in Hot Chips, the wealth of information on next generation CPU architectures is something to really look forward to.  AMD is taking this opportunity to give us a few tantalizing bits of information about its next generation Steamroller core, which will be introduced with the APU codenamed “Kaveri”, due out in 2013.


AMD is seemingly on the brink of releasing its latest architectural update with Vishera.  This is a Piledriver+ based CPU that will find its way into AM3+ sockets.  On the server side, the Abu Dhabi processors are also expected to be released in the late September timeframe.  Trinity was the first example of a Piledriver based product; it showed markedly improved thermals compared to previous Bulldozer based products and featured a nice little bump in IPC in both single and multi-threaded applications.  Vishera and Abu Dhabi look to be Piledriver+, which essentially means that there are a few more tweaks in the design that *should* allow them to go faster per clock than Trinity.  There have been a few performance leaks so far, but nothing concrete (and nothing showing final production-ready silicon).

Until Vishera and its ilk are released, AMD is teasing us with some Steamroller information.  This presentation is featured at Hot Chips today (August 28).  It is a very general overview of improvements, with very few details about how AMD is achieving increased performance with this next gen architecture.  So with that, I will dive into what information we have.


Hot Chips 2012: An Introduction to Surround Computing

At Hot Chips today we get our first glimpse of what is in store from AMD.  The entire presentation is very general, with the first portion touching upon what AMD considers the “Surround Computing Era”.  In a nutshell, surround computing means that nearly every aspect of a person’s life will have contact with some kind of processing solution.  Keyboards and mice will no longer be the primary method of interacting with the computing world; interaction will instead move towards gestures, voice, location recognition, facial recognition, and pattern/behavior anticipation and prediction.  These will be combined with rich graphics and representations of people and environments superimposed over reality.  AMD expects this to become a reality in the next 20 years, and they are tailoring their technology to meet these needs.


The next portion covers what they are expecting to do with HSA.  The primary goal is to make the CPU and GPU equal partners in computing.  To achieve this they must make the transition seamless and transparent to programmers and users.  Instead of relying on APIs like OpenCL to expose the GPU functionality, AMD is working to make the hardware directly accessible to programmers through high level languages like C, C++, Python, JavaScript, and HTML5.  The GPU portion will natively support shared virtual memory, coherency, and context switching.  This is not new information, but rather a recap of what we learned at last year’s Fusion Developer Summit.  Because AMD is combining the CPU and GPU on one piece of silicon, they have complete control over how these pieces communicate not only with each other, but also with the outside world.  This integration will be much tighter in future generations of products.

That is all well and good, but we are far more interested in what is coming up next.  With that, AMD shares with us a very brief overview of what they intend to deliver with Steamroller.

Looking back over the past year we see that Bulldozer is not a bad architecture; it just is not all that great.  If viewed in a vacuum it provides an interesting solution to an increasingly parallel software environment.  The ability to handle many threads effectively without inflating die size to a significant degree is the hallmark of the Bulldozer architecture.  Unfortunately for AMD, there were enough downsides to the design that it was viewed as a failure not just against Intel’s Sandy Bridge and Ivy Bridge processors, but also against the previous generation AMD Phenom II X6 series of products.  The primary issues that we see deal with power consumption and heat, effective thread handling, and lower than expected IPC.


Some of these issues were addressed with the Piledriver update.  The biggest fixes involved power and heat.  Bulldozer was really rushed to market, and as such it was not fully optimized.  To achieve competitive yields and bins, the design was a bit more “loose” than what was aimed for.  More transistors were used than would be necessary if more time had been given in the design process.  But AMD was against a wall, and they needed to get Bulldozer out the door.  Piledriver improves upon IPC, and probably most importantly, the power and clocking issues that Bulldozer suffers from.  Trinity shows us that the design can achieve very good power savings vs. clockspeed, and the small performance bump that it exhibits is very welcome.

August 29, 2012 | 06:22 AM - Posted by Crickets Chirping (not verified)

Barcelona. Bulldozer. ...

Does anyone know if USC plays Notre Dame this year?

September 11, 2012 | 07:39 AM - Posted by Anonymous (not verified)

Uhh...No. Do you know how many pumpkins it takes to make a pie?

August 29, 2012 | 07:27 AM - Posted by Anonymous Coward (not verified)

I do not remember ever having heard complaints about Bulldozer's thermals or clocking. All that stuck in my head was that they had their parallel designs done well, which is arguably the harder/more important part to optimize first, but their IPC needed to get trimmed down to make their design really shine.

August 29, 2012 | 08:01 AM - Posted by Josh Walrath

The original design was expected to be a 4 module unit hitting around 4 GHz at 125 watts.  They were never able to get there.  After about 3.8 GHz in that 4 module part, TDPs started to jump really dramatically.  If you remember my FX-6200 review, power at the wall socket jumped up 100 watts when that chip was clocked from 3.8 GHz to 4.6 GHz.  The chips just were never able to hit the clockspeed/TDP targets that would have made them more competitive with not just Intel parts, but also their previous Phenom II products.

September 11, 2012 | 07:36 AM - Posted by Anonymous (not verified)

Staying below TDP is a relative term. 61C is the max temp at 124 watts for an 8150. The processor runs at 3.6, turbos all 8 cores to 3.9 and then 4 cores to 4.2.
However, with no voltage bump, my 8150 runs at 4.2 (turbo off) and a cool 44.8C under full load. That's 16C (about 29F) below the max temp, 24/7, indicating a lot of thermal headroom.
99.9% of overclockers use aftermarket cooling; that 4.2 GHz is on a tower air cooler.
As far as watts, my video card bumps 100 more watts as well. So how many watts are used is not the same as saying "it won't get there within TDP". Obviously it will, and within the TDP temp (61C).
So I don't understand your statement, unless you were referring to some insane overclock target? AMD's current target is above any Intel processor and far below on the cost.
The "4 modules" you refer to is an 8 core, while the FX-6200 you refer to is a 6 core.
I think if you compare $ targets first you see AMD as the leader. If you have more funds than most, you can buy an Intel, IBM or perhaps a Cray.
To clarify: TDP is primarily used as a guideline for manufacturers of thermal solutions (heatsinks/fans, etc.), which tells them how much heat their solution should dissipate. TDP is not the maximum power the CPU may generate - there may be periods of time when the CPU dissipates more power than designed, in which case either the CPU temperature will rise closer to the maximum, or special CPU circuitry will activate and add idle cycles or reduce CPU frequency with the intent of reducing the amount of generated power.

TDP is usually 20% - 30% lower than the CPU maximum power dissipation.
In any case I would not expect a $200 processor to compete with a $300 to $1000 processor no matter who makes it.

August 29, 2012 | 09:45 PM - Posted by rishidev (not verified)

But where are the Piledriver CPUs?

August 30, 2012 | 07:49 AM - Posted by Josh Walrath

All the leaked docs say Q4 2012.  So Q4 starts in October.

August 31, 2012 | 07:35 AM - Posted by Anonymous (not verified)

This looks like a piece of wood with some yellow and brown paint on it.

September 11, 2012 | 06:53 AM - Posted by Anonymous (not verified)

I don't know about the watts at the plug, but my 8150 runs at 4.2 GHz and 44.8C. That is with a tower air cooler.

September 28, 2012 | 10:21 AM - Posted by Anonymous (not verified)

Yeah, mine has been running a stable 4.6 GHz for going on a month now. It has run at as much of a full load as possible, using Crysis 2 at ultra settings and Guild Wars 2 (a CPU-intensive game) with everything maxed and super-sampling enabled. My temps never go beyond 52C.

September 11, 2012 | 06:54 AM - Posted by Anonymous (not verified)

Sorry, that's at max load on all 8 cores.

October 8, 2012 | 05:19 AM - Posted by Nick (not verified)

My FX-8120 runs at 4.40 GHz on all 8 cores and at full load never breaks past 53C. Piledriver's improvements should be better than what Phenom II was to Phenom I. Steamroller may go beyond what we expect, with a claimed +45% clock-for-clock performance improvement.
