AMD Phenom 9600 and 9900 Review: Barcelona on the Desktop

Subject: Processors
Manufacturer: AMD

The Phenom CPU Architecture

Introduction

Today marks one of the most pivotal days in AMD's recent history -- the launch of a new desktop core architecture to replace the long-lived and much-loved Athlon 64 core.  Since 2003, the on-die memory controller and 64-bit addressing have made AMD a dominant player in the CPU world and not just a footnote in Intel's resume.  Recent months have been hard on the Athlon 64, though, as Intel's powerful and power-efficient Core 2 series of processors has left AMD's aging architecture behind.

AMD is hoping that Barcelona, aka Agena, aka Phenom, will help push them back into the performance fold.



Introducing AMD's Agena / Barcelona Architecture

The architecture of the new AMD Phenom processor is based on the Barcelona core, known as the Agena core in the desktop market.  We have taken a couple of looks at the Barcelona technology before, but for the final release of the desktop Phenom processor we'll go over the design changes in more detail.



Our first slide, from one of our meetings with AMD over the past months, shows their answer to Intel's "tick-tock" methodology for processor design.  We are now seeing the implementation of the 65nm version of Phenom -- an x86 quad-core processor with an internal core design known as "Stars".  At some point next year we'll see the 45nm shrink of the Barcelona cores, and then in 2009 the first Fusion processor is supposed to be ready.  Fusion will be the culmination of the ATI acquisition, as both CPU and GPU are placed on a single die.  But that is still a couple of years away, so let's focus on today's release.

Today's Phenom processor is based on the 65nm process technology as shown above and features about 450 million transistors on a die of about 288 mm^2.  Compared to the Intel Yorkfield, which has over 800 million transistors, the Phenom core looks like a lightweight.  Once you take Intel's 45nm process into account, though, Yorkfield actually has the smaller die at 214 mm^2.
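
To put those die figures in perspective, here is a quick back-of-the-envelope calculation of transistor density.  It uses only the rounded numbers quoted above (450 million on 288 mm^2 and 800 million on 214 mm^2), so treat the results as rough estimates; the function name is our own.

```python
def density_millions_per_mm2(transistors_millions, die_area_mm2):
    """Average transistor density in millions of transistors per mm^2."""
    return transistors_millions / die_area_mm2

# Rounded figures quoted in the article, not exact vendor numbers.
phenom = density_millions_per_mm2(450, 288)     # 65nm Phenom
yorkfield = density_millions_per_mm2(800, 214)  # 45nm Yorkfield

print(f"Phenom:    {phenom:.2f} M transistors/mm^2")
print(f"Yorkfield: {yorkfield:.2f} M transistors/mm^2")
print(f"Yorkfield packs about {yorkfield / phenom:.1f}x as many per mm^2")
```

By these numbers Yorkfield fits well over twice as many transistors into each square millimeter.  Much of that gap comes from the process shrink, and large caches are denser than logic, so this is not a pure process-to-process comparison.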

AMD's new Phenom processor has six key features that AMD feels give it an advantage over the competing products from Intel.  These "star points" are summed up in the slide above and include the usual suspects, like HyperTransport and a "true" quad-core design, as well as a new L3 cache and other core changes.

The new Phenom processors are the first desktop CPUs to use the new HyperTransport 3.0 protocol for I/O.  Long ago AMD left the front-side bus behind and helped develop a new data bus, HyperTransport, that alleviates much of the bottleneck the traditional FSB imposes.  Even Intel has finally come to the conclusion that the FSB is dead and will be implementing its own interconnect, similar to HT, in its next-generation Nehalem core CPUs.

HT 3.0 provides "up to" 20.8 GB/s of raw bandwidth if the frequency and bus width are topped out; the previous HT 1.0 and 2.0 revisions offered 6.4 GB/s and 8.0 GB/s of raw bandwidth.  The new HT 3.0 bus on the Phenom processors will run at a default effective rate of 3.6 GHz, compared to the effective 2.0 GHz of the HT 2.0 link on current Athlon AM2 processors.  The Phenom CPUs can still work in motherboards with HT 1.0 and HT 2.0, as the interconnect specs are backwards compatible.
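
For readers who want to see where those bandwidth figures come from, the arithmetic can be sketched as below.  The assumptions are ours, not AMD's: a 16-bit link in each direction, bandwidth counted in both directions at once, and peak effective transfer rates of 1.6, 2.0 and 5.2 GT/s for the three HT revisions.  The 3.6 figure is the Phenom default rate quoted above, read as an effective transfer rate.

```python
def ht_bandwidth_gb_s(transfer_rate_gt_s, link_width_bits=16, directions=2):
    """Aggregate raw bandwidth of a HyperTransport link in GB/s.

    transfer_rate_gt_s is the effective (double-pumped) transfer rate.
    The 16-bit width and two-direction counting are assumptions chosen
    because they reproduce the figures quoted in the article.
    """
    bytes_per_transfer = link_width_bits / 8
    return transfer_rate_gt_s * bytes_per_transfer * directions

# Effective transfer rates in GT/s (assumed, see the note above).
links = [
    ("HT 1.0 peak", 1.6),
    ("HT 2.0 peak", 2.0),
    ("HT 3.0 peak", 5.2),
    ("Phenom default", 3.6),
]

for name, rate in links:
    print(f"{name}: {ht_bandwidth_gb_s(rate):.1f} GB/s")
```

Under these assumptions the first three lines reproduce the 6.4, 8.0 and 20.8 GB/s figures, and the Phenom's default link speed works out to 14.4 GB/s, comfortably short of the HT 3.0 ceiling.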

Two other key features of the Phenom processor are the integrated memory controller and the addition of a third level of cache.  The integrated memory controller has been AMD's pride and joy since its introduction in mid-2003.  It has given them an advantage over Intel's processors in memory bandwidth and latency, and it kept them in the performance game longer than the core alone could have.

The new memory controller hasn't changed much from the one in the Athlon X2 CPUs -- it still officially runs DDR2 memory at 800 MHz and, more or less unofficially, at 1066 MHz.  AMD has already stated that DDR3 support will probably arrive in late 2008 or early 2009, when DDR3 pricing reaches parity with DDR2.

Phenom processors actually have TWO memory controllers on them -- both of which are 64-bit.  They can work in tandem (known as ganged) with matched DIMMs and thus provide a full 128-bit memory access.  Where the DIMMs do not match, or when the user changes a BIOS setting, they can work independently to provide standard dual-channel 64-bit memory accesses.  The performance difference between these two settings is still an open question, for reasons we'll explain in a bit.
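
On paper, at least, the two modes offer the same peak bandwidth, which is part of why the real-world difference is hard to call.  The sketch below runs the numbers, assuming that the "800 MHz" and "1066 MHz" speeds refer to effective DDR2 transfer rates in MT/s; the function name is our own.

```python
def channel_bandwidth_gb_s(transfer_rate_mt_s, width_bits):
    """Peak theoretical bandwidth of one memory channel in GB/s."""
    bytes_per_transfer = width_bits / 8
    return transfer_rate_mt_s * 1e6 * bytes_per_transfer / 1e9

for rate in (800, 1066):
    # Ganged: both controllers act as a single 128-bit channel.
    ganged = channel_bandwidth_gb_s(rate, 128)
    # Unganged: two independent 64-bit channels working in parallel.
    unganged = 2 * channel_bandwidth_gb_s(rate, 64)
    print(f"DDR2-{rate}: ganged {ganged:.1f} GB/s, unganged {unganged:.1f} GB/s")
```

Both modes top out at the same theoretical number, so any difference has to come from how well the access pattern keeps one wide channel or two narrow ones busy, not from raw bandwidth.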


The new L3 cache is a shared cache used by the four cores of the Phenom processor to communicate with each other as well as to reduce the amount of time required for dynamic memory access.  The cache is also used to buffer data being written to memory in order to reduce the frequency of writes to system memory.  This should in turn improve the performance of memory reads, which are typically more important for system speed. 

A new iteration of Cool'n'Quiet for the Phenom processors allows AMD to do some interesting things with these four processing cores.  Each core can run at its own frequency and voltage, allowing the logic on the CPU itself to determine how much performance is needed and thus how much power is consumed.
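
To get a feel for why scaling both frequency and voltage matters, the usual first-order rule is that dynamic power in CMOS logic scales with voltage squared times frequency.  The sketch below applies that rule to a single core; the clock speeds and voltages are hypothetical values picked for illustration, not AMD's actual power states, and leakage power is ignored.

```python
def relative_dynamic_power(voltage, freq_ghz, base_voltage, base_freq_ghz):
    """Dynamic power relative to a baseline, using P ~ C * V^2 * f.

    Capacitance cancels out because the same core is compared with itself.
    Leakage (static) power is ignored in this first-order estimate.
    """
    return (voltage / base_voltage) ** 2 * (freq_ghz / base_freq_ghz)

# Hypothetical operating points, for illustration only.
full_speed = {"voltage": 1.25, "freq_ghz": 2.3}
throttled = {"voltage": 1.05, "freq_ghz": 1.15}

ratio = relative_dynamic_power(
    throttled["voltage"], throttled["freq_ghz"],
    full_speed["voltage"], full_speed["freq_ghz"],
)
print(f"Throttled core uses about {ratio:.0%} of full-speed dynamic power")
```

With these made-up numbers, halving the clock alone would halve dynamic power, while dropping the voltage as well brings it down to roughly a third.  That squared voltage term is why lowering voltage along with frequency pays off.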

The Phenom also implements the same C1E power state that Intel's Core 2 Duo processors use for power savings, which is important because Windows and other operating systems use it to improve power efficiency.  AMD's chips can enter it when all four cores are idle and then physically disconnect the HyperTransport links, put the memory modules into a lower power mode and lower internal clocks as well.


We should note of course that even though all of these new power features are great for users of motherboards and chipsets that support them, the Phenom chips are also backwards compatible with parallel VID control used in the current generation of AM2 motherboards. 

There are other performance enhancements to the Barcelona core as well, including wider data paths from the memory controller to the cores and 128-bit wide floating point units.  These 128-bit units can, for example, execute what would otherwise be a pair of 64-bit SSE operations as a single instruction.

A new memory prefetcher, enhanced branch predictor, and virtualization improvements for nested pages are just a handful of other changes that have been made.  Due to some time constraints, we'll leave these for another time to really dive into the architecture -- we know you really want to see performance anyway.