Nehalem Revolution: Intel's Core i7 Processor Complete Review

Author: Ryan Shrout
Subject: Processors
Manufacturer: Intel

Nehalem Architecture (cont'd)

New cache structure, new L3 cache


The Intel Smart Cache makes a return with the Nehalem core, but this
time in a three-level cache hierarchy.  The first level consists of a
32 KB instruction cache and a 32 KB data cache, and the L2 cache is a
completely new design compared to the Core 2 CPUs out today.  Each core
receives 256 KB of unified, 8-way associative L2 cache that is both
low latency (about 10 cycles from load to use) and scales well to keep
extra load off the L3 cache.

The L3 cache layer is completely new to Intel, though AMD's Barcelona
chip introduced a similar design late in 2007.  This L3 is an inclusive
cache that scales with the number of cores on the processor - quad-core
processors will have as much as 8 MB with 16-way associativity.  Any
perceived latency on the L3 will depend on the frequency ratio between
the core and uncore sections of the CPU - something we haven't gotten
enough information on yet.
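
To put those associativity numbers in perspective, here is a quick
back-of-the-envelope sketch of how many sets each cache would have.
It assumes a 64-byte cache line, which Intel did not state in the
material covered here, and the function name is just mine for
illustration.

```python
# Rough sketch: number of sets in a set-associative cache.
# ASSUMPTION: 64-byte cache lines (not stated in the article).

KB = 1024
MB = 1024 * KB


def cache_sets(size_bytes: int, ways: int, line_bytes: int = 64) -> int:
    """Return the number of sets: total lines divided by lines per set."""
    total_lines = size_bytes // line_bytes
    return total_lines // ways


l2_sets = cache_sets(256 * KB, ways=8)   # per-core unified L2
l3_sets = cache_sets(8 * MB, ways=16)    # shared L3 on a quad-core part

print(l2_sets, l3_sets)  # prints "512 8192"
```

Under that assumption the L3 has 16 times as many sets as each L2,
with twice the ways per set on top of that.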

Bring out yer' dead! (front-side bus)

One of the features that Intel HAS been talking about for a while is
the move away from the front-side bus architecture to something called
the QuickPath Interconnect (QPI).  Previously known only as CSI, the
Common System Interface, QuickPath is Intel's answer to AMD's
HyperTransport technology and it performs a very similar function.


Starting with Nehalem and moving forward, Intel's processors will
feature a direct connect architecture that is point-to-point and will
transmit data from socket to socket as well as from the CPU to the
chipset, all while scaling nicely as the number of CPUs and QPI links
goes up.  Part of the reason QPI was needed on Nehalem is the new
integrated memory controller (IMC) on the processor.  As AMD showed
many years ago, an IMC allows for higher peak memory bandwidth and
lower memory latency, though Intel is taking it another step up by
offering a three-channel DDR3 memory controller on each CPU.  QPI is
also required for efficient chip-to-chip communication, where one CPU
might need to access data stored in memory attached to another
processor's memory controller.

The QPI design supports 6.4 GigaTransfers a second, or 12.8 GB/s of
bandwidth in each direction, for 25.6 GB/s of total bandwidth between
two points.  Future versions of QPI will scale up to faster speeds as
well.  You can also tell in the above four-CPU diagram that QPI will
scale well with as many as four CPUs - each processor in this case
would require four total QPI connections and would be only one hop from
any other CPU's memory.
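
The bandwidth math is easy to check yourself.  The sketch below is
only my own arithmetic, not anything from Intel: the 2 bytes of data
per transfer in each direction is what the 6.4 GT/s and 12.8 GB/s
figures imply, and the single link to the chipset in the four-socket
case is my reading of the diagram.

```python
# Back-of-the-envelope QPI numbers.
# ASSUMPTIONS: 2 data bytes per transfer per direction (implied by
# 6.4 GT/s -> 12.8 GB/s), and one link per CPU going to the chipset.


def qpi_bandwidth_gbs(gigatransfers_per_sec: float,
                      bytes_per_transfer: int = 2) -> tuple[float, float]:
    """Return (per-direction, total) bandwidth in GB/s for one link."""
    per_direction = gigatransfers_per_sec * bytes_per_transfer
    return per_direction, 2 * per_direction


def links_per_cpu(sockets: int, chipset_links: int = 1) -> int:
    """Links each CPU needs to reach every other socket in one hop."""
    return (sockets - 1) + chipset_links


one_way, total = qpi_bandwidth_gbs(6.4)
print(f"{one_way:.1f} GB/s each way, {total:.1f} GB/s total")
print(links_per_cpu(4), "QPI links per CPU in a four-socket system")
```

Three links to the other sockets plus one to the chipset gives the
four connections per processor mentioned above.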

An Integrated Memory Controller, with three channels!

The Intel Nehalem Integrated Memory Controller (IMC) is actually pretty
scalable in its own right - besides offering extremely high bandwidth
and low latency, the number of memory channels can be varied, both
buffered and unbuffered memory are supported, and memory speeds can be
adjusted, all based on the market that the processor is targeted at.
Low-cost cores with only dual-channel memory should cost considerably
less than top-end three-channel systems.

At launch, the DDR3 memory controller located on Nehalem will only
OFFICIALLY support DDR3-1066 memory speeds.  While that is pretty lame,
I was told on numerous occasions that the memory controller will run at
speeds of DDR3-1600 to DDR3-2000, but official support stops with
JEDEC.  The IMC in Nehalem will also force Intel to use NUMA
(non-uniform memory access) since memory will be attached in different
places (not just to the north bridge) for the first time in Intel's
desktop processors.
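
For a sense of what three channels buy you, here is a rough peak
bandwidth calculation.  It assumes each DDR3 channel is 64 bits wide
(8 bytes per transfer), which is standard for DDR3 DIMMs but was not
part of Intel's presentation, and these are theoretical peaks rather
than anything I measured.

```python
# Theoretical peak memory bandwidth, not a measured result.
# ASSUMPTION: 64-bit (8-byte) wide DDR3 channels.


def peak_memory_bandwidth_gbs(megatransfers_per_sec: int,
                              channels: int,
                              bytes_per_transfer: int = 8) -> float:
    """Peak bandwidth in GB/s across all channels."""
    return megatransfers_per_sec * bytes_per_transfer * channels / 1000


official = peak_memory_bandwidth_gbs(1066, channels=3)    # DDR3-1066
unofficial = peak_memory_bandwidth_gbs(1600, channels=3)  # DDR3-1600
dual = peak_memory_bandwidth_gbs(1066, channels=2)        # dual channel

print(f"3 x DDR3-1066: {official:.1f} GB/s")
print(f"3 x DDR3-1600: {unofficial:.1f} GB/s")
print(f"2 x DDR3-1066: {dual:.1f} GB/s")
```

Even at the official DDR3-1066 speed, three channels work out to
roughly 25.6 GB/s of peak bandwidth per CPU under these assumptions.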

New Core Power Controls


The Nehalem core also has a new trick in its bag that enables it to
lower the power consumption of a core to nearly 0 watts - something
that wasn't possible on previous designs.  You can see in the image
above what the total power consumption of a core was typically made up
of with the Core 2 series of processors - clocks and logic are the
majority of it, yes, but a third or more is related to leakage of the
transistors and was something that couldn't be turned off in prior
designs.


How is this changed with Nehalem?  Well, with the independent power
controller in the power control unit (PCU) and the separate power plane
that each core rests on, the power consumption of each core is
completely independent of the others.  You can see in this diagram that
though Core 3 is loaded the entire time, both Core 2 and Core 0 are
able to power down to practically 0 watts when their workload is
complete.

Turbo Mode: free performance?

Perhaps the most interesting bit of news
out of Intel's Nehalem was something called Turbo Mode - a feature
directly enabled by the PCU we discussed on the previous page.  With
modern processors, the debate has raged whether users are better off
getting a quad-core CPU at a lower frequency or a dual-core CPU at a
higher frequency.  Intel is hoping that with Turbo Mode users will get
the best of both worlds. 


The idea is pretty straightforward: if you have four cores that run at
a combined power consumption (and heat dissipation) of X, and only two
cores are loaded (with the other two at idle), then you have additional
power headroom to overclock the working cores to a higher frequency.

For enthusiasts and gamers this should be an exciting turn of events.
While Intel wasn't very specific at this point, I imagine we'll see
gains of 200-300 MHz going from the full quad-core clock rate to the
dual-core or single-core rate (based on how many cores are idle at the
time).  This means if you purchase a 3.2 GHz Core i7 Nehalem-based
processor, you will likely see clock rates as high as 3.5 GHz when
running single-threaded or just dual-threaded applications.  Gamers
should also take note of this!
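
To make the headroom idea concrete, here is a toy model of it.  To be
clear, this is NOT Intel's algorithm: I am assuming per-core power
scales linearly with clock speed (in reality it rises faster, since
voltage goes up too), that idle cores draw nothing, and that the clock
moves in hypothetical 133 MHz steps with a cap of two steps.  It also
ignores temperature entirely.

```python
# Toy model of Turbo Mode headroom. Purely illustrative.
# ASSUMPTIONS (mine, not Intel's): linear power-vs-frequency scaling,
# zero-power idle cores, 133 MHz steps, and a cap of two steps.


def turbo_frequency_mhz(base_mhz: int, total_cores: int, active_cores: int,
                        step_mhz: int = 133, max_steps: int = 2) -> int:
    """Highest clock the active cores can run at within the budget."""
    if not 1 <= active_cores <= total_cores:
        raise ValueError("active_cores must be between 1 and total_cores")
    # The budget is what all cores consume together at the base clock,
    # expressed in "MHz-equivalents" thanks to the linear assumption.
    budget = total_cores * base_mhz
    steps = 0
    while (steps < max_steps
           and active_cores * (base_mhz + (steps + 1) * step_mhz) <= budget):
        steps += 1
    return base_mhz + steps * step_mhz


for active in (4, 2, 1):
    print(active, "active cores:", turbo_frequency_mhz(3200, 4, active), "MHz")
```

With those made-up parameters a 3.2 GHz part stays at 3200 MHz with
all four cores loaded and reaches 3466 MHz with one or two cores
loaded, which is in the same ballpark as my 3.5 GHz guess above.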

Intel claims that with the power of the PCU inside the chip, the
Nehalem core is aware of its surroundings and conditions.  If your
system is running very cool, say you have water cooling for example,
the chip will recognize that it is well under its own TDP and push the
clocks even higher.  This is possible even while loading all four cores
as the above diagram shows.  The on-board micro-controller tunes
voltages based on a given frequency, operating conditions and specific
silicon characteristics.  In some ways it appears that the Nehalem core
will be self-aware enough to find out how far it can be pushed without
burning up.
