NVIDIA GT200 Revealed - GeForce GTX 280 and GTX 260 Review

Author: Ryan Shrout
Manufacturer: NVIDIA

NVIDIA GT200 Architecture (cont'd)

Memory Controller Gets an Upgrade

Remember when the G80 launched and we saw odd frame buffer sizes
on GeForce 8800 cards, like 768MB or 640MB?  That was because the G80 used a
384-bit memory controller at a time when the Radeon HD 2000/3000 series
cards used up to a 512-bit memory controller.  When G92 was introduced
it was built with a 256-bit memory controller, which turned out to
be a noticeable hindrance to performance – comparisons of similarly
clocked G92 and G80 parts showed the G80 having a big memory

NVIDIA addresses that in the GT200 design with its own 512-bit
memory controller, or more precisely a combination of 8 separate 64-bit
memory controllers.  Each of the 8 memory controllers is connected to a
single block of ROPs, as we are accustomed to.  This doesn’t mean we’ll
only be getting 512MB or 1024MB memory configurations though – as we’ll
soon see, the GTX 260 actually uses 896MB!
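
The odd frame buffer sizes fall out of simple arithmetic once the bus is treated as a set of 64-bit partitions.  The sketch below is only an illustration: the 128MB-per-partition figure is an assumption inferred from the G80 numbers above (768MB across a 384-bit bus), and the idea that an 896MB board runs seven of the eight partitions is likewise an inference rather than something stated here.

```python
# Sketch: bus width and frame buffer size from the number of active
# 64-bit memory partitions. MB_PER_PARTITION is an assumption inferred
# from the article's G80 figures (768MB / 6 partitions), not an NVIDIA spec.

PARTITION_WIDTH_BITS = 64   # each memory controller is 64 bits wide (from the article)
MB_PER_PARTITION = 128      # assumed: 768MB / (384 bits / 64 bits per partition)


def memory_config(partitions: int) -> tuple[int, int]:
    """Return (bus width in bits, frame buffer in MB) for a partition count."""
    if partitions < 1:
        raise ValueError("need at least one memory partition")
    return partitions * PARTITION_WIDTH_BITS, partitions * MB_PER_PARTITION


if __name__ == "__main__":
    configs = [("G80, 6 partitions", 6), ("GT200, 8 partitions", 8), ("GT200, 7 of 8", 7)]
    for label, partitions in configs:
        width, size = memory_config(partitions)
        print(f"{label}: {width}-bit bus, {size}MB")
```

Under those assumptions, seven partitions give a 448-bit bus and 896MB, which matches the GTX 260 figure quoted above.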

Much has been made recently about AMD’s pre-announcement that their
next-generation part would utilize GDDR5 memory technologies; NVIDIA
was quick to point out that using technology for technology’s sake is a
waste if it does not net you additional performance.  The memory
controller on GT200 can support either GDDR3 or GDDR4 memory but all
the initial boards will be using GDDR3 because NVIDIA doesn’t see the
benefits of GDDR4 from a cost/frequency perspective.  With GDDR3
supplying sufficient clock speed and data rate per pin to mostly
saturate GT200’s memory bus, a move to a solution that is half as wide
but twice as fast doesn’t always save you on transistor budget.  We’ll
have to see how AMD’s technology takes advantage of GDDR5 before
committing to a final analysis.
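
To make the “half as wide but twice as fast” trade-off concrete, here is a minimal sketch of the usual peak bandwidth formula (bus width times per-pin data rate, divided by eight).  The per-pin data rates are illustrative placeholders, not published figures for any NVIDIA or AMD card.

```python
# Sketch: peak memory bandwidth = bus width (bits) x per-pin data rate (Gbps) / 8.
# The data rates used below are illustrative placeholders, not measured or
# published figures for any specific card.


def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak bandwidth in GB/s for a bus width and per-pin rate."""
    return bus_width_bits * data_rate_gbps / 8


if __name__ == "__main__":
    wide_slow = peak_bandwidth_gb_s(512, 2.0)    # wide bus, slower memory
    narrow_fast = peak_bandwidth_gb_s(256, 4.0)  # half the width, twice the rate
    print(f"512-bit @ 2.0 Gbps/pin: {wide_slow:.1f} GB/s")
    print(f"256-bit @ 4.0 Gbps/pin: {narrow_fast:.1f} GB/s")
```

Both configurations reach the same theoretical peak, so the choice between them comes down to cost, power and die area rather than raw bandwidth.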

Looking at the Chip “As Big As Your Head”

Keeping in mind that NVIDIA is building these on 300mm wafers, let’s look at this shot:

This is probably the first time you can actually look at the wafer
shot provided by a company and count, easily, how many GT200s the
company could make assuming 100% yield.  The answer, by the way, is 95.
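
As a rough cross-check on that count, a common first-order estimate of gross dies per wafer can be sketched as below.  The die area is an assumption (the widely reported GT200 figure of roughly 576 mm², which is not given here), and the edge-loss term is a textbook approximation rather than anything NVIDIA has described.

```python
import math

# Sketch: a common first-order estimate of gross dies per wafer.
# DIE_AREA_MM2 is an assumption (the widely reported GT200 die size);
# the article itself only gives the wafer diameter and the count of 95.

WAFER_DIAMETER_MM = 300.0
DIE_AREA_MM2 = 576.0  # assumed, not stated in the article


def gross_dies_per_wafer(diameter_mm: float, die_area_mm2: float) -> int:
    """Wafer area / die area, minus a correction for partial dies at the edge."""
    radius = diameter_mm / 2
    whole = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * diameter_mm / math.sqrt(2 * die_area_mm2)
    return math.floor(whole - edge_loss)


if __name__ == "__main__":
    print(gross_dies_per_wafer(WAFER_DIAMETER_MM, DIE_AREA_MM2))
```

With those inputs the estimate lands very close to the 95 dies visible in the wafer shot, which is about as good as this kind of approximation gets.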

This die shot highlights a single shader processor and a cluster of 24 with corresponding memory and logic. 

And again, here is the GT200 die with an overlay of all the common
GPU functionality: SPs, texture units, ROPs, memory controllers and
“mystery logic” in the middle that likely includes the VP2 engine, SLI
support and more. 

Power Efficiency Increases

Another of NVIDIA’s key improvements with the GT200 design comes
in the form of power management and efficiency increases.  The new core
design is much more granular in the way it powers down segments not
being used at any given time, which cuts power consumption at idle and
under light processing loads.  For example, while the G80 used about 80
watts at idle and the G92 used 45 watts, the new GT200 will use only
25 watts at idle.  Considering the increased size of the chip and the
increase in gaming performance, this is an impressive feat.

How is it done?  The GT200 integrates some advanced power saving
features such as improved clock gating and clock and voltage scaling.
NVIDIA even claimed the ability to turn off components unit-by-unit,
though I am unsure whether this means each stream processor, each
block of 8 SPs or each block of 24 SPs – my guess is the last option.
The slope of power can be more finely adjusted, with an order of
magnitude more “steps” on the ladder between powered off and full
speed.  As an example of this, NVIDIA’s Tony Tomasi said that for video
decoding the GT200 only has about half as much area powered up as the
G92, even with the larger die size taken into consideration.

The Hybrid Power technologies that were introduced with the 9800
GTX and the 9800 GX2 are again present in the GT200 series of graphics
cards, but one has to wonder how useful they have become.  If the GT200
cards are only using 25 watts at idle as NVIDIA states, then powering that
last 25 watts off shouldn’t be as big a “boost” in power savings
as it was on the G92, which used 45+ watts.  Oh well, I guess any power
savings are good power savings at this point.

GT200 Features

If we look just at marketable features, besides the obvious one of
“better performance”, the new architecture doesn’t have much to add.
HybridPower still exists as I just discussed, 2-Way and 3-Way SLI
support continues and the PureVideo 2 engine that was introduced on the
GeForce 9-series is here as well.

Perhaps the only “new” feature is one we couldn’t test yet: PhysX
support.  Since NVIDIA purchased AGEIA some months ago, the promise of
running PhysX on your GeForce GPU has been there.  Work on the
CUDA port of PhysX has apparently been going very well – in
just two months of work the team has converted soft bodies, fluid and
cloth to the GPU successfully with just rigid bodies as the last point

As far as PhysX performance is concerned, I asked about the crossover
point where the GPU and the PPU (the dedicated physics hardware that AGEIA
sold on the market) performed the same.  The PhysX team didn’t have an exact
answer yet but said it probably fell in line with a mid-range GeForce
8-series card; 8600 GT or so.  What was good to hear was that even with
the penalty of “context switching”, a single GT200 card should be able
to render faster than a single 9800 GTX card with a dedicated PPU could
have done.  Context switching is the process by which a GPU is forced
to change states, from rendering graphical data to computing physics or
other data and back; the faster this can occur, the less latency the system
will see from the inclusion of PhysX and other simulation add-ons.
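
To show why the speed of a context switch matters, here is a toy model of a frame in which a single GPU renders, switches to physics and switches back.  Every timing in it is a hypothetical placeholder picked for illustration; none of these numbers come from NVIDIA or the PhysX team.

```python
# Sketch: a toy model of per-frame time when one GPU alternates between
# rendering and physics. All timings are hypothetical placeholders chosen
# only to show how switch cost feeds into frame latency.


def frame_time_ms(render_ms: float, physics_ms: float,
                  switch_ms: float, switches_per_frame: int = 2) -> float:
    """Render + physics work plus the cost of each state change in the frame."""
    return render_ms + physics_ms + switches_per_frame * switch_ms


if __name__ == "__main__":
    for switch_ms in (1.0, 0.5, 0.1):
        total = frame_time_ms(render_ms=12.0, physics_ms=3.0, switch_ms=switch_ms)
        print(f"switch cost {switch_ms} ms -> frame time {total:.1f} ms")
```

The model ignores any overlap between graphics and compute work, so it should be read as an upper bound on the penalty rather than a prediction.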

The GT200 continues to use an external display chip for digital and
analog outputs (and inputs, if any), which was somewhat surprising.
Essentially the GT200 outputs a single stream to the dedicated display
chip that is responsible for branching out connections like HDMI, DVI,
VGA and TV output.  NVIDIA claimed this helped with board design,
making custom designs much more straightforward for third parties.
The I/O chip that NVIDIA is using is also the first to officially
support 10-bit digital output – for whenever the accompanying monitors
start showing up.

One thing you will NOT find on the GT200: official support for the
DX10.1 standard.  This came as quite a shock to us since NVIDIA has had
plenty of time to integrate it into their core – AMD has had DX10.1
support in their GPUs since the HD 3800 series was released last year.
NVIDIA did say that they have quite a bit of UNOFFICIAL support for
DX10.1 features in their GT200 chip, but because the DX10 rules state
it’s “all or nothing” for claiming conformance with a technology, NVIDIA is
left with only a DX10 architecture part.  They did commit to “working
with developers and ISVs that want to use those deferred rendering
paths” for any titles; of course we DID have a big fallout from
Assassin’s Creed recently, and we are curious to know the truth about it
(though we likely never will)…

Read more about the GT200 and general purpose parallel computing in our separate article: Moving Away From Just a GPU.