More than just a shrink

NVIDIA has been attempting to regain ground from AMD in the enthusiast PC gaming space ever since the Radeon 5000-series really put them on their heels. The release of the GTX 480/470/465 cards was one step in the right direction and the release of NVIDIA Surround was another. The new GTX 460 introduces the first GPU redesign of the generation with the GF104 and offers top value for your $199. I think a lot of gamers are going to fall for this card!
The Need for GF104

UPDATE: The new GTX 460 is already for sale at Newegg starting at $199 – check it out after reading our review!!

It is a normal trend for us to see both NVIDIA and ATI take their current generation GPU designs and shrink them down for lower cost and lower performing markets.  For example, we first saw the G84 GPU from NVIDIA (known as the 8600 series of GPUs) just about 5 months after the release of the G80-based GeForce 8800 cards.  The dies are smaller, cheaper to make, and more power efficient, as a general rule of thumb.  The same general trend is seen with the AMD/ATI cards as well but on slightly differing schedules.

This time around, the GF104 design, a rework of the GF100 GPU behind the Fermi-based GTX 400-series cards, is much more critical for NVIDIA.  While the GF100 is a powerful GPU, it has some critical deficiencies when put head to head with the Radeon 5000 cards from ATI: it is larger, hotter and just generally hard to make.  Because of that we know that NVIDIA’s margins on GF100 are not where they want them to be, and the highly competitive market they currently sit in makes the financials all the more difficult.

GF104 and the new GeForce GTX 460 are an attempt to address these issues by offering a smaller version of the Fermi architecture from the outset rather than taking larger GPUs and disabling portions to sell at a lower cost.  GF104 should be smaller, cheaper, and more easily produced at much higher yields.  At least, that’s the theory.

Let’s see what the new $199 (768MB) and $229 (1GB) Fermi cards from NVIDIA are made of.

A surprise shift in the new GPU

The new GeForce GTX 460 and the GF104 have fewer CUDA processing cores than any other Fermi card out there (336 to be exact), but much more has changed under the hood than that.  As it turns out, NVIDIA has taken this as a chance to reorganize the architecture at a pretty basic level.


This is the original GF100 GPU that consists of 4 GPCs (Graphics Processing Clusters), each with 4 SMs (Streaming Multiprocessors), each with 32 CUDA cores.  While we never saw that magical and elusive 512 core part from NVIDIA, the GeForce GTX 480 has a single SM disabled for a total of 480 cores.


Now here is the GF104 GPU, which looks surprisingly different.  There are two GPCs and four SMs per GPC, but there are some interesting changes included, like the move from 32 to 48 CUDA cores in each SM.  Most of the other components scale accordingly, including the move to a 256-bit memory bus (though there is a 192-bit option we’ll discuss as well).

Also worth noting is that just like the GF100, the GF104 is being released with one SM completely disabled – the remaining seven SMs add up to the total 336 CUDA cores. 
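
The core counts quoted so far all fall out of the same multiplication, so here is a quick sanity check as a minimal Python sketch. The function name and its parameters are my own, made up for illustration; the only inputs are the GPC, SM and core figures given above.

```python
def cuda_cores(gpcs, sms_per_gpc, cores_per_sm, disabled_sms=0):
    """Enabled CUDA cores for a GPC/SM layout with some SMs fused off."""
    enabled_sms = gpcs * sms_per_gpc - disabled_sms
    return enabled_sms * cores_per_sm


# GF100: 4 GPCs x 4 SMs x 32 cores; the GTX 480 ships with one SM disabled.
gf100_full = cuda_cores(4, 4, 32)
gtx480 = cuda_cores(4, 4, 32, disabled_sms=1)

# GF104: 2 GPCs x 4 SMs x 48 cores; the GTX 460 ships with one SM disabled.
gf104_full = cuda_cores(2, 4, 48)
gtx460 = cuda_cores(2, 4, 48, disabled_sms=1)

print("GF100 full die:", gf100_full)
print("GTX 480:", gtx480)
print("GF104 full die:", gf104_full)
print("GTX 460:", gtx460)
```

Note that a fully enabled GF104 would come out at 384 cores, which is simply what the layout implies and not a product announced here.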


Taking a closer look at the SM of the new GF104 we can see some other interesting changes.  The increase in the number of CUDA cores in the SM is balanced by doubling the number of instruction dispatch units to four.  NVIDIA has also doubled the number of texture units for the SM to 8 and this indicates a higher concentration of texture performance than shader performance when compared to the GF100 design. 
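
To put a number on that shift, the per-SM unit counts can be turned into per-core ratios. This is only a rough sketch: the GF100 values (4 texture units and 2 dispatch units per SM) are inferred from the "doubled" wording above, and the dictionary layout and helper name are hypothetical.

```python
from fractions import Fraction

# Per-SM resources; GF100 values are inferred by halving the GF104 figures.
SM_LAYOUT = {
    "GF100": {"cores": 32, "texture_units": 4, "dispatch_units": 2},
    "GF104": {"cores": 48, "texture_units": 8, "dispatch_units": 4},
}


def units_per_core(chip, unit):
    """Exact ratio of a given per-SM unit to CUDA cores in that SM."""
    sm = SM_LAYOUT[chip]
    return Fraction(sm[unit], sm["cores"])


for chip in SM_LAYOUT:
    tex = units_per_core(chip, "texture_units")
    disp = units_per_core(chip, "dispatch_units")
    print(f"{chip}: {tex} texture units per core, {disp} dispatch units per core")
```

Cores grow by 1.5x while texture and dispatch units grow by 2x, so both ratios rise by a third on GF104.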

Most interesting is the fact that the PolyMorph Engines are now balanced quite differently, with one per 48 CUDA cores rather than one per 32.  Remember that NVIDIA has been heavily touting its advantage in tessellation performance and in games that use the technology, though this change moves the tessellation performance per CUDA core down some.  Counting every PolyMorph Engine on the die, the GTX 480 saw 30 cores per engine (480 / 16) while the new GTX 460 will see 42 (336 / 8).
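
The cores-per-PolyMorph figure depends on how you count, so here is a back-of-the-envelope sketch that does it both ways. It assumes one PolyMorph Engine per SM; whether the engine inside a disabled SM stays active is not something this article settles, so treat the two bases as alternatives and not as a statement about the hardware. The helper name is made up for illustration.

```python
def cores_per_engine(enabled_cores, engines):
    """Average number of enabled CUDA cores feeding each PolyMorph Engine."""
    return enabled_cores / engines


# Basis A: every engine on the die counts (16 on GF100, 8 on GF104).
gtx480_full_die = cores_per_engine(480, 16)
gtx460_full_die = cores_per_engine(336, 8)

# Basis B: only engines in enabled SMs count (15 and 7 respectively).
gtx480_enabled = cores_per_engine(480, 15)
gtx460_enabled = cores_per_engine(336, 7)

print("Full-die basis:   GTX 480", gtx480_full_die, "| GTX 460", gtx460_full_die)
print("Enabled-SM basis: GTX 480", gtx480_enabled, "| GTX 460", gtx460_enabled)
```

Either way the GTX 460 asks each tessellation engine to serve noticeably more shader cores than the GTX 480 does.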


As you would expect the new GF104 GPU is quite a bit smaller than the original GF100 – and it is an odd rectangular shape based on the dimensions of the heat spreader resting over it.  Also note that we are testing A1 silicon; a notable achievement for anyone familiar with processor design.

For reference, here is a GF100 GPU with the same quarter size comparison

While NVIDIA wouldn’t tell us the exact die size for the new GF104, we do know that it has a transistor count of 1.95 billion compared to the 3.0 billion of GF100: 35% fewer transistors.
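
The 35% figure is a transistor-count comparison, not a measured die area, and the arithmetic is short enough to show. A small sketch, with variable names of my own choosing:

```python
# Transistor counts as quoted above.
gf104_transistors = 1.95e9
gf100_transistors = 3.0e9

# Fraction of GF100's transistor budget that GF104 keeps, and the reduction.
ratio = gf104_transistors / gf100_transistors
reduction = 1 - ratio

print(f"GF104 has {ratio:.0%} of GF100's transistors ({reduction:.0%} fewer)")
```

Die area does not have to track transistor count exactly, so the physical size difference could land a little either side of this number.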


If we look at the rest of the specifications for the GTX 460 / GF104 GPU, a few other notable items stand out.  While I mentioned the texture unit counts per SM, it’s nice to see a total of 56 of them on such a low cost graphics card when the GTX 480 has only slightly more with 60.  The ROP count comes in at 24 or 32 depending on the memory configuration, and total memory bandwidth comes in at 86.4 GB/s or 115.2 GB/s compared to the 177.4 GB/s of the GTX 480.  The drop in L2 cache size going from the 256-bit GTX 460 (512KB) to the 192-bit version (384KB) will also affect performance in games as well as in non-gaming applications, and I’ll be curious how that pans out for CUDA-enabled programs.
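
One way to read those numbers is that ROPs, L2 cache and bandwidth all scale together with the memory bus width. The sketch below checks that, assuming the bus is built from 64-bit partitions (my assumption, not something stated in the specs above) and pairing the lower ROP and bandwidth figures with the 192-bit card. The dictionary keys and helper names are hypothetical.

```python
# Figures quoted above for the two GTX 460 memory configurations.
CONFIGS = {
    "192-bit": {"bus_bits": 192, "rops": 24, "l2_kb": 384, "bandwidth_gbs": 86.4},
    "256-bit": {"bus_bits": 256, "rops": 32, "l2_kb": 512, "bandwidth_gbs": 115.2},
}


def per_partition(cfg, partition_bits=64):
    """Split each resource evenly across assumed 64-bit memory partitions."""
    partitions = cfg["bus_bits"] // partition_bits
    return {
        "rops": cfg["rops"] / partitions,
        "l2_kb": cfg["l2_kb"] / partitions,
        "bandwidth_gbs": round(cfg["bandwidth_gbs"] / partitions, 3),
    }


def per_pin_gbps(cfg):
    """Implied data rate per bus pin in Gbit/s (GB/s * 8 bits / bus width)."""
    return round(cfg["bandwidth_gbs"] * 8 / cfg["bus_bits"], 3)


for name, cfg in CONFIGS.items():
    print(name, per_partition(cfg), "per-pin Gbit/s:", per_pin_gbps(cfg))
```

Both cards come out identical per partition, which suggests the 192-bit model is simply the 256-bit design with one quarter of its memory subsystem removed, running its memory at the same speed.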

With such a dramatic shuffling of the GF100 architecture it seems obvious that NVIDIA found a couple of things worth changing as they built the GF104 chip for use in the GTX 460.  By reducing the number of PolyMorph Engines per CUDA core in the GPU NVIDIA has lowered the amount of relative tessellation performance in comparison to the GTX 480 and GTX 470.  I doubt they have lowered it too much but NVIDIA obviously thought the tessellation engines were idling a bit too much so by increasing the number of cores for shader processing per PolyMorph the balance should be shifted in the other direction. 

This type of adjustment happens pretty often as GPU companies move from process node to process node or between redesigns like this.  NVIDIA is simply adjusting its estimates for GPU performance and utilization across games, GPGPU functionality, etc., in hopes of finding better power efficiency in the long run.


