ATI Radeon HD 5870 1GB Graphics Card and AMD Eyefinity Review
The Evergreen Architecture in Detail
The Evergreen series of GPUs, though completely new in some ways, is still largely based on the RV770 design that was released last year. And why not? The RV770 was AMD's most successful GPU in recent memory and there was a lot to love in the design from both a performance and efficiency standpoint.
According to AMD, these were the goals the team set when designing the Evergreen series of parts. It is quite a lofty list of requirements, including a new API implementation, improved current-generation game performance, doubled raw processing power and new features like Eyefinity.
Obviously one of AMD's key selling points is not just performance, but power-efficient performance. The Radeon HD 4870 was a huge jump in terms of both performance/watt and performance/mm^2 of die space (which is important for profitability, not so much for gamers exactly) and the HD 5870 is able to keep up the pace. According to AMD's numbers here, the Evergreen part is 73% more efficient in GFLOPS/mm^2 and 92% better in GFLOPS/watt - both very compelling numbers for gamers to take note of.
Now, let's dive into the architecture itself:
AMD calls its new design the "TeraScale 2 Architecture" though we were told by new CTO Demers that he really wanted it called "TeraScale nearly 3" but the legal team wouldn't go for it. The raw numbers are impressive with 2.7+ TFLOPS (TeraFLOPS) of compute power and more than 20 Gigapixels/sec of graphics power. The new design is still similar to that of the RV770 core but AMD has implemented changes to the SIMD layout, stream processing units themselves, the graphics engine, texture units and more.
The new Radeon HD 5870 does increase the die size over the HD 4870 - the new GPU is 334mm^2 while the HD 4870 rested at 263mm^2. Transistor count has more than doubled from 956 million to 2.15 billion! But with that change the GPU gains double the amount of AA resolves, Z/Stencil results, texture units and shader processors over the previous generation, all while adding DX11 support and more.
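As a quick sanity check on those figures, the sketch below works out the growth in die area, transistor count and transistor density using only the numbers quoted above. The variable and function names are ours, purely for illustration.

```python
# Die size (mm^2) and transistor counts (millions) as quoted in the text.
hd4870 = {"die_mm2": 263, "transistors_m": 956}
hd5870 = {"die_mm2": 334, "transistors_m": 2150}


def density(gpu):
    """Transistors (in millions) per square millimeter of die."""
    return gpu["transistors_m"] / gpu["die_mm2"]


die_growth = hd5870["die_mm2"] / hd4870["die_mm2"]
transistor_growth = hd5870["transistors_m"] / hd4870["transistors_m"]
density_growth = density(hd5870) / density(hd4870)

print(f"Die area:    {die_growth:.2f}x")
print(f"Transistors: {transistor_growth:.2f}x")
print(f"Density:     {density_growth:.2f}x")
```

In other words, the die grew by roughly 27% while holding about 2.25x the transistors, which works out to around 1.77x the transistor density.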
Based on the diagram below you can see that there are 20 SIMD engines. Each engine includes 16 thread processors, and each thread processor is built from 5 individual stream cores, bringing the total to 1600 shader processors.
The graphics engine starts with a pair of rasterizers that effectively act as a single unit, since both communicate with the same thread dispatch processor. The tessellator continues to support AMD's previous generation commands but is now also programmable via the DX11 hull & domain shaders we went over earlier.
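The 1600 figure is just the product of those three numbers. A minimal sketch of the arithmetic, with constant names of our own choosing:

```python
# Unit counts as described for the HD 5870 in the text.
SIMD_ENGINES = 20
THREAD_PROCESSORS_PER_SIMD = 16
STREAM_CORES_PER_THREAD_PROCESSOR = 5

thread_processors = SIMD_ENGINES * THREAD_PROCESSORS_PER_SIMD
stream_cores = thread_processors * STREAM_CORES_PER_THREAD_PROCESSOR

print(f"Thread processors: {thread_processors}")
print(f"Stream cores:      {stream_cores}")
```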
If we dive deeper into one of those individual thread processors we can see how the layout includes 5 distinct stream processors. AMD not only doubled the overall count of these units but also improved IPC (instructions per clock) by implementing co-issue of MULs and dependent ADDs in a single clock, as well as DirectX 11 bit-level ops and fused multiply-adds. Four of the stream cores are standard 32-bit/64-bit operators while the special function unit can handle 32-bit FP MAD operations at 1 per clock. The branch prediction unit implemented back in the HD 3800 days remains. All of these add up to the 2.7 TFLOPS of single-precision performance and 544 GFLOPS of double-precision performance.
AMD doubled the texture units along with the shader processors but also increased texture bandwidth up to 68 billion bilinear filtered texels/sec and 272 billion 32-bit fetches/sec. Not only has the size of the L2 cache been doubled to 128 kB per memory controller, but speeds have been bumped up across the board as well. There are 8 32-bit controllers for a total memory width of 256 bits - the same as the HD 4870 series.
Most users would probably tell you that texture filtering diagrams like this are a thing of the past, but until now, AMD has been behind NVIDIA in terms of AF quality. No longer is that the case, as the new algorithm that AMD implemented on the HD 5870 hardware completely eliminates angle dependence while maintaining the same performance levels as the previous implementation.
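For readers who want to see where the two throughput figures come from, here is one way to arrive at them. Two things in this sketch are assumptions not stated in this section: the 850 MHz engine clock (the card's reference specification) and the model in which each thread processor retires one double-precision multiply-add per clock. Treat it as a back-of-the-envelope check rather than AMD's own math.

```python
STREAM_CORES = 1600
THREAD_PROCESSORS = STREAM_CORES // 5  # 5 stream cores per thread processor

# ASSUMPTION: 850 MHz reference engine clock, not stated in this section.
ENGINE_CLOCK_GHZ = 0.85

# A multiply-add counts as two floating point operations.
FLOPS_PER_MULTIPLY_ADD = 2

# Single precision: every stream core issues one multiply-add per clock.
sp_gflops = STREAM_CORES * FLOPS_PER_MULTIPLY_ADD * ENGINE_CLOCK_GHZ

# ASSUMPTION: double precision runs at one multiply-add per thread
# processor per clock, i.e. one fifth of the single-precision rate.
dp_gflops = THREAD_PROCESSORS * FLOPS_PER_MULTIPLY_ADD * ENGINE_CLOCK_GHZ

print(f"Single precision: {sp_gflops / 1000:.2f} TFLOPS")
print(f"Double precision: {dp_gflops:.0f} GFLOPS")
```

Under those assumptions the results land on the 2.7 TFLOPS and 544 GFLOPS quoted above.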
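The texture and memory numbers can be cross-checked the same way. The sketch below uses the figures from the paragraph above, plus one assumption that is not stated in this section: the 850 MHz reference engine clock, used here only to back out how many texture units the bilinear rate implies.

```python
# Memory interface as described in the text.
MEMORY_CONTROLLERS = 8
BITS_PER_CONTROLLER = 32
bus_width_bits = MEMORY_CONTROLLERS * BITS_PER_CONTROLLER

# Texture rates quoted in the text, in billions per second.
BILINEAR_GTEXELS_PER_SEC = 68
FETCHES_G_PER_SEC = 272
fetches_per_filtered_texel = FETCHES_G_PER_SEC / BILINEAR_GTEXELS_PER_SEC

# ASSUMPTION: 850 MHz reference engine clock, not stated in this section.
ENGINE_CLOCK_GHZ = 0.85
implied_texture_units = round(BILINEAR_GTEXELS_PER_SEC / ENGINE_CLOCK_GHZ)

print(f"Memory bus width:          {bus_width_bits} bits")
print(f"Fetches per filtered texel: {fetches_per_filtered_texel:.0f}")
print(f"Implied texture units:      {implied_texture_units}")
```

The four-to-one ratio between fetches and filtered texels lines up with a bilinear filter blending four samples per output texel.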
The Render Back-Ends (or ROPs as NVIDIA calls them) offer improvements of 2x pretty much across the board, including in both 32-bit and 64-bit MSAA samples up to 8x. A new Supersample AA algorithm actually anti-aliases shaders and textures as well as geometry edges, though the performance hit of the SSAA techniques will likely be higher than users care to accept.
AMD is very proud of their AA performance improvements including a significant drop in performance degradation when going from a 4xAA to 8xAA setting. According to AMD's results NVIDIA's GPUs see as much as a 50% speed hit compared to the 18% or so maximum on the new AMD HD 5870 card.
The GDDR5 memory interface is AMD's 5th implementation with this generation of GDDR memory, giving them an edge in familiarity if nothing else. This time around AMD has added error detection code (EDC) that runs CRC checks on data transfers for improved reliability and higher memory clock speeds. Also added are GDDR5 memory clock temperature compensation and faster link retraining, which together allow for the improved memory and power efficiency that is key to getting lower idle power results.
In terms of hardware changes made exclusively for stream computing purposes, AMD did some work there as well. Obviously they are proud of the extreme single-precision and double-precision FLOP performance as well as the first full implementation of DirectCompute 11 and OpenCL 1.0, but they didn't stop there. AMD added IEEE754-2008 compliant precision to their shaders and pushed to support basically all of the new features coming to OpenCL 1.1 down the road, such as 32-bit atomic operations, 64 KB global data shares and more.
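To give a feel for what a CRC check on a data transfer buys you, here is a small software sketch of the general idea: the sender computes a checksum over a burst, the receiver recomputes it, and a mismatch flags the transfer as bad so it can be retried. The CRC-8 polynomial (0x07), the 8-byte burst and all function names are illustrative choices on our part, not a description of how AMD's memory controller is built.

```python
from typing import Optional, Tuple


def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8, MSB first, initial value 0 (illustrative parameters)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def send_burst(payload: bytes, flip_bit: Optional[int] = None) -> Tuple[bytes, int]:
    """Simulate one transfer; optionally flip a single bit in flight."""
    checksum = crc8(payload)  # computed by the sender before transmission
    received = bytearray(payload)
    if flip_bit is not None:
        received[flip_bit // 8] ^= 1 << (flip_bit % 8)
    return bytes(received), checksum


def transfer_ok(received: bytes, checksum: int) -> bool:
    """Receiver side: recompute the CRC and compare with the sender's."""
    return crc8(received) == checksum


burst = bytes(range(8))
clean, clean_crc = send_burst(burst)
corrupt, corrupt_crc = send_burst(burst, flip_bit=13)

print("Clean transfer accepted:  ", transfer_ok(clean, clean_crc))
print("Corrupt transfer accepted:", transfer_ok(corrupt, corrupt_crc))
```

The practical upside is that a link which can detect and retry bad transfers can be pushed to higher clocks, because an occasional bit error gets caught instead of silently corrupting data.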
The last architectural change we'll note here comes in the form of reduced and more dynamic power consumption. Probably the biggest feat for the team at AMD was a major drop in idle power consumption over the previous generation: going from about 90 watts to just 27 watts! Maximum board power has increased on the HD 5870 by about 17%, but with a more than 2x improvement in floating point performance, that power increase seems more than justified.
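Putting rough numbers on that trade-off, the sketch below uses only the figures quoted above. Since the floating point gain is given as "more than 2x", the performance-per-watt result is a lower bound rather than an exact value, and the constant names are ours.

```python
# Idle power figures quoted in the text, in watts.
IDLE_WATTS_PREVIOUS = 90
IDLE_WATTS_HD5870 = 27

# Peak figures quoted in the text, relative to the previous generation.
MAX_POWER_GROWTH = 1.17  # "about 17%" higher maximum board power
FLOPS_GROWTH_MIN = 2.0   # "more than 2x" floating point performance

idle_reduction = 1 - IDLE_WATTS_HD5870 / IDLE_WATTS_PREVIOUS
perf_per_watt_min = FLOPS_GROWTH_MIN / MAX_POWER_GROWTH

print(f"Idle power reduction:        {idle_reduction:.0%}")
print(f"Peak FLOPS per watt, at least: {perf_per_watt_min:.2f}x")
```

That is a 70% cut in idle power and, at minimum, roughly 1.7x the peak floating point throughput per watt.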