NVIDIA GF100 Architecture Preview - Fermi brings DX11 to the desktop
A Bridge Too Far or Making Sure it is Done Right?
It could very well be a solid 7 months from when the HD 5870 was released until NVIDIA ships its first DX11 part. There are a lot of people prognosticating the doom of NVIDIA because it is so late to market, but sending condolences is a bit premature. While NVIDIA has stated that it is unhappy not to be competing in the DX11 market at this time, the engineering work it felt needed to happen simply could not be completed in time to ship a part alongside AMD's.
This is not to take away from AMD’s Evergreen architecture; it is a very good design, and it certainly performs well in the DX11 applications we have seen so far. But Evergreen did not involve nearly as many major architectural shifts as NVIDIA is implementing with GF100. There were several areas where NVIDIA saw that serious work needed to be done, and it went about correcting them.
The Fermi architecture embraces a new way of addressing memory as compared to previous generations. Significant improvements in the cache hierarchy, combined with far more efficient main memory accesses, take performance a huge step forward. It is amazing how much more CPU-like the GPU is becoming.
In addressing these issues NVIDIA appears to have engineered one of the largest ASICs of all time. Coming in at 3 billion transistors, the GF100 is a monster of a chip. It will also take up more than 500 mm² of die space, which is another concern given the problems TSMC has had over the past year in ramping its 40 nm process. To make matters worse, NVIDIA had to redesign and re-layout its CUDA cores, which are fully custom layout portions of the chip. This takes a lot of time and is very manpower intensive. When we put all these factors together, it is no wonder the chip is so late.
The big question this all brings up is whether the GF100 is too ambitious for current design tools and process technology. Hence the “A Bridge Too Far” header. It is a complex task that NVIDIA has set for itself. I honestly thought that the company had bitten off too much this time, and that the GF100 would not be able to compete adequately with the HD 5000 series. Because it was designed to be a “jack of all trades,” it might not improve enough on the competition in gaming while still being economical to produce. The fear was that the chip would be a couple of percent faster and add a few features, but be 50% more expensive to produce from a chip and board standpoint. This would leave NVIDIA and its partners in the unenviable position of earning far lower margins than the competition, because the performance would require it to be priced similarly.
A closer look at the Streaming Multiprocessor, which we will be delving into on the next page. Is it me, or does this somewhat resemble some of Intel's experiments in massively parallel processors?
The primary issues that NVIDIA wanted to address with this design are as follows: image quality, geometry performance, effective tessellation performance, raw pixel-pushing power for multi-monitor and 3D Vision use, and a huge leap in GPGPU capabilities and performance. Unfortunately for NVIDIA, this required a fairly major rethinking of how its architecture is laid out, how information is passed between the functional units, and how many more transistors its goals would require. We have previously detailed what NVIDIA will do for GPGPU in Ryan’s Fermi preview, but that is only the tip of the iceberg when we look at the chip as a whole, from both graphics and GPGPU perspectives.