NVIDIA GT200: Moving Away from Just a GPU
Transcode and Fold
The final big change is the inclusion of units which support double precision floating point values. Unfortunately, NVIDIA was unable to extend this functionality to all 240 stream processors; instead, each SM incorporates a single double precision, 64-bit unit. NVIDIA’s white paper explains this functionality:
“A very important new addition to the GeForce GTX 200 GPU architecture is double-precision, 64-bit floating point computation support. This benefits various high-end scientific, engineering, and financial computing applications or any computational task requiring very high accuracy of results. Each SM incorporates a double-precision 64-bit floating math unit, for a total of 30 double-precision 64-bit processing cores.
The double-precision unit performs a fused MAD, which is a high-precision implementation of a MAD instruction that is also fully IEEE 754R floating-point specification compliant. The overall double-precision performance of all 10 TPCs of a GeForce GTX 200 GPU is roughly equivalent to an eight-core Xeon CPU, yielding up to 90 gigaflops.”
This is not what many were expecting, but at least NVIDIA does provide the functionality. The quad GPU boxes that will be branded under the Tesla name will offer phenomenal single precision math performance, and going by NVIDIA’s comparison, their double precision throughput should roughly match a 32-core Xeon box.
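To make the white paper's numbers a little more concrete, here is a rough back-of-the-envelope sketch. It uses only the figures quoted above (30 double precision units, up to 90 gigaflops, an eight-core Xeon equivalent per GPU). Two things are my own assumptions rather than anything NVIDIA states here: that a fused MAD counts as two floating point operations per clock, and that a quad GPU box scales perfectly linearly. The clock it prints is simply what the quoted peak would imply under those assumptions, not a published specification.

```python
# Figures taken from the quoted white paper text.
SM_COUNT = 30                 # one double precision unit per SM
QUOTED_PEAK_GFLOPS = 90.0     # "up to 90 gigaflops"
XEON_CORES_PER_GPU = 8        # "roughly equivalent to an eight-core Xeon CPU"

# Assumption (not from the article): a fused MAD counts as two
# floating point operations per unit per clock.
FLOPS_PER_FMAD = 2


def implied_clock_ghz(peak_gflops: float, units: int, flops_per_clock: int) -> float:
    """Clock rate in GHz that `units` would need to reach `peak_gflops`."""
    return peak_gflops / (units * flops_per_clock)


def scale_linearly(per_gpu: float, gpu_count: int) -> float:
    """Assumes perfect linear scaling across GPUs (an idealization)."""
    return per_gpu * gpu_count


clock = implied_clock_ghz(QUOTED_PEAK_GFLOPS, SM_COUNT, FLOPS_PER_FMAD)
quad_gflops = scale_linearly(QUOTED_PEAK_GFLOPS, 4)
quad_xeon_cores = scale_linearly(XEON_CORES_PER_GPU, 4)

print(f"Implied clock for the quoted peak: {clock:.2f} GHz")
print(f"Quad GPU box: {quad_gflops:.0f} gigaflops, about {quad_xeon_cores:.0f} Xeon cores")
```

Real workloads rarely scale perfectly across four GPUs, so treat the quad GPU figure as an upper bound.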
Stream Computing Applications
Stream computing on the GPU is still relatively new, and while applications are in development for it, it has not exactly hit prime time. Two new applications are due for release shortly, and they should help drive wider adoption of general computing work on the GPU.
For testing I used a Core 2 Extreme X6800 processor (2.93 GHz) on an XFX 680i motherboard. The system features 4 GB of DDR2-800 memory, Windows Vista 64, and the 177.34 drivers. This is not exactly a pokey machine, but the results when using the GPU as a computing platform are impressive to say the least.
Folding@home is a popular distributed program, and unlike some distributed programs it is actually working toward a goal which should directly affect the quality of our lives.
The first application is not exactly new to the GPU scene, but this is the first time that NVIDIA has participated in it. Folding@home is a popular and scientifically important distributed computing program which simulates the folding of complex proteins, which should help us understand the molecular nature of many diseases such as Parkinson’s, Alzheimer’s, and cancer. The previous GPU and GPU2 clients supported ATI/AMD cards, and they have shown the potential of GPU computing. Now NVIDIA is joining the fray with its own optimized client which will allow the GPU to run varied workloads.
Since the work units (WUs) in F@H do not all require the same computing resources to finish, the folks at Stanford award points weighted by the work each unit involves. For example, a new Core 2 Quad QX9770 (which is a $1200 chip) puts out around 1100 points in a 24-hour period using the SMP F@H client utilizing all cores. In my testing of the new client and the GTX 280 video card, I was able to complete 58 WUs in a 24-hour period, generating approximately 5800 points. We can see right off the bat that it would take around five quad core C2Q processors running over 3 GHz to match the points output that the GTX 280 manages with the new NVIDIA GPU client.
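The "around five" figure falls straight out of the points numbers above. The short sketch below just redoes that arithmetic; the variable names are mine, and the inputs are the approximate daily figures quoted in the text, so the result is only as precise as those are.

```python
# Approximate daily Folding@home output quoted in the text.
CPU_POINTS_PER_DAY = 1100      # Core 2 Quad QX9770, SMP client, all cores
GPU_POINTS_PER_DAY = 5800      # GTX 280, new NVIDIA GPU client
GPU_WORK_UNITS_PER_DAY = 58    # work units the GTX 280 completed in 24 hours

# Average points awarded per work unit on the GPU client.
points_per_work_unit = GPU_POINTS_PER_DAY / GPU_WORK_UNITS_PER_DAY

# How many QX9770-class CPUs it would take to match one GTX 280 on points.
cpus_to_match_gpu = GPU_POINTS_PER_DAY / CPU_POINTS_PER_DAY

print(f"Points per work unit: {points_per_work_unit:.0f}")
print(f"QX9770 processors needed to match one GTX 280: {cpus_to_match_gpu:.2f}")
```

Keep in mind that points are Stanford's weighting of the work done, so this compares credited output, not raw floating point throughput.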
The second program which leverages the power of NVIDIA’s latest GPU is BadaBoom. This transcoder takes video and converts it for use on a variety of playback devices, and it leans heavily on the GPU power of the new GTX 200 series. For comparison I used another program called Handbrake, which is frequently used as a transcoder. I took a film clip and transcoded it to approximately the same end product in both programs (in this case an H.264 MP4 file with a bit-rate of approximately 2.1 Mbps).
BadaBoom still needs a bit of work, but it does work, and it is very fast. Transcodes happen in a flash.
BadaBoom does the majority of the work on the GPU, while Handbrake is multi-thread aware and will utilize both cores of the C2D X6800 processor used in the test. BadaBoom was able to transcode the source file at an average of 154 fps. Handbrake on the other hand was hitting around 32 fps. BadaBoom completed in 28 seconds while Handbrake finished its job in 2 minutes and 20 seconds. The resulting files were nearly identical in quality and bit-rate, but the GPU accelerated client was almost 5 times faster than the already quick Handbrake program running on a C2D X6800 processor.
The Core 2 Extreme X6800 is a very fast processor, and with a client which can utilize both cores, it performs pretty well. It is still about 5 times slower than the GTX 280 when encoding at comparable quality.
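As a quick sanity check on the "almost 5 times" claim, the sketch below computes the speedup two ways from the numbers reported above: once from the wall clock times and once from the average frame rates. It also multiplies frame rate by time for each run to see whether both imply a similar frame count. The names are mine, and since the fps figures are averages, I would expect the two estimates to agree only roughly.

```python
# Measurements reported in the text.
BADABOOM_FPS = 154                  # average frames per second on the GTX 280
HANDBRAKE_FPS = 32                  # approximate frames per second on the X6800
BADABOOM_SECONDS = 28
HANDBRAKE_SECONDS = 2 * 60 + 20     # 2 minutes and 20 seconds

# Speedup estimated from elapsed time and from throughput.
speedup_by_time = HANDBRAKE_SECONDS / BADABOOM_SECONDS
speedup_by_fps = BADABOOM_FPS / HANDBRAKE_FPS

# Frame count implied by each run (fps * seconds); these should be
# close if both programs processed the same clip.
frames_badaboom = BADABOOM_FPS * BADABOOM_SECONDS
frames_handbrake = HANDBRAKE_FPS * HANDBRAKE_SECONDS

print(f"Speedup by elapsed time: {speedup_by_time:.2f}x")
print(f"Speedup by frame rate:   {speedup_by_fps:.2f}x")
print(f"Implied frames: BadaBoom {frames_badaboom}, Handbrake {frames_handbrake}")
```

The two implied frame counts land within a few percent of each other, which is about what you would expect from rounded average frame rates.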
More applications are coming out which will utilize the GPU for certain workloads, the most famous of which is the upcoming Photoshop CS4. This type of application will help open the doors to wider acceptance of GPU accelerated applications.
While NVIDIA may not have blown the doors off of GPU computing with this launch, they are certainly paving the way to more general acceptance. Their development of CUDA as well as the release of this new GPU will help ease the transition from traditional CPU high performance computing to more stream processing applications in both consumer and scientific circles. I am sure we are going to hear much more from the Tesla group very soon.
NVIDIA is making a giant push to bring stream processing to the masses, and they already have a 70 million unit installed base when considering the GeForce 8 series of products. The GTX 280 will further their cause by providing more performance and greater functionality. If NVIDIA and AMD both succeed in their efforts to mainstream GPU computing, then they will certainly give Intel a run for its money in HPC circles and daily life.
Be sure to read Part 1 of our two-part series on the GT200: a review of the GTX 280 and GTX 260 for gaming!