
ATI Stream vs. NVIDIA CUDA - GPGPU computing battle royale

Author: Steve Grever
Manufacturer: ATI

Test System Configuration and Testing Methodology

Test system configuration

To thoroughly test how these transcoding applications use ATI Stream and CUDA in a consumer-level environment, we built a mid-range system that consumers could buy at a relatively affordable cost. We made one change to the test bench from our initial CUDA article a couple of weeks ago: for this article, we used a sub-$100 AM3 board from Gigabyte, the MA770T-UD3P. This $80 board impressed us so much with its overall performance, overclocking capabilities, and build quality that we will continue to use it in future budget test benches. Every other component is the same as in our previous CUDA review.


Test system configuration with eVGA 9800+

Test system configuration with Radeon 4770

With this in mind, we put together a moderately priced AMD AM3-based system with 4GB of RAM, paired with either an NVIDIA 9800GTX+ or an ATI Radeon 4770 graphics card. Our eVGA card is factory overclocked with a 756MHz core clock and a 2246MHz memory clock, while the ATI 4770 has a 750MHz core clock and a 3200MHz (GDDR5) memory clock. The 9800GTX+ has 128 stream processors compared to the 4770's 640, although the two architectures count stream processors differently, so the figures are not directly comparable. The 4770 is a much newer design than the 9800GTX+, but the two cards are similarly priced and should deliver comparable performance.

Here’s a complete rundown of the prices and specifications of our AM3 test system (Note: all prices were compiled on July 12):

  • CPU: AMD Phenom II X3 720 Black Edition ($119)
  • Motherboard: Gigabyte MA770T-UD3P ($79.99)
  • Video cards: eVGA 9800GTX+ ($124.99 before rebate), ATI Radeon 4770 ($109.99)
  • RAM: OCZ Gold 4GB DDR3 1600 ($74.99)
  • Hard drive: Western Digital 160GB SATA ($56.99)
  • Power supply: PC Power and Cooling 750W ($109.99)
  • Total cost before rebates and shipping: $568.95 (with 9800+), $553.95 (with 4770)
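As a quick arithmetic check, the sketch below adds up the component prices exactly as listed. It uses Decimal so that cents add exactly. The dictionary labels are just our own shorthand for the parts above. Summed this way, the listed prices come to $565.95 with the 9800GTX+ and $550.95 with the 4770. That is $3 below each stated total, so one listed price may differ from the figure used for the totals. The $15 gap between the two builds does match the difference between the two card prices.

```python
from decimal import Decimal

# Component prices exactly as listed above (USD, before rebates and shipping).
SHARED_PARTS = {
    "CPU: AMD Phenom II X3 720 Black Edition": Decimal("119.00"),
    "Motherboard: Gigabyte MA770T-UD3P": Decimal("79.99"),
    "RAM: OCZ Gold 4GB DDR3 1600": Decimal("74.99"),
    "Hard drive: Western Digital 160GB SATA": Decimal("56.99"),
    "Power supply: PC Power and Cooling 750W": Decimal("109.99"),
}
CARDS = {
    "eVGA 9800GTX+": Decimal("124.99"),
    "ATI Radeon 4770": Decimal("109.99"),
}


def build_total(card: str) -> Decimal:
    """Sum the shared components plus one graphics card."""
    return sum(SHARED_PARTS.values()) + CARDS[card]


for card in CARDS:
    print(f"{card}: ${build_total(card)}")
```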


Other notable items to mention about our test system:

  • Operating system: Windows Vista Ultimate 64-bit
  • NVIDIA driver version: 186.18 (released June 18)
  • ATI Catalyst driver version: 9.6 (released June 15)


GPU-Z graphics specifications for the 9800GTX+ and 4770 


Testing methodology

The testing parameters for evaluating the performance of these ATI Stream- and CUDA-enabled transcoding applications were as follows:

  • Evaluate CPU usage and determine how much of the computing load is handled by the CPU with ATI Stream/CUDA enabled and disabled
  • Determine what performance differences consumers will notice between ATI Stream and CUDA
  • Subjectively evaluate the image quality of output video transcoded with ATI Stream and CUDA
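The first two criteria boil down to comparing each clip's transcode time and CPU load with GPU acceleration on and off. The sketch below shows one way to summarize such a pair of runs. It is a hypothetical helper: the names `TranscodeRun` and `compare` are ours, and it assumes CPU usage was sampled as percentages during each run. The article does not say which tool collected the samples.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TranscodeRun:
    """One transcode of a clip: wall-clock time and CPU-usage samples (percent)."""
    seconds: float
    cpu_samples: list[float]


def compare(gpu_on: TranscodeRun, gpu_off: TranscodeRun) -> dict[str, float]:
    """Summarize how much work moved off the CPU when Stream/CUDA is enabled."""
    avg_on = mean(gpu_on.cpu_samples)
    avg_off = mean(gpu_off.cpu_samples)
    return {
        "speedup": gpu_off.seconds / gpu_on.seconds,  # >1 means the GPU run finished sooner
        "avg_cpu_on": avg_on,
        "avg_cpu_off": avg_off,
        "cpu_drop_points": avg_off - avg_on,  # percentage points of CPU load removed
    }
```

Reporting the CPU drop in percentage points keeps it separate from the speedup. An application can finish faster while still keeping the CPU busy, or free up the CPU without finishing any sooner.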

After we determined our test parameters, we chose a variety of video formats and sizes for our benchmarks, ranging from MPEG-4 and WMV to MOV and H.264. This gives us a broad range of formats that should appeal to a variety of consumers.

May 9, 2011 | 06:39 PM - Posted by Joe (not verified)

Please change the title.

Your article is not about a comparison of Stream and CUDA performance, it is the difference between two software implementations utilising Stream and CUDA.

These technologies allow you to parallelise your algorithms; to imply that one technology performs, as you essentially say, 'better quality maths' than the other is ignorant.

Please do not misdirect readers like this.


Joe Bloggs

August 30, 2012 | 07:22 AM - Posted by Anonymous (not verified)

Please change you word.

Your comment is not about a reply to the article, it is a quantification of how butthurt you are.

These new breakthrows allow us to see how badly you are spell ,as you essentially try to use 'larger words' but not good at English.

Please do not obfuscate readers' thoughtings like this.


Bloe Joggs

April 18, 2013 | 01:22 AM - Posted by Bloe Bollox (not verified)

damn dude, look at your own english, it's absolutely dreadful!

April 27, 2013 | 11:28 AM - Posted by Anonymous (not verified)

Ya dude, your an idiot, your article is misleading. For sure!

Hater Bater Fuck Face

May 22, 2011 | 07:06 AM - Posted by Anonymous (not verified)

You are comparing two cards, one of which is nearly a year older than the other; it's elementary that the newer one is going to win. This review is biased.

June 30, 2011 | 03:22 PM - Posted by Anonymous (not verified)

Why are you not comparing the same frame in the outputs? How can you do a comparison of different frames and make a decision on differences in quality?

September 3, 2011 | 06:03 PM - Posted by Rupert Grint (not verified)

My personal gaming research team has found nVIDIA's CUDA technology to be superior, but they compared current GPUs, not GPUs with a manufacturing time gap.

October 14, 2012 | 01:50 AM - Posted by Armand Laroche (not verified)

This is a very interesting article to contribute to my PC Hardware class, as I'm currently in a Network Admin program in Vermont. Please keep up the good work guys I love your site, and you have been very helpful over the last several semesters.

March 21, 2013 | 02:06 PM - Posted by Bitcoin Minner (not verified)

For Bitcoin miners, AMD GPUs are faster than Nvidia GPUs!

Firstly, AMD designs GPUs with many simple ALUs/shaders (VLIW design) that run at a relatively low clock frequency (typically 1120-3200 ALUs at 625-900 MHz), whereas Nvidia's microarchitecture consists of fewer, more complex ALUs and tries to compensate with a higher shader clock (typically 448-1024 ALUs at 1150-1544 MHz). Because of this VLIW vs. non-VLIW difference, Nvidia uses more square millimeters of die space per ALU, so it can pack fewer of them per chip, and its chips hit the frequency wall sooner, which prevents Nvidia from raising the clock high enough to match or surpass AMD's performance. This translates to a raw ALU performance advantage for AMD:

An old AMD Radeon HD 6990: 3072 ALUs x 830 MHz = 2550 billion 32-bit instructions per second
A new Nvidia GTX 590: 1024 ALUs x 1214 MHz = 1243 billion 32-bit instructions per second

This approximate 2x-3x performance difference exists across the entire range of AMD and Nvidia GPUs. It is very visible in all ALU-bound GPGPU workloads such as Bitcoin, password bruteforcers, etc.
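The peak-throughput figures quoted in this comment are simply ALU count times clock. The sketch below reproduces them, assuming one 32-bit instruction per ALU per clock, which is the comment's own simplification. The ALU counts are totals across both GPUs on these dual-GPU cards. Running it prints `HD 6990: 2550, GTX 590: 1243, ratio 2.05x`, which sits at the low end of the 2x-3x range claimed above.

```python
def peak_alu_gips(alus: int, clock_mhz: int) -> float:
    """Peak 32-bit ALU instructions per second, in billions.

    Assumes one instruction per ALU per clock: ALUs x MHz gives millions of
    instructions per second, and dividing by 1000 converts to billions.
    """
    return alus * clock_mhz / 1000


hd6990 = peak_alu_gips(3072, 830)   # Radeon HD 6990 figures quoted above
gtx590 = peak_alu_gips(1024, 1214)  # GeForce GTX 590 figures quoted above
print(f"HD 6990: {hd6990:.0f}, GTX 590: {gtx590:.0f}, ratio {hd6990 / gtx590:.2f}x")
```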

Secondly, another difference favoring Bitcoin mining on AMD GPUs instead of Nvidia's is that the mining algorithm is based on SHA-256, which makes heavy use of the 32-bit integer right rotate operation. This operation can be implemented as a single hardware instruction on AMD GPUs (BIT_ALIGN_INT), but requires three separate hardware instructions to be emulated on Nvidia GPUs (2 shifts + 1 add). This alone gives AMD another 1.7x performance advantage (~1900 instructions instead of ~3250 to execute the SHA-256 compression function).
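To make the rotate argument concrete, the sketch below builds a 32-bit right rotate from two shifts and one combine. This is the multi-instruction emulation described above, written in plain Python rather than GPU code; it does not model AMD's BIT_ALIGN_INT. It also shows SHA-256's Σ0 function, which by itself needs three rotates. The two shifted halves never share set bits, so combining them with OR or with ADD gives the same result.

```python
MASK32 = 0xFFFFFFFF  # keep every value within 32 bits


def rotr32(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits using two shifts and one OR."""
    x &= MASK32
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32


def big_sigma0(a: int) -> int:
    """SHA-256 Sigma0 (FIPS 180-4): ROTR^2 XOR ROTR^13 XOR ROTR^22."""
    return rotr32(a, 2) ^ rotr32(a, 13) ^ rotr32(a, 22)


print(hex(rotr32(0x12345678, 8)))  # 0x78123456
print(round(3250 / 1900, 2))       # 1.71, the instruction-count ratio cited above
```

With a native rotate, each `rotr32` call is one instruction instead of three. SHA-256 calls rotates many times per round, which is where the comment's ~1.7x estimate comes from.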

Combined together, these 2 factors make AMD GPUs overall 3x-5x faster when mining Bitcoins!

April 3, 2013 | 12:02 AM - Posted by Anonymous (not verified)

Fucking plagerism. Copy/paste from some other source, no citation or credit. Your education should be shredded and flushed down the toilet. Here is where you copied it from for people who want to read from someone with actual knowledge and not just ctrl+c ---> ctrl+v.

April 20, 2013 | 11:58 AM - Posted by Anonymous (not verified)

You plagerized me. I complained about someone else who copied something and posted a link. All you did was change the link. You are a loser and the worst scum on the internet.

August 20, 2013 | 08:09 AM - Posted by Anonymous (not verified)

Why are we bitching about plagiarism? If i wanted to make sure his info was correct i would've looked it up myself. I could care less if it was "plagiarized" as long as the information was correct.
