
ATI Stream vs. NVIDIA CUDA - GPGPU computing battle royale

Manufacturer: ATI

Introduction, GPGPU history, ATI Stream and CUDA overviews

It's a bit late to the party, but can ATI Stream bring the heat against a refined CUDA technology?

Since our initial review of five of NVidia's CUDA-enabled applications back in June, we've been chomping at the bit to get our first real look at ATI's entry into the GPU computing ring called ATI Stream. Both of these platforms use parallel computing architectures to utilize the GPU's stream processors, in tandem with the CPU, to significantly increase any system's video transcoding speeds.

Today, we are going to discuss both of these technologies as well as benchmark a couple of video transcoding applications from Cyberlink that support both the CUDA and ATI Stream platforms. We will also take a brief look at ATI's Avivo video converter to see what ATI's own free software has to offer.


GPGPU history at a glance

Video Equipment with Ikonas graphics system (Courtesy photo)

The first General-Purpose Graphics Processing Unit, or GPGPU, was created in 1978, when Ikonas developed a programmable raster display system for cockpit instrumentation. Before 2006, only a handful of other systems incorporated GPGPU technology.

Former CEO Dave Orton explains ATI's Stream computing initiative at a press event in 2006 (Courtesy of

In November 2006, AMD's website stated they started the "GPGPU revolution" with the introduction of "Close To Metal", the first iteration of their GPGPU technology that has now evolved into ATI Stream. But, after several missteps and delays, they weren't actually able to fully utilize ATI Stream technology until their December 2008 launch of the ATI Catalyst 8.12 driver, which officially brought Stream to the masses.

To give consumers a glimpse of this new technology, AMD reconfigured ATI's free Avivo Video Converter to be Stream-compatible. Since its re-release in 2008, only two video transcoding applications have incorporated ATI Stream into their programming -- Cyberlink's PowerDirector and MediaShow Expresso. Other applications are in various stages of development, but nothing else is currently available on the market.

NVidia CEO and president Jen-Hsun Huang plays with a game using NVidia's Physx technology for gaming, at the International Consumer Electronics Show in Las Vegas Jan. 8, 2009. (Courtesy photo)

On the other side of the fence, NVidia's Compute Unified Device Architecture or "CUDA" platform was announced together with G80 in November 2006. A public beta version of the CUDA SDK was released in February 2007. The first version of CUDA rolled out with Tesla in June 2007, which was based on G80 and designed for high performance computing. At the end of 2007, NVidia released the CUDA 1.1 beta, a minor release that added new features. Since its initial release, CUDA has been used and featured in seven retail video transcoding applications.

The development of GPGPUs is truly about fully utilizing all the processing potential that lies dormant in graphics cards when users aren't playing Crysis or Far Cry 2. GPGPUs will allow users to see what will happen if other applications are able to make use of the stream processors in a graphics card. This is why NVidia and AMD are frantically working to harness the GPGPU potential of their respective graphics hardware.


Why is GPGPU technology important?

The importance of the emergence of GPGPU technology is simple -- it will increase the speed of many types of tasks consumers do every day by using the GPU and the CPU in tandem for "general purpose" computations (or number crunching) that were once handled by the CPU alone. When this technology fully matures, consumers will see noticeable performance increases when they convert audio and video files, play graphics-intensive games, and perform other daily tasks. ATI Stream and CUDA focus on using the GPU's stream processors in tandem with the CPU to enable the entire system to handle computing-intensive applications, and more specifically video transcoding applications.

ATI Stream technology overview

(Courtesy of ATI)

Ryan first wrote about ATI's new Stream technology back in November 2008, and since that time the basic premise behind the technology still stands. ATI Stream technology is based off a set of advanced hardware and software technologies that enable AMD graphics processors, working in concert with the system’s central processor, to accelerate many applications beyond just graphics. Stream technology enables hundreds of parallel Stream cores inside AMD graphics processors to accelerate general purpose applications. These capabilities will allow ATI Stream-enabled programs to operate with optimized performance or with new functionality.

(Courtesy of ATI)

ATI Stream uses a parallel computing architecture that takes advantage of the graphics card's stream processors to compute problems, applications or tasks that can be broken down into parallel, identical operations and run simultaneously on a single processor device. Stream computing also takes advantage of the SIMD methodology (single instruction, multiple data), whereas a CPU follows a modified SISD methodology (single instruction, single data), with the modifications taking various parallelism techniques into account.
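To make the SIMD idea a little more concrete, here is a minimal conceptual sketch in Python. It does not use the ATI Stream or CUDA APIs. The names `brighten_pixel`, `run_serial`, and `run_data_parallel` are hypothetical, and the thread pool is only a stand-in for a GPU's stream processors.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten_pixel(value, amount=40):
    """The single 'instruction': brighten one 8-bit pixel, clamped to 255."""
    return min(value + amount, 255)


def run_serial(pixels):
    """SISD-style: one instruction applied to one data element at a time."""
    result = []
    for p in pixels:
        result.append(brighten_pixel(p))
    return result


def run_data_parallel(pixels, workers=4):
    """SIMD-style: the same instruction is mapped over every element.

    Each element is independent of the others, so the work can be
    spread across many workers. The thread pool here is an assumption
    used purely for illustration, not how a GPU is actually driven.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(brighten_pixel, pixels))


frame = [0, 100, 200, 250]           # a tiny made-up "frame" of pixel values
serial = run_serial(frame)
parallel = run_data_parallel(frame)  # same answer, computed element-wise
```

Both functions return the same result. The point is that nothing in `brighten_pixel` depends on neighbouring elements, which is the property that lets video transcoding work be split across hundreds of stream processors.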


NVidia CUDA technology overview

(Courtesy of NVidia)

NVidia CUDA is a general purpose parallel computing architecture that leverages the parallel compute engine in NVidia graphics processing units to solve many complex computational problems in a fraction of the time required on a CPU. It includes the CUDA Instruction Set Architecture and the parallel compute engine in the GPU. No GPU parallel computing architecture has been more in the spotlight than NVidia’s CUDA. CUDA performs two major functions that consumers should be aware of – it helps reduce or match CPU usage by engaging the GPU’s stream processors, and it can accelerate any computing process where CUDA is enabled.

(Courtesy of NVidia)

NVidia claims to have sold more than 100 million CUDA-enabled GPUs to date, which is probably accurate. The company also says thousands of software developers are already using the free CUDA software development tools to solve problems in a variety of professional and home applications.

Now that you have better insight into the history behind GPGPU technology as well as ATI and NVidia's role in the technology's development, let's move on to the ATI Stream and CUDA-enabled video transcoding applications we will be using for our review today.

May 9, 2011 | 06:39 PM - Posted by Joe (not verified)

Please change the tile.

Your article is not about a comparison of Stream and CUDA performance, it is the difference between two software implementations utilising Stream and CUDA.

These technologies allow you to parallelise your algorithms, to imply that one technology performs ,as you essentially say, 'better quality maths' than the other is ignorant.

Please do not misdirect readers like this.


Joe Bloggs

August 30, 2012 | 07:22 AM - Posted by Anonymous (not verified)

Please change you word.

Your comment is not about a reply to the article, it is a quantification of how butthurt you are.

These new breakthrows allow us to see how badly you are spell ,as you essentially try to use 'larger words' but not good at English.

Please do not obfuscate readers' thoughtings like this.


Bloe Joggs

April 18, 2013 | 01:22 AM - Posted by Bloe Bollox (not verified)

damn dude, look at your own english, it's absolutely dreadful!

April 27, 2013 | 11:28 AM - Posted by Anonymous (not verified)

Ya dude, your an idiot, your article is misleading. For sure!

Hater Bater Fuck Face

February 27, 2017 | 03:07 PM - Posted by Anonymous (not verified)


May 22, 2011 | 07:06 AM - Posted by Anonymous (not verified)

You are comparing two cards, one is nearly a year older than the other one, its elementary that the new one is going to win. This review is biased

June 30, 2011 | 03:22 PM - Posted by Anonymous (not verified)

Why are you not comparing the same frame in the outputs? How can you do a comparison of different frames and make a decision on differences in quality?

September 3, 2011 | 06:03 PM - Posted by Rupert Grint (not verified)

My personal gaming research team has found nVIDIA's CUDA technology to be superior, but they compared current GPUs, not GPUs with a manufacturing time gap.

October 14, 2012 | 01:50 AM - Posted by Armand Laroche (not verified)

This is a very interesting article to contribute to my PC Hardware class, as I'm currently in a Network Admin program in Vermont. Please keep up the good work guys I love your site, and you have been very helpful over the last several semesters.

March 21, 2013 | 02:06 PM - Posted by Bitcoin Minner (not verified)

For BitCoin Minners AMD GPUs faster than Nvidia GPUs!

Firstly, AMD designs GPUs with many simple ALUs/shaders (VLIW design) that run at a relatively low frequency clock (typically 1120-3200 ALUs at 625-900 MHz), whereas Nvidia's microarchitecture consists of fewer more complex ALUs and tries to compensate with a higher shader clock (typically 448-1024 ALUs at 1150-1544 MHz). Because of this VLIW vs. non-VLIW difference, Nvidia uses up more square millimeters of die space per ALU, hence can pack fewer of them per chip, and they hit the frequency wall sooner than AMD which prevents them from increasing the clock high enough to match or surpass AMD's performance. This translates to a raw ALU performance advantage for AMD:

An old AMD Radeon HD 6990: 3072 ALUs x 830 MHz = 2550 billion 32-bit instruction per second
A New Nvidia GTX 590: 1024 ALUs x 1214 MHz = 1243 billion 32-bit instruction per second

This approximate 2x-3x performance difference exists across the entire range of AMD and Nvidia GPUs. It is very visible in all ALU-bound GPGPU workloads such as Bitcoin, password bruteforcers, etc.

Secondly, another difference favoring Bitcoin mining on AMD GPUs instead of Nvidia's is that the mining algorithm is based on SHA-256, which makes heavy use of the 32-bit integer right rotate operation. This operation can be implemented as a single hardware instruction on AMD GPUs (BIT_ALIGN_INT), but requires three separate hardware instructions to be emulated on Nvidia GPUs (2 shifts + 1 add). This alone gives AMD another 1.7x performance advantage (~1900 instructions instead of ~3250 to execute the SHA-256 compression function).

Combined together, these 2 factors make AMD GPUs overall 3x-5x faster when mining Bitcoins!
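The arithmetic in the comment above can be checked with a short Python sketch. The ALU counts, clock speeds, and instruction counts are the commenter's figures, not measurements, and the function names `rotr32` and `peak_gops` are hypothetical. The shift-based rotate combines its two halves with OR, which is equivalent to the add mentioned above because the two shifted halves never overlap.

```python
MASK32 = 0xFFFFFFFF


def rotr32(x, n):
    """32-bit right rotate emulated with two shifts and one combine step."""
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32


def peak_gops(alus, clock_mhz):
    """Peak billions of 32-bit instructions per second.

    Assumes one instruction per ALU per clock, as the comment does.
    """
    return alus * clock_mhz / 1000.0


amd = peak_gops(3072, 830)      # Radeon HD 6990 figures from the comment
nvidia = peak_gops(1024, 1214)  # GTX 590 figures from the comment
alu_ratio = amd / nvidia        # raw ALU throughput advantage
rotate_ratio = 3250 / 1900      # SHA-256 instruction counts from the comment
combined = alu_ratio * rotate_ratio
```

With these figures the ALU ratio comes out at about 2x and the combined factor at roughly 3.5x, which falls inside the 3x-5x range the comment claims for this pair of cards.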

April 3, 2013 | 12:02 AM - Posted by Anonymous (not verified)

Fucking plagerism. Copy/paste from some other source, no citation or credit. Your education should be shredded and flushed down the toilet. Here is where you copied it from for people who want to read from someone with actual knowledge and not just ctrl+c ---> ctrl+v.

April 20, 2013 | 11:58 AM - Posted by Anonymous (not verified)

You plagerized me. I complained about someone else who copied something and posted a link. All you did was change the link. You are a loser and the worst scum on the internet.

August 20, 2013 | 08:09 AM - Posted by Anonymous (not verified)

Why are we bitching about plagiarism? If i wanted to make sure his info was correct i would've looked it up myself. I could care less if it was "plagiarized" as long as the information was correct.