ATI Stream vs. NVIDIA CUDA - GPGPU computing battle royale

Introduction, GPGPU history, ATI Stream and CUDA overviews

It’s a bit late to the party, but can ATI Stream bring the heat against a refined CUDA technology?

Since our initial review of five of NVidia’s CUDA-enabled applications back in June, we’ve been chomping at the bit to get our first real look at ATI’s entry into the GPU computing ring called ATI Stream. Both of these platforms use parallel computing architectures to utilize the GPU’s stream processors, in tandem with the CPU, to significantly increase any system’s video transcoding speeds.

Today, we are going to discuss both of these technologies and benchmark a couple of video transcoding applications from Cyberlink that support both the CUDA and ATI Stream platforms. We will also take a brief look at ATI’s Avivo video converter to see what ATI’s own free software has to offer.

GPGPU history at a glance

Video Equipment with Ikonas graphics system (Courtesy photo)

The roots of general-purpose computing on graphics processing units, or GPGPU, date back to 1978, when Ikonas developed a programmable raster display system for cockpit instrumentation. Before 2006, only a handful of other systems incorporated GPGPU technology.

Former CEO Dave Orton explains ATI’s Stream computing initiative at a press event in 2006 (Courtesy of TechReport.com)

In November 2006, AMD’s website stated that the company had started the “GPGPU revolution” with the introduction of “Close To Metal”, the first iteration of the GPGPU technology that has since evolved into ATI Stream. After several missteps and delays, however, AMD was not able to fully deliver ATI Stream until the December 2008 launch of the ATI Catalyst 8.12 driver, which officially brought Stream to the masses.

To give consumers a glimpse of this new technology, AMD reconfigured ATI’s free Avivo Video Converter to be Stream-compatible. Since its re-release in 2008, only two video transcoding applications have incorporated ATI Stream: Cyberlink’s PowerDirector and MediaShow Espresso. Other applications are in various stages of development, but nothing else is currently available on the market.

NVidia CEO and president Jen-Hsun Huang plays a game using NVidia’s PhysX technology at the International Consumer Electronics Show in Las Vegas, Jan. 8, 2009. (Courtesy photo)

On the other side of the fence, NVidia’s Compute Unified Device Architecture, or “CUDA”, platform was announced together with G80 in November 2006. A public beta version of the CUDA SDK was released in February 2007. The first version of CUDA rolled out in June 2007 with Tesla, which was based on G80 and designed for high performance computing. At the end of 2007, NVidia released the CUDA 1.1 beta, a minor release that added new features. Since its initial release, CUDA has been used and featured in seven retail video transcoding applications.

The development of GPGPUs is truly about fully utilizing all the processing potential that lies dormant in graphics cards when users aren’t playing Crysis or Far Cry 2. GPGPUs will allow users to see what will happen if other applications are able to make use of the stream processors in a graphics card. This is why NVidia and AMD are frantically working to harness the GPGPU potential of their respective graphics hardware.

Why is GPGPU technology important?

The importance of GPGPU technology is simple: it will speed up many of the tasks consumers do every day by using the GPU and the CPU in tandem for “general purpose” computations (or number crunching) that were once handled by the CPU alone. When this technology fully matures, consumers will see noticeable performance increases when they convert audio and video files, play graphics-intensive games, and perform other daily tasks. ATI Stream and CUDA focus on using the GPU’s stream processors in tandem with the CPU to enable the entire system to handle computing-intensive applications, and more specifically video transcoding applications.

ATI Stream technology overview

Ryan first wrote about ATI’s new Stream technology back in November 2008, and the basic premise behind the technology still stands. ATI Stream is based on a set of advanced hardware and software technologies that enable AMD graphics processors, working in concert with the system’s central processor, to accelerate many applications beyond just graphics. Stream technology enables hundreds of parallel Stream cores inside AMD graphics processors to accelerate general purpose applications. These capabilities will allow ATI Stream-enabled programs to operate with optimized performance or with new functionality.

ATI Stream uses a parallel computing architecture that takes advantage of the graphics card’s stream processors to compute problems, applications or tasks that can be broken down into parallel, identical operations and run simultaneously on a single processor device. Stream computing also takes advantage of a SIMD methodology (single instruction, multiple data), whereas a CPU uses a modified SISD methodology (single instruction, single data), with the modifications taking various parallelism techniques into account.
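
To make the SIMD versus SISD distinction concrete, here is a minimal Python sketch. It is purely illustrative and does not use the ATI Stream or CUDA APIs: the function names, the lane width of 4, and the brightness-scaling operation are all assumptions chosen for the example. The scalar path issues one instruction per data element, while the SIMD-style path issues the same instruction once per group of elements.

```python
# Illustrative model of SISD vs. SIMD instruction issue in plain Python.
# LANES, scale_sisd and scale_simd are hypothetical names, not part of any GPU API.

LANES = 4  # assumed number of data elements handled by one instruction


def scale_sisd(pixels, factor):
    """Single instruction, single data: one multiply issued per element."""
    out, issued = [], 0
    for p in pixels:
        out.append(min(255, p * factor))
        issued += 1
    return out, issued


def scale_simd(pixels, factor, lanes=LANES):
    """Single instruction, multiple data: one multiply issued per group of `lanes` elements."""
    out, issued = [], 0
    for start in range(0, len(pixels), lanes):
        group = pixels[start:start + lanes]
        # The same operation is applied to every element of the group in lockstep.
        out.extend(min(255, p * factor) for p in group)
        issued += 1
    return out, issued


pixels = [10, 20, 30, 40, 50, 60, 70, 200]
sisd_out, sisd_issued = scale_sisd(pixels, 2)
simd_out, simd_issued = scale_simd(pixels, 2)
print(sisd_out == simd_out)      # both paths produce the same pixels
print(sisd_issued, simd_issued)  # 8 instructions issued vs. 2
```

The work only maps onto the SIMD path because every pixel gets the identical operation and no pixel depends on another, which is the same property that makes video transcoding a good fit for stream processors.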

NVidia CUDA technology overview

NVidia CUDA is a general purpose parallel computing architecture that leverages the parallel compute engine in NVidia graphics processing units to solve many complex computational problems in a fraction of the time required on a CPU. It includes the CUDA Instruction Set Architecture and the parallel compute engine in the GPU. No other GPU parallel computing architecture has been more in the spotlight than CUDA. It performs two major functions that consumers should be aware of: it helps reduce CPU usage by offloading work to the GPU’s stream processors, and it can accelerate any computing process in an application where CUDA is enabled.

NVidia claims to have sold more than 100 million CUDA-enabled GPUs to date, a figure that is probably accurate. The company also says that thousands of software developers are already using the free CUDA software development tools to solve problems in a variety of professional and home applications.

Now that you have better insight into the history behind GPGPU technology, as well as ATI’s and NVidia’s roles in its development, let’s move on to the ATI Stream and CUDA-enabled video transcoding applications we will be using for our review today.

13 Comments

Joe on May 9, 2011 at 10:39 pm

Please change the tile.

Your article is not about a comparison of Stream and CUDA performance, it is the difference between two software implementations utilising Stream and CUDA.

These technologies allow you to parallelise your algorithms, to imply that one technology performs ,as you essentially say, ‘better quality maths’ than the other is ignorant.

Please do not misdirect readers like this.

Regards.

Joe Bloggs
- Anonymous on August 30, 2012 at 11:22 am
  
  Please change you word.
  
  Your comment is not about a reply to the article, it is a quantification of how butthurt you are.
  
  These new breakthrows allow us to see how badly you are spell ,as you essentially try to use ‘larger words’ but not good at English.
  
  Please do not obfuscate readers’ thoughtings like this.
  
  Regards.
  
  Bloe Joggs
  - Bloe Bollox on April 18, 2013 at 5:22 am
    
    damn dude, look at your own english, it’s absolutely dreadful!
  - Anonymous on April 27, 2013 at 3:28 pm
    
    Ya dude, your an idiot, your article is misleading. For sure!
    
    Peace
    Hater Bater Fuck Face
    - Anonymous on February 27, 2017 at 8:07 pm
      
      SO MUCH HATE !
Anonymous on May 22, 2011 at 11:06 am

You are comparing two cards, one is nearly a year older than the other one, its elementary that the new one is going to win. This review is biased
Anonymous on June 30, 2011 at 7:22 pm

Why are you not comparing the same frame in the outputs? How can you do a comparison of different frames and make a decision on differences in quality?
Rupert Grint on September 3, 2011 at 10:03 pm

My personal gaming research team has found nVIDIA’s CUDA technology to be superior, but they compared current GPUs, not GPUs with a manufacturing time gap.
Armand Laroche on October 14, 2012 at 5:50 am

This is a very interesting article to contribute to my PC Hardware class, as I’m currently in a Network Admin program in Vermont. Please keep up the good work guys I love your site, and you have been very helpful over the last several semesters.
Bitcoin Minner on March 21, 2013 at 6:06 pm

For Bitcoin miners, AMD GPUs are faster than Nvidia GPUs!
Why?

Firstly, AMD designs GPUs with many simple ALUs/shaders (VLIW design) that run at a relatively low frequency clock (typically 1120-3200 ALUs at 625-900 MHz), whereas Nvidia’s microarchitecture consists of fewer more complex ALUs and tries to compensate with a higher shader clock (typically 448-1024 ALUs at 1150-1544 MHz). Because of this VLIW vs. non-VLIW difference, Nvidia uses up more square millimeters of die space per ALU, hence can pack fewer of them per chip, and they hit the frequency wall sooner than AMD which prevents them from increasing the clock high enough to match or surpass AMD’s performance. This translates to a raw ALU performance advantage for AMD:

An old AMD Radeon HD 6990: 3072 ALUs x 830 MHz = 2550 billion 32-bit instructions per second
A new Nvidia GTX 590: 1024 ALUs x 1214 MHz = 1243 billion 32-bit instructions per second

This approximate 2x-3x performance difference exists across the entire range of AMD and Nvidia GPUs. It is very visible in all ALU-bound GPGPU workloads such as Bitcoin, password bruteforcers, etc.
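
The two figures above can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes, as the comparison implies, that each ALU retires one 32-bit instruction per clock, and the function name is made up for the example.

```python
# Reproduces the ALU-count x shader-clock arithmetic from the comparison above.
# Assumption: one 32-bit instruction per ALU per clock cycle.

def peak_giga_instructions(alus, clock_mhz):
    """Peak 32-bit instructions per second, in billions."""
    # ALUs x MHz gives millions of instructions per second; divide by 1000 for billions.
    return alus * clock_mhz / 1000


hd6990 = peak_giga_instructions(3072, 830)
gtx590 = peak_giga_instructions(1024, 1214)
print(round(hd6990), round(gtx590))  # 2550 1243
print(round(hd6990 / gtx590, 2))     # 2.05
```

For this particular pair of cards the ratio lands at the low end of the quoted 2x-3x range.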

Secondly, another difference favoring Bitcoin mining on AMD GPUs instead of Nvidia’s is that the mining algorithm is based on SHA-256, which makes heavy use of the 32-bit integer right rotate operation. This operation can be implemented as a single hardware instruction on AMD GPUs (BIT_ALIGN_INT), but requires three separate hardware instructions to be emulated on Nvidia GPUs (2 shifts + 1 add). This alone gives AMD another 1.7x performance advantage (~1900 instructions instead of ~3250 to execute the SHA-256 compression function).
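
As a rough illustration of the "2 shifts + 1 add" emulation, the Python sketch below builds a 32-bit right rotate from those three operations and uses it in the small sigma-0 function from the SHA-256 message schedule. The helper names are hypothetical, and the code models the arithmetic only, not any GPU instruction set.

```python
# Sketch of a 32-bit right rotate built from two shifts and one add.
# rotr32_emulated and small_sigma0 are illustrative names, not GPU instructions.

MASK32 = 0xFFFFFFFF


def rotr32_emulated(x, n):
    """Rotate the 32-bit value x right by n bits (0 <= n < 32)."""
    right = x >> n                   # shift 1: low bits move down
    left = (x << (32 - n)) & MASK32  # shift 2: wrapped bits move to the top
    return (right + left) & MASK32   # add: the two halves never overlap, so add acts like OR


def small_sigma0(x):
    """SHA-256 message-schedule function: ROTR7 xor ROTR18 xor SHR3."""
    return rotr32_emulated(x, 7) ^ rotr32_emulated(x, 18) ^ (x >> 3)


print(hex(rotr32_emulated(0x12345678, 8)))  # 0x78123456
print(round(3250 / 1900, 2))                # 1.71, the instruction-count ratio quoted above
```

Hardware with a native rotate collapses the three operations in `rotr32_emulated` into one, which is where the difference in instruction counts comes from.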

Combined together, these 2 factors make AMD GPUs overall 3x-5x faster when mining Bitcoins!
- Anonymous on April 3, 2013 at 4:02 am
  
  Fucking plagerism. Copy/paste from some other source, no citation or credit. Your education should be shredded and flushed down the toilet. Here is where you copied it from for people who want to read from someone with actual knowledge and not just ctrl+c —> ctrl+v.
  
  https://en.bitcoin.it/wiki/Why_a_GPU_mines_faster_than_a_CPU
  - Anonymous on April 20, 2013 at 3:58 pm
    
    You plagerized me. I complained about someone else who copied something and posted a link. All you did was change the link. You are a loser and the worst scum on the internet.
  - Anonymous on August 20, 2013 at 12:09 pm
    
    Why are we bitching about plagiarism? If i wanted to make sure his info was correct i would’ve looked it up myself. I could care less if it was “plagiarized” as long as the information was correct.

About The Author

Steve Grever
