GTC 2013: Pedraforca Is A Power Efficient ARM + GPU Cluster For Homogeneous (GPU) Workloads

Subject: General Tech, Graphics Cards | March 20, 2013 - 01:47 PM |
Tagged: tesla, tegra 3, supercomputer, pedraforca, nvidia, GTC 2013, GTC, graphics cards, data centers

There is a lot of talk about heterogeneous computing at GTC, in the sense of adding graphics cards to servers. If you have HPC workloads that can benefit from GPU parallelism, adding GPUs gives you computing performance in less physical space, and using less power, than a CPU only cluster (for equivalent TFLOPS).

However, there was a session at GTC that actually took things to the opposite extreme. Instead of a CPU only cluster or a mixed cluster, Alex Ramirez (leader of Heterogeneous Architectures Group at Barcelona Supercomputing Center) is proposing a homogeneous GPU cluster called Pedraforca.
Pedraforca V2 combines NVIDIA Tesla GPUs with low power ARM processors. Each node comprises the following components:

  • 1 x Mini-ITX carrier board
  • 1 x Q7 module (which hosts the ARM SoC and memory)
    • Current config is one Tegra 3 @ 1.3GHz and 2GB DDR2
  • 1 x NVIDIA Tesla K20 accelerator card (1170 GFLOPS)
  • 1 x InfiniBand 40Gb/s card (via Mellanox ConnectX-3 slot)
  • 1 x 2.5" SSD (SATA 3 MLC, 250GB)

The ARM processor is used solely for booting the system and facilitating GPU communication between nodes. It is not intended to be used for computing. According to Dr. Ramirez, in situations where running code on a CPU would be faster, it would be best to have a small number of Intel Xeon powered nodes to do the CPU-favorable computing, and then offload the parallel workloads to the GPU cluster over the InfiniBand connection. This is less than ideal, though: Pedraforca would be most efficient with data sets that can be processed solely on the Tesla cards.

DSCF2421.JPG

While Pedraforca is not necessarily locked to NVIDIA's Tegra hardware, Tegra 3 is currently the only SoC that meets the team's needs, because the system requires the ARM chip to have PCI-E support. The Tegra 3 SoC has four PCI-E lanes, so the carrier board uses two PLX chips to allow the Tesla and InfiniBand cards to both be connected.

The researcher stated that he is also looking forward to using NVIDIA's upcoming Logan processor in the Pedraforca cluster. It will reportedly be possible to upgrade existing Pedraforca clusters with the new chips by replacing the existing (Tegra 3) Q7 module with one that has the Logan SoC when it is released.

Pedraforca V2 has an initial cluster size of 64 nodes. While the speaker was reluctant to provide TFLOPS performance numbers, as they would depend on the workload, with 64 Tesla K20 cards it should provide respectable performance. The intent of the cluster is to save power costs by using a low power CPU. If your server kernel and applications can run on GPUs alone, there are noticeable power savings to be had by switching from a ~100W Intel Xeon chip to a lower-power (approximately 2-3W) Tegra 3 processor. If you have a kernel that needs to run on a CPU, it is recommended to run the OS on an Intel server and transfer just the GPU work to the Pedraforca cluster. Each Pedraforca node is reportedly under 300W, with the Tesla card accounting for the majority of that figure. Despite the limitations, and the niche nature of the workloads and software necessary to get the full power-saving benefits, Pedraforca is certainly an interesting take on a homogeneous server cluster!
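
As a rough back-of-the-envelope check, the short Python sketch below multiplies out the per-node numbers quoted above (1170 GFLOPS per Tesla K20, 64 nodes, a ceiling of 300W per node). The 2.5W Tegra 3 value is simply the midpoint of the "2-3W" range, the function names are made up for illustration, and peak figures say nothing about sustained performance on a real workload.

```python
# Back-of-the-envelope figures for a 64-node Pedraforca V2 cluster.
# All inputs are the approximate numbers quoted in the article; the
# helper names are illustrative, not part of any real tool.

NODES = 64
K20_PEAK_GFLOPS = 1170.0   # peak double precision per Tesla K20
NODE_POWER_W = 300.0       # upper bound quoted per node
XEON_POWER_W = 100.0       # "~100W" Intel Xeon
TEGRA3_POWER_W = 2.5       # assumed midpoint of "approximately 2-3W"


def cluster_peak_tflops(nodes: int, gflops_per_gpu: float) -> float:
    """Aggregate peak throughput in TFLOPS, assuming one GPU per node."""
    return nodes * gflops_per_gpu / 1000.0


def cpu_power_saved_w(nodes: int, big_cpu_w: float, small_cpu_w: float) -> float:
    """Host CPU power avoided by swapping one CPU per node."""
    return nodes * (big_cpu_w - small_cpu_w)


if __name__ == "__main__":
    peak = cluster_peak_tflops(NODES, K20_PEAK_GFLOPS)
    saved = cpu_power_saved_w(NODES, XEON_POWER_W, TEGRA3_POWER_W)
    ceiling_kw = NODES * NODE_POWER_W / 1000.0
    print(f"Peak double precision: {peak:.2f} TFLOPS")
    print(f"Host CPU power avoided: {saved:.0f} W")
    print(f"Cluster power ceiling: {ceiling_kw:.1f} kW")
```

Under these assumptions the cluster tops out at roughly 75 TFLOPS of peak double precision, and the CPU swap avoids a little over 6 kW across the 64 nodes.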

DSCF2413.JPG

In another session relating to the path to exascale computing, power use in data centers was listed as one of the biggest hurdles to getting to Exaflop-levels of performance, and while Pedraforca is not the answer to Exascale, it should at least be a useful learning experience at wringing the most parallelism out of code and pushing GPGPU to the limits. And that research will help other clusters use the GPUs more efficiently as researchers explore the future of computing.

The Pedraforca project built upon research conducted on Tibidabo, a multi-core ARM CPU cluster, and CARMA (CUDA on ARM development kit) which is a Tegra SoC paired with an NVIDIA Quadro card. The two slides below show CARMA benchmarks and a Tibidabo cluster (click on image for larger version).

Stay tuned to PC Perspective for more GTC 2013 coverage!

 

GTC 2013: TYAN Launches New HPC Servers Powered by Kepler-based Tesla Cards

Subject: General Tech, Graphics Cards | March 19, 2013 - 06:52 PM |
Tagged: GTC 2013, tyan, HPC, servers, tesla, kepler, nvidia

Server platform manufacturer TYAN is showing off several of its latest servers aimed at the high performance computing (HPC) market. The new servers range in size from 2U to 4U chassis and hold up to 8 Kepler-based Tesla accelerator cards. The new product lineup consists of two motherboards and three bare-bones systems. The S7055 and S7056 are the motherboards, while the FT77-B7059, TA77-B7061, and FT48-B7055 are the bare-bones systems.

FT48_B7055_3D_2_Rev2_S.jpg

The TA77-B7061 is the smallest system, with support for two Intel Xeon E5-2600 processors and four Kepler-based Tesla accelerator cards. The FT48-B7055 has similar specifications but is housed in a 4U chassis. Finally, the FT77-B7059 is a 4U system with support for two Intel Xeon E5-2600 processors and up to eight Tesla accelerator cards. The S7055 supports a maximum of four GPUs while the S7056 can support two Tesla cards, though these are bare boards, so you will have to supply your own cards, processors, and RAM (of course).

FT77A-B7059_3D_S.jpg

According to TYAN, the new Kepler-based HPC systems will be available in Q2 2013, though there is no word on pricing yet.

Stay tuned to PC Perspective for further GTC 2013 Coverage!

Turn half your GTX 690 into a Quadro or Tesla?

Subject: General Tech | March 18, 2013 - 02:23 PM |
Tagged: nvidia, hack, GTX 690, K5000, K10, quadro, tesla, linux

It will take a bit of work with a soldering iron but Hack a Day has posted an article covering how to mod one of the GPUs on a GTX690 into thinking it is either a Quadro K5000 or Tesla K10.  More people will need to apply this mod and test it to confirm that the performance of the GPU actually does match or at least compare to the professional level graphics but the ID string is definitely changed to match one of those two much more expensive GPUs.  They also believe that a similar mod could be applied to the new TITAN graphics card as it is electronically similar to the GTX690.   Of course, if things go bad during the modification you could kill a $1000 card so do be careful.

eevblog_quatro.png

"If hardware manufacturers want to keep their firmware crippling a secret, perhaps they shouldn’t mess with Linux users? We figure if you’re using Linux you’re quite a bit more likely than the average Windows user to crack something open and see what’s hidden inside. And so we get to the story of how [Gnif] figured out that the NVIDIA GTX690 can be hacked to perform like the Quadro K5000. The thing is, the latter costs nearly $800 more than the former!"

Here is some more Tech News from around the web:

Tech Talk

Source: Hack a Day

Too good to be true; bad coding versus GPGPU compute power

Subject: General Tech | November 23, 2012 - 01:03 PM |
Tagged: gpgpu, amd, nvidia, Intel, phi, tesla, firepro, HPC

The skeptics were right to question the huge improvements seen when using GPGPUs in a system for heavily parallel computing tasks. The cards do help a lot, but the 100x improvements that have been reported by some companies and universities had more to do with poorly optimized CPU code than with the processing power of GPGPUs. This news comes from someone you might not expect to burst this particular bubble: Sumit Gupta, the GM of NVIDIA's Tesla team. He might be trying to mitigate any possible disappointment from future customers who already have optimized CPU code and won't see the huge improvements seen by academics and other current customers. The Inquirer does point out a balancing benefit: it is obviously much easier to optimize code in CUDA, OpenCL and other GPGPU languages than it is to code for multicore CPUs.

bubble-burst.jpg

"Both AMD and Nvidia have been using real-world code examples and projects to promote the performance of their respective GPGPU accelerators for years, but now it seems some of the eye popping figures including speed ups of 100x or 200x were not down to just the computing power of GPGPUs. Sumit Gupta, GM of Nvidia's Tesla business told The INQUIRER that such figures were generally down to starting with unoptimised CPU."


Source: The Inquirer

Podcast #227 - Golden Z77 Motherboard from ECS, High Powered WiFi from Amped Wireless, Supercomputing GPUs and more!

Subject: General Tech | November 15, 2012 - 02:10 PM |
Tagged: titan, thor, tesla, s1000, podcast, nvidia, k20x, Intel, golden board, firepro, ECS, dust, Amped Wireless, amd

PC Perspective Podcast #227 - 11/15/2012

Join us this week as we talk about a Golden Z77 Motherboard from ECS, High Powered WiFi from Amped Wireless, Supercomputing GPUs and more!

You can subscribe to us through iTunes and you can still access it directly through the RSS page HERE.

The URL for the podcast is: http://pcper.com/podcast - Share with your friends!

  • iTunes - Subscribe to the podcast directly through the iTunes Store
  • RSS - Subscribe through your regular RSS reader
  • MP3 - Direct download link to the MP3 file

Hosts: Ryan Shrout, Jeremy Hellstrom, Josh Walrath, and Allyn Malventano

This Podcast is brought to you by MSI!

Program length: 1:07:04

Podcast topics of discussion:

  1. Join us for the Hitman: Absolution Game Stream
  2. Week in Reviews:
    1. 0:02:00 ECS Z77H2-AX Golden Board Motherboard
    2. 0:07:00 Amped Wireless R20000G Router and Adapter
    3. 0:12:20 Intel says USB 3.0 and 2.4 GHz don't get along
  3. 0:18:00 This Podcast is brought to you by MSI!
  4. News items of interest:
    1. 0:19:00 A renaissance of game types that have been sadly missing
    2. 0:24:00 You missed our live Medal of Honor Game Stream - loser!
    3. 0:26:12 NVIDIA launches Tesla K20X Card, Powers Titan Supercomputer
    4. 0:30:15 AMD Launches Dual Tahiti FirePro S10000
    5. 0:38:00 Some guy leaves Microsoft - is the Start Menu on its way back??
    6. 0:41:40 AMD is apparently not for sale
    7. 0:46:05 ECS joins the Thunderbolt family with a new Z77 motherboard
  5. Closing:
    1. 0:54:00 Hardware / Software Pick of the Week
      1. Ryan: Corsair Hydro Series H60 for $75
      2. Jeremy: Form over function or vice versa?
      3. Josh: A foundation worth donating to
      4. Allyn: ArmorSuit Military Shields
  1. 1-888-38-PCPER or podcast@pcper.com
  2. http://pcper.com/podcast
  3. http://twitter.com/ryanshrout and http://twitter.com/pcper
  4. Closing/outro

Be sure to subscribe to the PC Perspective YouTube channel!!

 

 

NVIDIA Launches Tesla K20X Accelerator Card, Powers Titan Supercomputer

Subject: General Tech | November 12, 2012 - 06:29 AM |
Tagged: tesla, supercomputer, nvidia, k20x, HPC, CUDA, computing

Graphics card manufacturer NVIDIA launched a new Tesla K20X accelerator card today that supplants the existing K20 as the top of the line model. The new card cranks up the double and single precision floating point performance, beefs up the memory capacity and bandwidth, and brings some efficiency improvements to the supercomputer space.

NVIDIA_Tesla_K20X_K20_GPU_Accelerator.jpg

While it is not yet clear how many CUDA cores the K20X has, NVIDIA has stated that it is using the GK110 GPU, and is running with 6GB of memory with 250 GB/s of bandwidth – a nice improvement over the K20’s 5GB at 208 GB/s. Both the new K20X and K20 accelerator cards are based on the company’s Kepler architecture, but NVIDIA has managed to wring out more performance from the K20X. The K20 is rated at 1.17 TFlops peak double precision and 3.52 TFlops peak single precision while the K20X is rated at 1.31 TFlops and 3.95 TFlops.
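
To put the two spec sheets side by side, here is a minimal Python sketch that computes the percentage gain of the K20X over the K20 for each figure quoted above. The dictionary keys and function names are illustrative choices, and the inputs are only the peak numbers from this article, not measured results.

```python
# Relative gains of the Tesla K20X over the K20, using only the
# specifications quoted in the article. Keys and names are illustrative.

K20 = {"dp_tflops": 1.17, "sp_tflops": 3.52, "mem_gb": 5, "bw_gbs": 208}
K20X = {"dp_tflops": 1.31, "sp_tflops": 3.95, "mem_gb": 6, "bw_gbs": 250}


def percent_gain(old: float, new: float) -> float:
    """Percentage increase going from old to new."""
    return (new - old) / old * 100.0


def compare(old: dict, new: dict) -> dict:
    """Percentage gain for every spec listed for the older card."""
    return {key: percent_gain(old[key], new[key]) for key in old}


if __name__ == "__main__":
    for spec, gain in compare(K20, K20X).items():
        print(f"{spec}: +{gain:.1f}%")
```

On these numbers the K20X gains about 12 percent in peak compute and about 20 percent in memory capacity and bandwidth.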

Screenshot (363).png

The K20X manages to score 1.22 TFlops in DGEMM (double precision matrix multiplication), which makes it almost three times as fast as the previous generation Tesla M2090 accelerator based on the Fermi architecture.

Screenshot (362).png

Aside from pure performance, NVIDIA is also touting efficiency gains with the new K20X accelerator card. When two K20X cards are paired with a 2P Sandy Bridge server, NVIDIA claims to achieve 76% efficiency versus 61% efficiency with a 2P Sandy Bridge server equipped with two previous generation M2090 accelerator cards. Additionally, NVIDIA claims to have enabled the Titan supercomputer to reach the #1 spot on the top 500 green supercomputers thanks to its new cards with a rating of 2,120.16 MFLOPS/W (million floating point operations per second per watt).

Screenshot (359).png

NVIDIA claims to have already shipped 30 PFLOPS worth of GPU accelerated computing power. Interestingly, most of that computing power is housed in the recently unveiled Titan supercomputer. This supercomputer contains 18,688 Tesla K20X (Kepler GK110) GPUs and 18,688 16-core AMD Opteron 6274 processors, for a total of 299,008 CPU cores. It will consume 9 megawatts of power and is rated at a peak of 27 Petaflops and 17.59 Petaflops during a sustained Linpack benchmark. Further, when compared to Sandy Bridge processors, the K20 series offers up between 8.2 and 18.1 times more performance in several scientific applications.
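
Two ratios fall straight out of the Titan figures above: how much of the rated peak the Linpack run sustained, and how many MFLOPS each watt buys. The Python sketch below works them out; the function names are illustrative, and because it uses the 9 megawatt figure as-is, the result need not match any officially measured efficiency rating.

```python
# Ratios derived from the Titan figures quoted in the article.
# Function names are illustrative.

PEAK_PFLOPS = 27.0       # rated peak
LINPACK_PFLOPS = 17.59   # sustained Linpack result
POWER_MW = 9.0           # quoted power consumption


def linpack_efficiency(sustained_pflops: float, peak_pflops: float) -> float:
    """Fraction of the rated peak achieved on Linpack."""
    return sustained_pflops / peak_pflops


def mflops_per_watt(pflops: float, megawatts: float) -> float:
    """Convert a PFLOPS rating and an MW power draw into MFLOPS per watt."""
    flops = pflops * 1e15
    watts = megawatts * 1e6
    return flops / watts / 1e6


if __name__ == "__main__":
    eff = linpack_efficiency(LINPACK_PFLOPS, PEAK_PFLOPS)
    mfw = mflops_per_watt(LINPACK_PFLOPS, POWER_MW)
    print(f"Linpack efficiency: {eff:.1%}")
    print(f"Sustained efficiency: {mfw:.0f} MFLOPS/W")
```

With these inputs the sustained run reaches about 65 percent of peak, or a little under 2,000 MFLOPS per watt.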

Screenshot (360).png

While the Tesla cards undoubtedly use more power than CPUs, you need far fewer accelerator cards than processors to hit the same performance numbers. That is where NVIDIA is getting its power efficiency numbers from.

NVIDIA is aiming the accelerator cards at researchers and businesses doing 3D graphics, visual effects, high performance computing, climate modeling, molecular dynamics, earth science, simulations, fluid dynamics, and other such computationally intensive tasks. Using CUDA and the parallel nature of the GPU, the Tesla cards can achieve performance much higher than a CPU-only system can. NVIDIA has also engineered features to better parallelize workloads and to keep the GPU accelerators fed with data, which the company calls Dynamic Parallelism and Hyper-Q respectively.

It is interesting to see NVIDIA bring out a new flagship, especially another GK110 card. Systems using the K20 and the new K20X are available now with cards shipping this week and general availability later this month.

You can find the full press release below and a look at the GK110 GPU in our preview.

Anandtech also managed to get a look inside the Titan supercomputer at Oak Ridge National Laboratory, where you can see the Tesla K20X cards in action.

NVIDIA Launches Maximus 2.0, Combining Kepler and Tesla

Subject: General Tech, Graphics Cards | August 10, 2012 - 05:34 AM |
Tagged: tesla, quadro, nvidia, maximus, kepler, gk110

At SIGGRAPH 2012 NVIDIA announced a refresh of its Maximus workstation platform technology. Maximus is a technology aimed at professionals that work with simulations or content creation and editing. The updated platform features a Tesla K20 accelerator card as well as a Kepler-based NVIDIA Quadro K5000 graphics card. The K5000 in particular has 4GB of GDDR5 memory on a 256-bit bus and 1536 CUDA cores. NVIDIA states that the Quadro graphics card has 2.1 Teraflops of single precision compute power and draws 122 watts.

The K20, on the other hand, features a GK110 Kepler GPU with Dynamic Parallelism and Hyper-Q features and reportedly delivers more than 1 Teraflop of peak double precision performance. Unfortunately, we do not know much more than that about the new K20 Tesla card, as the exact specifications are still listed as “to be announced.” It is slated for a Q4 2012 release.

displaymedia.jpg

The Quadro K5000 workstation GPU

Beyond the hardware itself, the company’s Maximus platform has received software support from several high-profile software companies and system integrators. Some of the companies that certify and support Maximus are Adobe, Autodesk, Mathworks, and Paradigm among others. Dell, Fujitsu, HP, Lenovo, and Supermicro are OEMs that support the hardware and manufacture Maximus-powered workstations.

NVIDIA Tesla K20 GK110 GPU.jpg

The Tesla K20 accelerator card.

The second-generation Maximus technology will be available in desktop workstations as early as December 2012. Further, the NVIDIA Quadro K5000 will be available for purchase as a separate discrete card in October 2012 for $2,249 (MSRP). The Tesla K20 will (for now) only be available integrated in a workstation, but NVIDIA lists the MSRP at $3,199.

More information on the NVIDIA Maximus refresh can be found in the company’s press release.

displaymedia (1).jpg

Source: NVIDIA

NVIDIA's Tesla K10 offers serious single-precision performance

Subject: General Tech | June 19, 2012 - 03:04 PM |
Tagged: nvidia, tesla, K10, GK104, HPC

The Tesla K10, one of NVIDIA's new line of Tesla HPC cards, has actually been seen in the wild. The new Tesla series is split between the GK104-based K10, designed specifically for single-precision tasks, and the GK110-based Tesla K20, which is optimized for double-precision tasks. The K10 is capable of 4.58 teraflops of single-precision performance thanks to a pair of GK104s with 8GB of GDDR5, whereas the K20 should in theory double Intel's Xeon Phi at 2 teraflops of double-precision performance, but that has yet to be demonstrated. The K10 that was demonstrated also showed off another benefit of NVIDIA's new architecture: even with two GPUs the card remains within a 225W thermal envelope, something that is incredibly important if you are building a cluster. The Register has gathered together some of the benchmarks and slides from NVIDIA's release, which you can see here.

elreg_nvidia_isc_tesla_k10_benchmarks.jpg

"The Top 500 supercomputer ranking is based on the performance of machines running the Linpack Fortran matrix math benchmark using double-precision floating point math, but a lot of applications will do just fine with single-precision math. And it is for these workloads, graphics chip maker and supercomputing upstart Nvidia says, that it designed the new Tesla K10 server coprocessors."


 

Source: The Register

Podcast #202 - GTX 670, NVIDIA's GK110 Tesla card, our AMD Trinity Mobile review and more!

Subject: General Tech | May 17, 2012 - 03:16 PM |
Tagged: trinity, tesla, podcast, nvidia, kepler, gtx670, GTC 2012, gk110, GK104, dv nation, a10

PC Perspective Podcast #202 - 05/17/2012

Join us this week as we talk about the GTX 670, NVIDIA's GK110 Tesla card, our AMD Trinity Mobile review and more!

If you want even more PC Perspective, check out our "aftershow" event as well. "Event" might be an overstatement though...

You can subscribe to us through iTunes and you can still access it directly through the RSS page HERE.

The URL for the podcast is: http://pcper.com/podcast - Share with your friends!

  • iTunes - Subscribe to the podcast directly through the iTunes Store
  • RSS - Subscribe through your regular RSS reader
  • MP3 - Direct download link to the MP3 file

Hosts: Jeremy Hellstrom, Josh Walrath, and Allyn Malventano

Program length: 1:05:16

Program Schedule:

  1. 0:00:21 Introduction
  2. 1-888-38-PCPER or podcast@pcper.com
  3. http://pcper.com/podcast
  4. http://twitter.com/ryanshrout and http://twitter.com/pcper
  5. 0:01:15 NVIDIA GeForce GTX 670 2GB Graphics Card Review - Kepler for $399
    1. GeForce GTX 670 vs GTX 570 Performance Update
    2. The GTX 670 and the Case of the Missing (and Returning) 4-Way SLI Support
  6. 0:11:20 Graphics Card (GPU) Stock Check - May 10th, 2012
    1. Hard to make a profit when no one can find Kepler cards for sale, NVIDIA
  7. 0:14:25 NVIDIA Reveals GK110 GPU - Kepler at 7.1B Transistors, 15 SMX Units
  8. 0:20:20 Lenovo IdeaCentre Q180: Atom's Wake
  9. 0:24:30 AMD A10-4600M Trinity For Mobile Review: Trying To Cut The Ivy
  10. 0:33:40 Just Delivered: DV Nation RAMRod PC - Sandy Bridge-E, 64GB DDR3, 480GB RevoDrive 3 X2
  11. 0:35:42 Plug and Pray PCIe SSD that you can upgrade; OWC's Mercury Accelsior
  12. 0:40:40 GTC 2012: NVIDIA Announces GeForce GRID Cloud Gaming Platform
    1. NVIDIA Pioneers New Standard for High Performance Computing with Tesla GPUs
    2. NVIDIA Introduces World's First Virtualized GPU, Accelerating Graphics for Cloud Computing
  13. 0:53:00 ZOTAC announces ZOTAC GeForce GT 630, GT 620 and GT 610 series
  14. 0:55:00 Hardware / Software Pick of the Week
    1. Jeremy: Only to be used for evil
    2. Josh: Since NV doesn't have an answer yet at this price range...
    3. Allyn: If you need your files secure - without the destruction
  15. 1-888-38-PCPER or podcast@pcper.com
  16. http://pcper.com/podcast   
  17. http://twitter.com/ryanshrout and http://twitter.com/pcper
  18. Closing

NVIDIA Pioneers New Standard for High Performance Computing with Tesla GPUs

Subject: Shows and Expos | May 15, 2012 - 03:43 PM |
Tagged: tesla, nvidia, GTC 2012, kepler, CUDA

SAN JOSE, Calif.—GPU Technology Conference—May 15, 2012—NVIDIA today unveiled a new family of Tesla GPUs based on the revolutionary NVIDIA Kepler GPU computing architecture, which makes GPU-accelerated computing easier and more accessible for a broader range of high performance computing (HPC) scientific and technical applications.

GTC_horizontal_376_large.jpg

The new NVIDIA Tesla K10 and K20 GPUs are computing accelerators built to handle the most complex HPC problems in the world. Designed with an intense focus on high performance and extreme power efficiency, Kepler is three times as efficient as its predecessor, the NVIDIA Fermi architecture, which itself established a new standard for parallel computing when introduced two years ago.

“Fermi was a major step forward in computing,” said Bill Dally, chief scientist and senior vice president of research at NVIDIA. “It established GPU-accelerated computing in the top tier of high performance computing and attracted hundreds of thousands of developers to the GPU computing platform. Kepler will be equally disruptive, establishing GPUs broadly into technical computing, due to their ease of use, broad applicability and efficiency.”

servers-workstations-on.png

The Tesla K10 and K20 GPUs were introduced at the GPU Technology Conference (GTC), as part of a series of announcements from NVIDIA, all of which can be accessed in the GTC online press room.

NVIDIA developed a set of innovative architectural technologies that make the Kepler GPUs high performing and highly energy efficient, as well as more applicable to a wider set of developers and applications. Among the major innovations are:

  • SMX Streaming Multiprocessor – The basic building block of every GPU, the SMX streaming multiprocessor was redesigned from the ground up for high performance and energy efficiency. It delivers up to three times more performance per watt than the Fermi streaming multiprocessor, making it possible to build a supercomputer that delivers one petaflop of computing performance in just 10 server racks. SMX’s energy efficiency was achieved by increasing its number of CUDA architecture cores by four times, while reducing the clock speed of each core, power-gating parts of the GPU when idle and maximizing the GPU area devoted to parallel-processing cores instead of control logic.
  • Dynamic Parallelism – This capability enables GPU threads to dynamically spawn new threads, allowing the GPU to adapt dynamically to the data. It greatly simplifies parallel programming, enabling GPU acceleration of a broader set of popular algorithms, such as adaptive mesh refinement, fast multipole methods and multigrid methods.
  • Hyper-Q – This enables multiple CPU cores to simultaneously use the CUDA architecture cores on a single Kepler GPU. This dramatically increases GPU utilization, slashing CPU idle times and advancing programmability. Hyper-Q is ideal for cluster applications that use MPI.

“We designed Kepler with an eye towards three things: performance, efficiency and accessibility,” said Jonah Alben, senior vice president of GPU Engineering and principal architect of Kepler at NVIDIA. “It represents an important milestone in GPU-accelerated computing and should foster the next wave of breakthroughs in computational research.”

NVIDIA Tesla K10 and K20 GPUs
The NVIDIA Tesla K10 GPU delivers the world’s highest throughput for signal, image and seismic processing applications. Optimized for customers in oil and gas exploration and the defense industry, a single Tesla K10 accelerator board features two GK104 Kepler GPUs that deliver an aggregate performance of 4.58 teraflops of peak single-precision floating point and 320 GB per second memory bandwidth.

The NVIDIA Tesla K20 GPU is the new flagship of the Tesla GPU product family, designed for the most computationally intensive HPC environments. Expected to be the world’s highest-performance, most energy-efficient GPU, the Tesla K20 is planned to be available in the fourth quarter of 2012.

The Tesla K20 is based on the GK110 Kepler GPU. This GPU delivers three times more double precision compared to Fermi architecture-based Tesla products and it supports the Hyper-Q and dynamic parallelism capabilities. The GK110 GPU is expected to be incorporated into the new Titan supercomputer at the Oak Ridge National Laboratory in Tennessee and the Blue Waters system at the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign.

“In the two years since Fermi was launched, hybrid computing has become a widely adopted way to achieve higher performance for a number of critical HPC applications,” said Earl C. Joseph, program vice president of High-Performance Computing at IDC. “Over the next two years, we expect that GPUs will be increasingly used to provide higher performance on many applications.”

Preview of CUDA 5 Parallel Programming Platform
In addition to the Kepler architecture, NVIDIA today released a preview of the CUDA 5 parallel programming platform. Available to more than 20,000 members of NVIDIA’s GPU Computing Registered Developer program, the platform will enable developers to begin exploring ways to take advantage of the new Kepler GPUs, including dynamic parallelism.

The CUDA 5 parallel programming model is planned to be widely available in the third quarter of 2012. Developers can get access to the preview release by signing up for the GPU Computing Registered Developer program on the CUDA website.

Source: NVIDIA