GDC 2013: AMD Reveals Radeon Sky Specifications
Subject: Graphics Cards | March 31, 2013 - 03:06 AM | Tim Verry
Tagged: GDC 13, sky 900, sky 700, sky 500, RapidFire, radeon sky, GCN, cloud gaming, amd
Earlier this week, AMD announced a new series of Radeon-branded cards, called Radeon Sky, aimed at the cloud gaming market. At the time, details on the cards were scarce apart from the fact that the cards would use latency-reduction "secret sauce" tech called RapidFire, and that the highest-end model would be the Radeon Sky 900. Thankfully, gamers will not have to wait until AFDS after all, as AMD has posted additional information and specifications to its website. At this point, pricing and the underlying details of RapidFire are the only aspects still unknown.
According to the AMD site, the company will release three Radeon Sky cards later this year, called Sky 500, Sky 700, and Sky 900. All three cards are passively cooled with aluminum fin heatsinks and are based on AMD's Graphics Core Next (GCN) architecture. At the high end is the Sky 900, which is a dual Tahiti graphics card clocked at 825 MHz. The Sky 900 features 1,792 stream processors per GPU for a total of 3,584. The card further features 3GB of GDDR5 RAM per GPU on a 384-bit interface for a total memory bandwidth of 480GB/s. AMD claims this dual slot card draws up to 300W while under load. In many respects the Sky 900 is the Radeon equivalent of the company's professional FirePro S10000 graphics card. The S10000 has similar hardware specifications (including the 5.91 TFLOPS of single precision performance potential), but a higher TDP. The FirePro card is also priced at $3,599, though whether AMD will price the gaming-oriented Sky 900 similarly is unknown.
The Sky 700 steps down to a single-GPU graphics card. This card features a single Tahiti GPU clocked at 900 MHz with 1792 stream processors and 6GB of GDDR5. The graphics card memory uses a 384-bit memory interface for a total memory bandwidth of 264GB/s. Although also a dual slot card like the Sky 900, the cooler is smaller and it draws only 225W under load.
Finally, the Sky 500 represents the low end of the company's cloud gaming hardware lineup. It is the Radeon Sky equivalent to the company's consumer-grade Radeon HD 7870. The Sky 500 features a single Pitcairn GPU clocked at 950 MHz with 1280 stream processors, 4GB of GDDR5 on a 256-bit memory bus, and a rated 150W power draw under load. It further features 154GB/s of memory bandwidth and is a single slot graphics card.
| | Sky 900 | Sky 700 | Sky 500 |
| GPU(s) | Dual Tahiti | Single Tahiti | Single Pitcairn |
| GPU Clockspeed | 825 MHz | 900 MHz | 950 MHz |
| Stream Processors | 3584 (1792 per GPU) | 1792 | 1280 |
| Memory | 6GB GDDR5 (3GB per GPU) | 6GB GDDR5 | 4GB GDDR5 |
| Memory Bus | 384-bit | 384-bit | 256-bit |
| Memory Bandwidth | 480GB/s | 264GB/s | 154GB/s |
| TDP | 300W | 225W | 150W |
| Card Profile | dual-slot | dual-slot | single-slot |
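The 5.91 TFLOPS figure quoted for the Sky 900 can be sanity-checked against the clock speeds and stream processor counts in the table. Here is a minimal sketch, assuming each GCN stream processor retires one fused multiply-add (counted as two floating point operations) per clock. That factor of two is an assumption rather than something AMD's spec sheet states, and the function name is purely illustrative.

```python
# Peak single-precision throughput estimated from the spec table.
# Assumption: each stream processor performs one fused multiply-add
# (counted as 2 floating point operations) per clock cycle.
FLOPS_PER_SP_PER_CLOCK = 2

# (total stream processors, GPU clock in MHz), taken from the table
SKY_CARDS = {
    "Sky 900": (3584, 825),
    "Sky 700": (1792, 900),
    "Sky 500": (1280, 950),
}

def peak_tflops(stream_processors, clock_mhz):
    """Return theoretical peak single-precision TFLOPS."""
    flops = stream_processors * FLOPS_PER_SP_PER_CLOCK * clock_mhz * 1e6
    return flops / 1e12

for name, (sps, mhz) in SKY_CARDS.items():
    print(f"{name}: {peak_tflops(sps, mhz):.2f} TFLOPS")
```

Under that assumption the Sky 900 works out to roughly 5.91 TFLOPS, in line with the quoted figure. The Sky 700 and Sky 500 results are derived estimates only, since AMD's numbers for those cards are not given here.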
Additionally, the Radeon Sky cards all employ a technology called RapidFire that allegedly reduces latency immensely. As Ryan mentioned on the latest PC Perspective Podcast, the Radeon Sky cards are able to stream up to six games. RapidFire is still a mystery, but the company has indicated that one aspect of RapidFire is the use of AMD's Video Codec Engine (VCE) to encode the video stream on the GPU itself to reduce game latency. The Sky cards will output at 720p resolutions, and the Sky 700 can support either three games at 60 FPS or six games at 30 FPS.
In addition to working with cloud gaming companies Ubitus, G-Cluster, CiiNow, and Otoy, AMD has announced a partnership with VMware and Citrix. AMD is reportedly working to allow VMware ESX/ESXi and Citrix XenServer virtual machines to access the GPU hardware directly, which opens up the possibility of using Sky cards to run workstation applications or remote desktops with 3D support, much like NVIDIA's VCA and GRID technology (which the company showed off at GTC last week). Personally, I think the Sky cards may be late to the party but are a step in the right direction. Even if cloud gaming doesn't take off, the cards could still be used to great success by enterprise customers if they are able to allow direct access to the full graphics card hardware from within virtual machines!
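The two Sky 700 configurations quoted above come to the same total number of frames per second. Here is a quick check, assuming (AMD has not confirmed this) that the limit is a fixed per-card budget of rendered and encoded 720p frames. The `max_games` helper is hypothetical and only illustrates how such a budget would divide up.

```python
# Sky 700 streaming configurations quoted in the article: (games, FPS per game)
SKY_700_CONFIGS = [(3, 60), (6, 30)]

def total_frames_per_second(games, fps_per_game):
    """Total frames the card must render and encode each second."""
    return games * fps_per_game

def max_games(frame_budget, fps_per_game):
    """Hypothetical helper: whole games that fit in a fixed frame budget."""
    return frame_budget // fps_per_game

budgets = [total_frames_per_second(g, f) for g, f in SKY_700_CONFIGS]
print(budgets)  # both configurations come to 180 frames per second
```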
More information on the Radeon Sky cards can be found on the AMD website.
Summary Thus Far
Because of the complexity and sheer amount of data we have gathered using our Frame Rating performance methodology, we are breaking it up into several articles that each feature different GPU comparisons. Here is the schedule:
- 3/27: Frame Rating Dissected: Full Details on Capture-based Graphics Performance Testing
- 3/27: Radeon HD 7970 GHz Edition vs GeForce GTX 680 (Single and Dual GPU)
- 3/30: AMD Radeon HD 7990 vs GeForce GTX 690 vs GeForce GTX Titan
- 4/2: Radeon HD 7950 vs GeForce GTX 660 Ti (Single and Dual GPU)
- 4/5: Radeon HD 7870 GHz Edition vs GeForce GTX 660 (Single and Dual GPU)
Welcome to the second in our initial series of articles focusing on Frame Rating, our new graphics and GPU performance technology that drastically changes how the community looks at single and multi-GPU performance. In this article we are going to be focusing on a different set of graphics cards, the highest performing single card options on the market, including the GeForce GTX 690 4GB dual-GK104 card, the GeForce GTX Titan 6GB GK110-based monster, and the Radeon HD 7990, though in an emulated form. The HD 7990 was only recently officially announced by AMD at this year's Game Developers Conference, but the specifications of that hardware are going to closely match what we have here on the testbed today: a pair of retail Radeon HD 7970s in CrossFire.
Will the GTX Titan look as good in Frame Rating as it did upon its release?
If you are just joining this article series today, you have missed a lot! If nothing else, you should read our initial full release article that details everything about the Frame Rating methodology and why we are making this change to begin with. In short, we are moving away from using FRAPS for average frame rates or even frame times and instead are using a secondary hardware capture system to record all the frames of our game play as they would be displayed to the gamer, then doing post-process analysis on that recorded file to measure real world performance.
Because FRAPS measures frame times at a different point in the game pipeline (closer to the game engine), its results can vary dramatically from what is presented to the end user on their display. Frame Rating solves that problem by recording video through a dual-link DVI capture card that emulates a monitor to the testing system. By simply applying a unique overlay color to each frame the game produces, we can gather a new kind of information that tells a unique story.
The capture card that makes all of this work possible.
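To make the overlay idea concrete, here is a rough sketch of how a captured recording could be turned into per-frame display times. It is not the actual Frame Rating tooling. It assumes a 60 Hz capture, assumes the overlay color of every scanline has already been extracted, and assumes consecutive game frames never share a color. All names are illustrative.

```python
from itertools import groupby

CAPTURE_HZ = 60  # assumed refresh rate of the captured video

def frame_times_ms(captured_frames):
    """Estimate the on-screen time of each game frame, in milliseconds.

    captured_frames: list of captured video frames, each a list holding the
    overlay color seen on every scanline (top to bottom).
    """
    lines_per_capture = len(captured_frames[0])
    ms_per_line = 1000.0 / CAPTURE_HZ / lines_per_capture
    # Flatten the capture into one long stream of scanline colors.
    scanlines = [color for frame in captured_frames for color in frame]
    # Each run of identical colors is one game frame; its length in
    # scanlines is proportional to how long it stayed on screen.
    return [(color, len(list(run)) * ms_per_line)
            for color, run in groupby(scanlines)]

# Toy capture: two video frames of four scanlines each.
sample = [
    ["red", "red", "red", "blue"],
    ["blue", "blue", "blue", "green"],
]
for color, ms in frame_times_ms(sample):
    print(f"{color}: {ms:.1f} ms on screen")
```

A game frame that occupies only a sliver of the display shows up as a very short run. That is the kind of detail an average frame rate hides.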
I don't want to spend too much time on this part of the story here as I already wrote a solid 16,000 words on the topic in our first article and I think you'll really find the results fascinating. So, please check out my first article on the topic if you have any questions before diving into these results today!
| Test System Setup | |
| CPU | Intel Core i7-3960X Sandy Bridge-E |
| Motherboard | ASUS P9X79 Deluxe |
| Memory | Corsair Dominator DDR3-1600 16GB |
| Hard Drive | OCZ Agility 4 256GB SSD |
| Sound Card | On-board |
| Graphics Card | NVIDIA GeForce GTX TITAN 6GB; NVIDIA GeForce GTX 690 4GB; AMD Radeon HD 7970 CrossFire 3GB |
| Graphics Drivers | AMD: 13.2 beta 7; NVIDIA: 314.07 beta (GTX 690); NVIDIA: 314.09 beta (GTX TITAN) |
| Power Supply | Corsair AX1200i |
| Operating System | Windows 8 Pro x64 |
On to the results!
Continue reading our review of the GTX Titan, GTX 690 and HD 7990 using Frame Rating!!
GDC 2013: AMD Announces Sky Graphics Cards to Accelerate Cloud Gaming
Subject: General Tech, Graphics Cards | March 27, 2013 - 08:16 PM | Tim Verry
Tagged: sky graphics, sky 900, RapidFire, radeon sky, pc gaming, GDC, cloud gaming, ciinow, amd
AMD is making a new push into cloud gaming with a new series of Radeon graphics cards called Sky. The new cards feature a (mysterious) technology called "RapidFire" that allegedly provides "highly efficient and responsive game streaming" from servers to your various computing devices (tablets, PCs, Smart TVs) over the Internet. At this year's Game Developers Conference (GDC), the company announced that it is working with a number of existing cloud gaming companies to provide hardware and drivers to reduce latency.
AMD is working with Otoy, G-Cluster, Ubitus, and CiiNow. CiiNow in particular was heavily discussed by AMD, and can reportedly provide lower latency than cloud gaming competitor Gaikai. AMD Sky is, in many ways, similar in scope to NVIDIA's GRID technology which was announced last year and shown off at GTC last week. Obviously, that has given NVIDIA a head start, but it is difficult to say how AMD's technology will stack up as the company is not yet providing any specifics. Joystiq was able to obtain information on the high-end Radeon Sky graphics card, however (that's something at least...). The Sky 900 reportedly features 3,584 stream processors, 6GB of GDDR5 RAM, and 480 GB/s of bandwidth. Further, AMD has indicated that the new Radeon Sky cards will be based on the company's Graphics Core Next architecture.
| | Sky 900 | Radeon 7970 |
| Stream Processors | 3,584 | 2,048 |
| Memory | 6GB | 3GB |
| Memory Bandwidth | 480GB/s | 264GB/s |
I think it is safe to assume that the Sky cards will be sold to other cloud gaming companies. They will not be consumer cards, and AMD is not going to get into the cloud gaming business itself. Beyond that, AMD's Sky cloud gaming initiative is still a mystery. Hopefully more details will filter out between now and the AMD Fusion Developer Summit this summer.
How Games Work
Because of the complexity and sheer amount of data we have gathered using our Frame Rating performance methodology, we are breaking it up into several articles that each feature different GPU comparisons. Here is the schedule:
- 3/27: Frame Rating Dissected: Full Details on Capture-based Graphics Performance Testing
- 3/27: Radeon HD 7970 GHz Edition vs GeForce GTX 680 (Single and Dual GPU)
- 3/30: AMD Radeon HD 7990 vs GeForce GTX 690 vs GeForce GTX Titan
- 4/2: Radeon HD 7950 vs GeForce GTX 660 Ti (Single and Dual GPU)
- 4/5: Radeon HD 7870 GHz Edition vs GeForce GTX 660 (Single and Dual GPU)
- 4/16: Frame Rating: Visual Effects of Vsync on Gaming Animation
Introduction
The process of testing games and graphics has been evolving even longer than I have been a part of the industry: 14+ years at this point. That transformation in benchmarking has been accelerating for the last 12 months. Typical benchmarks test some hardware against some software and look at the average frame rate that can be achieved. While access to frame times has been around for nearly the full life of FRAPS, it took an article from Scott Wasson at the Tech Report to really get the ball moving and investigate how each frame contributes to the actual user experience. I immediately began research into testing actual performance perceived by the user, including the "microstutter" reported by many in PC gaming, and pondered how we might be able to test for these criteria even more accurately.
The result of that research is being fully unveiled today in what we are calling Frame Rating – a completely new way of measuring and validating gaming performance.
The release of this story for me is like the final stop on a journey that has lasted nearly a complete calendar year. I began to release bits and pieces of this methodology starting on January 3rd with a video and short article that described our capture hardware and the benefits that directly capturing the output from a graphics card would bring to GPU evaluation. After returning from CES later in January, I posted another short video and article that showcased some of the captured video and stepped through a recorded file frame by frame to show readers how capture could help us detect and measure stutter and frame time variance.
Finally, during the launch of the NVIDIA GeForce GTX Titan graphics card, I released the first results from our Frame Rating system and discussed how certain card combinations, in this case CrossFire against SLI, could drastically differ in perceived frame rates and performance while giving very similar average frame rates. This article got a lot more attention than the previous entries and that was expected – this method doesn’t attempt to dismiss other testing options but it is going to be pretty disruptive. I think the remainder of this article will prove that.
Today we are finally giving you all the details on Frame Rating; how we do it, what we learned and how you should interpret the results that we are providing. I warn you up front though that this is not an easy discussion and while I am doing my best to explain things completely, there are going to be more questions going forward and I want to see them all! There is still much to do regarding graphics performance testing, even after Frame Rating becomes more common. We feel that the continued dialogue with readers, game developers and hardware designers is necessary to get it right.
Below is our full video that features the Frame Rating process, some example results and some discussion on what it all means going forward. I encourage everyone to watch it, but you will definitely need the written portion here to fully understand this transition in testing methods. Subscribe to our YouTube channel if you haven't already!
Continue reading our analysis of the new Frame Rating performance testing methodology!!
NVIDIA Boosts the Sub-$200 market with the GTX 650 Ti Boost
Subject: Graphics Cards | March 26, 2013 - 07:41 PM | Jeremy Hellstrom
Tagged: nvidia, hd 7790, gtx 650 ti boost, gtx 650 Ti, gpu boost, gk106
Why Boost, you may ask? If you guessed that NVIDIA added its GPU Boost feature to the card, you should win a prize, as that is exactly what makes the GTX 650 Ti Boost special. With a core GPU speed of 980MHz, boosting to 1033MHz and beyond, this card (at least the 2GB model) is actually aimed to compete with AMD's HD 7850, not the newly released HD 7790. Along with the boost in clock comes a wider memory pipeline and a corresponding increase in ROPs. The 2GB model should be about $170, right on the cusp between value and mid-range, but is it worth the price of admission? Get a look at the performance at [H]ard|OCP.
"NVIDIA is launching the GeForce GTX 650 Ti Boost today. This video card is priced in the $149-$169 price range, and should give the $150 price segment another shakedown. Does it compare to the Radeon HD 7790, or is it on the level of the more expensive Radeon HD 7850? We will find out in today's latest games, you may be surprised."
Here are some more Graphics Card articles from around the web:
- Nvidia's GeForce GTX 650 Ti Boost @ The Tech Report
- Nvidia GTX 650 Ti Boost 2GB @ LanOC Reviews
- NVIDIA and EVGA GeForce GTX 650 Ti BOOST Video Card Review @ Legit Reviews
- NVIDIA GeForce GTX 650Ti Boost Review @ OCC
- Nvidia GeForce GTX 650 Ti Boost @ Hardware.info
- Nvidia GeForce GTX 650 Ti Boost @ Bjorn3D
- NVIDIA Geforce GTX 650Ti Boost 2GB Edition Review @Hi Tech Legion
- EVGA GTX 650Ti BOOST 2GB Superclocked Review @Hi Tech Legion
- NVIDIA GeForce GTX 650 Ti Boost 2GB @ Tweaktown
- NVIDIA GeForce GTX 650 Ti Boost 2 GB @ techPowerUp
- NVIDIA GeForce GTX 650 Ti BOOST @ Benchmark Reviews
- NVIDIA GTX 650 Ti Boost 2GB Review @ Hardware Canucks
- NVIDIA Chips Comparison Table @ Hardware Secrets
- AMD ATI Chips Comparison Table @ Hardware Secrets
- Workstation Graphics Card Comparison Guide @ TechARP
- PowerColor Radeon HD 7790 Turbo Duo Review @ OCC
- PowerColor HD 7790 Turbo Duo 1 GB @ techPowerUp
- Sapphire HD7950 MAC Edition @ Kitguru
The GTX 650 Ti Gets Boost and More Memory
In mid-October NVIDIA released the GeForce GTX 650 Ti based on GK106, the same GPU that powers the GTX 660 though with fewer enabled CUDA cores and GPC units. At the time we were pretty impressed with the 650 Ti:
The GTX 650 Ti has more in common with the GTX 660 than it does the GTX 650, both being based on the GK106 GPU, but is missing some of the unique features that NVIDIA has touted of the 600-series cards like GPU Boost and SLI.
Today's release of the GeForce GTX 650 Ti BOOST actually addresses both of those missing features by moving even closer to the specification sheet found on the GTX 660 cards.
Our video review of the GTX 650 Ti BOOST and Radeon HD 7790.
Option 1: Two GPCs with Four SMXs
Just like we saw with the original GTX 650 Ti, there are two different configurations of the GTX 650 Ti BOOST; both have the same primary specifications but will differ in which SMX is disabled from the full GK106 ASIC. The newer version will still have 768 CUDA cores but clock speeds will increase from 925 MHz to 980 MHz base and 1033 MHz typical boost clock. Texture unit count remains the same at 64.
Continue reading our review of the NVIDIA GeForce GTX 650 Ti BOOST graphics card!!
Gaming for $150 with the Radeon HD 7790
Subject: Graphics Cards | March 22, 2013 - 01:56 PM | Jeremy Hellstrom
Tagged: hd 7790, graphics core next, GCN, Sea Islands, bonaire, amd
AMD is trying to fill a gap in its product line between the sub-$200 HD 7850 and the ~$120 HD 7770 with a $150 card, the HD 7790. The naming scheme implies two GPUs, but this is not the case; it is a single Bonaire GCN chip with 896 stream processors, 56 texture units and an impressive peak compute rate of up to 1.79 TFLOPS thanks to some optimization of the GCN architecture. It has 1GB of GDDR5 at 6GHz effective and a GPU speed dependent on the model; in [H]ard|OCP's case the ASUS Radeon HD 7790 DirectCU II OC runs at 1.075GHz. [H] gave it a Silver Award for being a vast improvement over the 7770 and good competition for the GTX 650 Ti, but feels the card does need to be faster.
This card also makes an appearance on our front page, with a lot of Frame Rating charts so you can see not only the raw FPS data you are used to, but also an in-depth look at how the game is going to 'feel' while you play.
"AMD is launching the Radeon HD 7790 today. This new video card should give the sub-$200 video card segment a kick in the pants. Will it provide enough performance for today's latest games at $149? We will find out, testing the new ASUS Radeon HD 7790 DirectCU II OC with no less than six of today's hottest games."
Here are some more Graphics Card articles from around the web:
- AMD's Radeon HD 7790 @ The Tech Report
- AMD Radeon HD 7790 review (incl. frametimes) @ Hardware.info
- AMD Radeon HD 7790 @ TechSpot
- AMD Radeon HD 7790 Review @ Hardware Canucks
- Sapphire Radeon HD 7790 Dual-X 1GB OC @ eTeknix
- Sapphire Radeon HD 7790 1GB Dual-X OC @ Tweaktown
- Sapphire HD 7790 1GB Graphics Card @ Bjorn3D
- Sapphire Radeon HD 7790 Dual-X OC Review @ OCC
- Sapphire HD 7790 Dual-X OC Video Card Review @ Hi Tech Legion
- AMD Radeon HD 7790 CrossFire @ techPowerUp
- ASUS HD 7790 DirectCU II OC @ Overclockers.com
- Sapphire HD 7790 Dual-X 1 GB @ techPowerUp
- AMD Radeon HD 7790 Video Card Review w/ Gigabyte & Sapphire @ Legit Reviews
- ASUS HD 7790 Direct CU II OC 1 GB @ techPowerUp
- Sapphire HD7790 OC @ Kitguru
- PowerColor PCS+ HD 7850 Radeon Graphic Card Review @ Pro-Clockers
- HIS Radeon HD 7850 iPower IceQ Turbo 4GB Video Card in CrossFire @ Tweaktown
- HIS Radeon HD 7770 iCooler 1GB Overclocked @ Tweaktown
- Mid-Range AMD Graphics Card Round-Up (HIS 7770 GHz / HIS 7850 / Sapphire 7850) @ Kitguru
- PowerColor PCS HD7870 MYST Video Card Review @ Legit Reviews
A New GPU with the Same DNA
When we talked with AMD recently about its leaked roadmap that insinuated that we would not see any new GPUs in 2013, they were adamant that other options would be made available to gamers but were coy about saying when and to what degree. As it turns out, today marks the release of the Radeon HD 7790, a completely new piece of silicon under the Sea Islands designation that uses the same GCN (Graphics Core Next) architecture as the HD 7000-series / Southern Islands GPUs, with a handful of tweaks and advantages ranging from improved clock boosting with PowerTune to faster default memory clocks.
To be clear, the Radeon HD 7790 is a completely new ASIC, not a rebranding of a currently available part, though the differences between the options are mostly in power routing and a reorganization of the GCN design found in Cape Verde and Pitcairn designs. The code name for this particular GPU is Bonaire and it is one of several upcoming updates to the HD 7000 cards.
Bonaire is built on the same 28nm TSMC process technology that all Southern Islands parts are built on and consists of 2.08 billion transistors in a 160 mm2 die. Compared to the HD 7800 (Pitcairn) GPU at 212 mm2 and HD 7700 (Cape Verde) at 120 mm2, the chip for the HD 7790 falls right in between. And while the die images above are likely not completely accurate, it definitely appears that AMD's engineers have reorganized the internals.
Bonaire is built with 14 CUs (compute units) for a total stream processor count of 896, which places it closer to the performance level of the HD 7850 (1024 SPs) than to that of the HD 7770 (640 SPs). The new Sea Islands GPU includes the same dual tessellation engines as the higher end HD 7000s, as well as a solid 128-bit memory bus that runs at 6.0 Gbps out of the gate on the 1GB frame buffer. The memory controller is completely reworked in Bonaire and allows for a total memory bandwidth of 96 GB/s in comparison to the 72 GB/s of the HD 7770, while peak theoretical compute performance comes in at 1.79 TFLOPS.
The GPU clock rate is set at 1.0 GHz, but there is more on that later.
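The headline Bonaire numbers follow from simple arithmetic, which the sketch below reproduces. Two values are assumptions not spelled out in the text: 64 stream processors per compute unit (implied by 896 SPs across 14 CUs) and two floating point operations per stream processor per clock. The 128-bit bus width used for the HD 7770 comparison is likewise an assumption.

```python
# Bonaire (Radeon HD 7790) figures quoted in the article.
COMPUTE_UNITS = 14
BUS_WIDTH_BITS = 128
DATA_RATE_GBPS = 6.0      # per-pin memory data rate
GPU_CLOCK_GHZ = 1.0

# Assumptions (not stated in the article): 64 stream processors per GCN
# compute unit, and 2 floating point operations per processor per clock.
SPS_PER_CU = 64
FLOPS_PER_SP_PER_CLOCK = 2

def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Bandwidth in GB/s: bytes moved per transfer times the per-pin rate."""
    return bus_width_bits / 8 * data_rate_gbps

def implied_data_rate_gbps(bandwidth_gbs, bus_width_bits):
    """Per-pin data rate in Gbps implied by a quoted bandwidth."""
    return bandwidth_gbs * 8 / bus_width_bits

stream_processors = COMPUTE_UNITS * SPS_PER_CU
bandwidth = memory_bandwidth_gbs(BUS_WIDTH_BITS, DATA_RATE_GBPS)
peak_tflops = stream_processors * FLOPS_PER_SP_PER_CLOCK * GPU_CLOCK_GHZ / 1000

print(stream_processors, "stream processors")
print(bandwidth, "GB/s memory bandwidth")
print(peak_tflops, "TFLOPS peak")
# HD 7770 comparison, assuming it also uses a 128-bit bus:
print(implied_data_rate_gbps(72, 128), "Gbps implied for the HD 7770")
```

Read this way, the jump from 72 GB/s to 96 GB/s comes entirely from the faster memory data rate, not from a wider bus.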
Continue reading our review of the Sapphire AMD Radeon HD 7790 1GB Bonaire GPU!!
GTC 2013: Cortexica Vision Systems Talks About the Future of Image Recognition During the Emerging Companies Summit
Subject: General Tech, Graphics Cards | March 20, 2013 - 09:44 PM | Tim Verry
Tagged: video fingerprinting, image recognition, GTC 2013, gpgpu, cortexica, cloud computing
The Emerging Companies Summit is a series of sessions at NVIDIA's GPU Technology Conference (GTC) that gives the floor to CEOs from several up-and-coming technology startups. Earlier today, the CEO of Cortexica Vision Systems took the stage to talk briefly about the company's products and future direction, and to answer questions from a panel of industry experts.
If you tuned into NVIDIA's keynote presentation yesterday, you may have noticed the company showing off a new image recognition technology. That technology is being developed by a company called Cortexica Vision Systems. While it cannot perform facial recognition, it is capable of identifying everything else, according to the company's CEO Ian McCready. Currently, Cortexica is employing a cluster of approximately 70 NVIDIA graphics cards, but it is capable of scaling beyond that. McCready estimates that about 100 GPUs and a CPU would be required by a company like eBay, should they want to implement Cortexica's image recognition technology in-house.
The Cortexica technology uses images captured by a camera (such as the one in your smartphone), which are then sent to Cortexica's servers for processing. The GPUs in the Cortexica cluster handle the fingerprint creation task while the CPU does the actual lookup in the database of known fingerprints to either find an exact match or return similar image results. According to Cortexica, the fingerprint creation takes only 100ms, though as more powerful GPUs make it into mobile devices, it may be possible to do the fingerprint creation on the device itself, reducing the time between taking a photo and getting relevant results back.
The image recognition technology is currently being used by eBay Motors in the US, UK, and Germany. Cortexica hopes to find a home with many of the fashion companies that would use the technology to allow people to identify and ultimately purchase clothing they take photos of on television or in public. The technology can also perform 360-degree object recognition, identify logos that are as small as 0.4% of the screen, and identify videos. In the future Cortexica hopes to reduce latency, improve recognition accuracy, and add more search categories. Cortexica is also working on enabling an "always on" mobile device that will constantly be identifying everything around it, which is both cool and a bit creepy. With mobile chips like Logan and Parker coming in the future, Cortexica hopes to be able to do on-device image recognition, which would greatly reduce latency and allow the use of the recognition technology while not connected to the internet.
The number of photos taken is growing rapidly; as many as 10% of all photos stored "in the cloud" were taken last year alone. Even Facebook, with its massive data centers, is moving to a cold-storage approach to save money on the electricity costs of storing and serving up those photos. And while some of these photos have relevant metadata, the majority do not. Cortexica claims that its technology can be used to get around that issue by identifying photos as well as finding similar photos using its algorithms.
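The split described above (create a fingerprint, then look it up for an exact or similar match) can be illustrated with a toy example. Cortexica has not published its algorithm, so everything below is a stand-in. The "average hash" fingerprint, the Hamming-distance comparison, the `max_distance` threshold and the sample data are all hypothetical, and the whole thing runs on the CPU rather than splitting the work across GPUs and a CPU.

```python
# Illustrative only: Cortexica has not published its algorithm. The
# fingerprint here is a toy "average hash" over a grayscale pixel list.

def fingerprint(pixels):
    """Toy fingerprint: one bit per pixel, set when brighter than the mean."""
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)

def hamming(a, b):
    """Number of differing bits between two equal-length fingerprints."""
    return sum(x != y for x, y in zip(a, b))

def lookup(query, database, max_distance=2):
    """Return ('exact', names) on a perfect match, else ('similar', names)
    sorted by distance and limited to max_distance differing bits."""
    exact = [name for name, fp in database.items() if fp == query]
    if exact:
        return "exact", exact
    near = sorted((hamming(query, fp), name) for name, fp in database.items())
    return "similar", [name for dist, name in near if dist <= max_distance]

database = {
    "red sneaker": fingerprint([200, 40, 30, 220, 35, 210, 25, 45]),
    "blue jacket": fingerprint([20, 180, 190, 30, 200, 25, 210, 185]),
}
photo = [190, 50, 35, 215, 40, 205, 30, 140]  # noisy shot of the sneaker
print(lookup(fingerprint(photo), database))
```

In a real deployment the fingerprint step would be the GPU-heavy part, and the lookup would run against a far larger index than a Python dictionary.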
Stay tuned to PC Perspective for more GTC coverage!
GTC 2013: Pedraforca Is A Power Efficient ARM + GPU Cluster For Homogeneous (GPU) Workloads
Subject: General Tech, Graphics Cards | March 20, 2013 - 01:47 PM | Tim Verry
Tagged: tesla, tegra 3, supercomputer, pedraforca, nvidia, GTC 2013, GTC, graphics cards, data centers
There is a lot of talk about heterogeneous computing at GTC, in the sense of adding graphics cards to servers. If you have HPC workloads that can benefit from GPU parallelism, adding GPUs gives you computing performance in less physical space, and using less power, than a CPU only cluster (for equivalent TFLOPS).
However, there was a session at GTC that actually took things to the opposite extreme. Instead of a CPU only cluster or a mixed cluster, Alex Ramirez (leader of Heterogeneous Architectures Group at Barcelona Supercomputing Center) is proposing a homogeneous GPU cluster called Pedraforca.
Pedraforca V2 combines NVIDIA Tesla GPUs with low power ARM processors. Each node is comprised of the following components:
- 1 x Mini-ITX carrier board
- 1 x Q7 module (which hosts the ARM SoC and memory)
  - Current config is one Tegra 3 @ 1.3GHz and 2GB DDR2
- 1 x NVIDIA Tesla K20 accelerator card (1170 GFLOPS)
- 1 x InfiniBand 40Gb/s card (via Mellanox ConnectX-3 slot)
- 1 x 2.5" SSD (SATA 3 MLC, 250GB)
The ARM processor is used solely for booting the system and facilitating GPU communication between nodes. It is not intended to be used for computing. According to Dr. Ramirez, in situations where running code on a CPU would be faster, it would be best to have a small number of Intel Xeon powered nodes to do the CPU-favorable computing, and then offload the parallel workloads to the GPU cluster over the InfiniBand connection (though this is less than ideal, Pedraforca would be most-efficient with data-sets that can be processed solely on the Tesla cards).
While Pedraforca is not necessarily locked to NVIDIA's Tegra hardware, it is currently the only SoC that meets their needs. The system requires the ARM chip to have PCI-E support. The Tegra 3 SoC has four PCI-E lanes, so the carrier board is using two PLX chips to allow the Tesla and InfiniBand cards to both be connected.
The researcher stated that he is also looking forward to using NVIDIA's upcoming Logan processor in the Pedraforca cluster. It will reportedly be possible to upgrade existing Pedraforca clusters with the new chips by replacing the existing (Tegra 3) Q7 module with one that has the Logan SoC when it is released.
Pedraforca V2 has an initial cluster size of 64 nodes. While the speaker was reluctant to provide TFLOPS performance numbers, as it would depend on the workload, with 64 Tesla K20 cards it should provide respectable performance. The intent of the cluster is to save power costs by using a low power CPU. If your server kernel and applications can run on GPUs alone, there are noticeable power savings to be had by switching from a ~100W Intel Xeon chip to a lower-power (approximately 2-3W) Tegra 3 processor. If you have a kernel that needs to run on a CPU, it is recommended to run the OS on an Intel server and transfer just the GPU work to the Pedraforca cluster. Each Pedraforca node is reportedly under 300W, with the Tesla card accounting for the majority of that figure. Despite the limitations and the niche nature of the workloads and software necessary to get the full power-saving benefits, Pedraforca is certainly an interesting take on a homogeneous server cluster!
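For a rough sense of scale, the per-node figures quoted in this article can be multiplied out across the 64-node cluster. This is back-of-the-envelope arithmetic only. It gives a theoretical peak and an upper bound on power, not the workload-dependent performance the speaker declined to quote, and the variable names are illustrative.

```python
NODES = 64
K20_GFLOPS = 1170          # per-card figure quoted in the node list
NODE_POWER_W = 300         # upper bound per node quoted by the speaker
XEON_POWER_W = 100         # approximate figure from the article
TEGRA3_POWER_W = (2, 3)    # approximate range from the article

# Theoretical peak if every Tesla card ran flat out at its rated figure.
peak_tflops = NODES * K20_GFLOPS / 1000
# Upper bound on cluster power, since each node is "under 300W".
max_cluster_kw = NODES * NODE_POWER_W / 1000
# CPU power saved across the cluster, for both ends of the Tegra 3 range.
saved_w = [NODES * (XEON_POWER_W - t) for t in TEGRA3_POWER_W]

print(f"Peak: {peak_tflops} TFLOPS, power ceiling: {max_cluster_kw} kW")
print(f"CPU power saved vs. Xeon hosts: {min(saved_w)}-{max(saved_w)} W")
```

The CPU swap saves on the order of 6 kW across the cluster, which is significant next to a power ceiling of about 19 kW.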
In another session relating to the path to exascale computing, power use in data centers was listed as one of the biggest hurdles to getting to Exaflop-levels of performance, and while Pedraforca is not the answer to Exascale, it should at least be a useful learning experience at wringing the most parallelism out of code and pushing GPGPU to the limits. And that research will help other clusters use the GPUs more efficiently as researchers explore the future of computing.
The Pedraforca project built upon research conducted on Tibidabo, a multi-core ARM CPU cluster, and CARMA (CUDA on ARM development kit) which is a Tegra SoC paired with an NVIDIA Quadro card. The two slides below show CARMA benchmarks and a Tibidabo cluster (click on image for larger version).
Stay tuned to PC Perspective for more GTC 2013 coverage!












