Subject: General Tech, Graphics Cards | March 20, 2013 - 05:47 PM | Tim Verry
Tagged: tesla, tegra 3, supercomputer, pedraforca, nvidia, GTC 2013, GTC, graphics cards, data centers
There is a lot of talk about heterogeneous computing at GTC, in the sense of adding graphics cards to servers. If you have HPC workloads that can benefit from GPU parallelism, adding GPUs gives you more computing performance in less physical space, and with less power, than a CPU-only cluster (for equivalent TFLOPS).
However, one session at GTC took things to the opposite extreme. Instead of a CPU-only cluster or a mixed cluster, Alex Ramirez (leader of the Heterogeneous Architectures Group at Barcelona Supercomputing Center) is proposing a homogeneous GPU cluster called Pedraforca.
Pedraforca V2 combines NVIDIA Tesla GPUs with low power ARM processors. Each node consists of the following components:
- 1 x Mini-ITX carrier board
- 1 x Q7 module (which hosts the ARM SoC and memory)
  - Current config is one Tegra 3 @ 1.3GHz and 2GB DDR2
- 1 x NVIDIA Tesla K20 accelerator card (1170 GFLOPS)
- 1 x InfiniBand 40Gb/s card (via Mellanox ConnectX-3 slot)
- 1 x 2.5" SSD (SATA 3 MLC, 250GB)
The ARM processor is used solely for booting the system and facilitating GPU communication between nodes; it is not intended to be used for computing. According to Dr. Ramirez, where code would run faster on a CPU, it would be best to have a small number of Intel Xeon powered nodes handle the CPU-favorable computing and offload the parallel workloads to the GPU cluster over the InfiniBand connection. That arrangement is less than ideal, though: Pedraforca would be most efficient with data sets that can be processed solely on the Tesla cards.
While Pedraforca is not necessarily locked to NVIDIA's Tegra hardware, Tegra 3 is currently the only SoC that meets the team's needs, because the system requires the ARM chip to have PCI-E support. The Tegra 3 SoC has four PCI-E lanes, so the carrier board uses two PLX chips to allow both the Tesla and InfiniBand cards to be connected.
The researcher stated that he is also looking forward to using NVIDIA's upcoming Logan processor in the Pedraforca cluster. It will reportedly be possible to upgrade existing Pedraforca clusters with the new chips by replacing the existing (Tegra 3) Q7 module with one that has the Logan SoC when it is released.
Pedraforca V2 has an initial cluster size of 64 nodes. While the speaker was reluctant to provide TFLOPS performance numbers, as they would depend on the workload, 64 Tesla K20 cards should provide respectable performance. The intent of the cluster is to save power costs by using a low power CPU. If your server kernel and applications can run on GPUs alone, there are noticeable power savings to be had by switching from a ~100W Intel Xeon chip to a lower-power (approximately 2-3W) Tegra 3 processor. If you have a kernel that needs to run on a CPU, it is recommended to run the OS on an Intel server and transfer just the GPU work to the Pedraforca cluster.
Each Pedraforca node reportedly draws under 300W, with the Tesla card accounting for the majority of that figure. Despite the limitations, and the niche nature of the workloads and software necessary to get the full power-saving benefits, Pedraforca is certainly an interesting take on a homogeneous server cluster!
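To put rough numbers on that, here is a back-of-envelope sketch using only the figures quoted above. It is not a benchmark: the speaker declined to give TFLOPS numbers, so the throughput is a theoretical peak from the per-card spec, the 300W node figure is treated as a ceiling, and the function and constant names are made up for illustration.

```python
# Back-of-envelope estimates from the figures quoted in the article.
# These are theoretical upper bounds, not measured results.

NODES = 64                # initial Pedraforca V2 cluster size
K20_PEAK_GFLOPS = 1170    # per-card peak from the node component list
NODE_POWER_W = 300        # "under 300W" per node, so treated as a ceiling
XEON_POWER_W = 100        # approximate Xeon figure quoted in the article
TEGRA3_POWER_W = (2, 3)   # approximate Tegra 3 range quoted in the article


def cluster_peak_tflops(nodes=NODES, gflops_per_card=K20_PEAK_GFLOPS):
    """Theoretical peak of the whole cluster, one Tesla card per node."""
    return nodes * gflops_per_card / 1000.0


def cluster_power_ceiling_kw(nodes=NODES, node_w=NODE_POWER_W):
    """Upper bound on cluster power draw, using the per-node ceiling."""
    return nodes * node_w / 1000.0


def cpu_power_saving_w(nodes=NODES, xeon_w=XEON_POWER_W, arm_w=TEGRA3_POWER_W):
    """(low, high) watts saved cluster-wide by swapping Xeon hosts for Tegra 3."""
    arm_low, arm_high = arm_w
    return nodes * (xeon_w - arm_high), nodes * (xeon_w - arm_low)


if __name__ == "__main__":
    print(f"Peak throughput: {cluster_peak_tflops():.2f} TFLOPS")
    print(f"Power ceiling:   {cluster_power_ceiling_kw():.1f} kW")
    low, high = cpu_power_saving_w()
    print(f"Host CPU saving: {low}-{high} W across the cluster")
```

On those assumptions the 64-node cluster tops out just under 75 TFLOPS of peak throughput inside a budget of roughly 19 kW, and the host CPU swap alone is worth a bit over 6 kW. Real workloads will land well below the peak figure.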
In another session on the path to exascale computing, power use in data centers was listed as one of the biggest hurdles to reaching exaflop levels of performance. While Pedraforca is not the answer to exascale, it should at least be a useful learning experience in wringing the most parallelism out of code and pushing GPGPU to its limits. That research will help other clusters use GPUs more efficiently as researchers explore the future of computing.
The Pedraforca project builds upon research conducted on Tibidabo, a multi-core ARM CPU cluster, and CARMA (the CUDA on ARM development kit), which pairs a Tegra SoC with an NVIDIA Quadro card. The two slides below show CARMA benchmarks and a Tibidabo cluster.
Stay tuned to PC Perspective for more GTC 2013 coverage!
Subject: General Tech | February 22, 2013 - 12:31 PM | Tim Verry
Tagged: servers, facebook, exabyte, data centers, cold storage, cloud computing
Facebook is planning to construct a new cold storage facility to house archived and less-frequently-used media files. The new data center will reside in a new 62,000 sq. ft. building on the company's existing 127-acre property in Prineville, Oregon.
As cold storage, the data center will house servers with up to 3 Exabytes of total data capacity. The machines will be in a sleep state the majority of the time, but will be automatically turned on to serve up media files when accessed on the social network. Because the servers are normally in a lower-power sleep state, there will be a slight delay when users request files. According to Oregon Live, Facebook has stated that the delay will be as much as a couple of seconds and as little as several milliseconds.
The new cold storage facility will enable Facebook to save a great deal on electricity and, to a lesser extent, on hardware wear and tear. The company claims that its users upload 350 million photos each day, but that 82% of the social networking site's traffic focuses on a mere 8% of available photos.
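Those two percentages imply a very lopsided access pattern. A quick sketch of the arithmetic, assuming traffic share is proportional to the number of requests and that the remaining 92% of photos receive the remaining 18% of traffic (the function name is hypothetical):

```python
def per_photo_access_ratio(hot_traffic_share, hot_photo_share):
    """How many times more often an average 'hot' photo is requested
    than an average 'cold' one, given the share of traffic and the
    share of photos that make up the hot set."""
    cold_traffic_share = 1.0 - hot_traffic_share
    cold_photo_share = 1.0 - hot_photo_share
    hot_rate = hot_traffic_share / hot_photo_share      # traffic per unit of hot photos
    cold_rate = cold_traffic_share / cold_photo_share   # traffic per unit of cold photos
    return hot_rate / cold_rate


if __name__ == "__main__":
    # Facebook's claimed figures: 82% of traffic hits 8% of photos.
    ratio = per_photo_access_ratio(0.82, 0.08)
    print(f"A hot photo is requested about {ratio:.0f}x as often as a cold one")
```

Under those assumptions an average photo in the popular 8% is requested roughly 52 times as often as one in the long tail, which is why parking the tail on sleeping servers costs so little in user-visible delay.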
Err, not quite the cold storage Facebook has in mind...
Considering Facebook's existing Prineville data center used a whopping 71 million kilowatt-hours of electricity in its first 9 months, moving to a new cold storage system for infrequently accessed files is an excellent idea. The photos will still be available, but Facebook will save big on the power bill--a fair compromise for retaining all of those lolcat and meme photos, I think.
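For a sense of scale, that 71 million figure can be turned into an average continuous draw. This sketch assumes the number is energy in kilowatt-hours (a nine-month total only makes sense as energy) and that nine months is three-quarters of a 365-day year; the function name is made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def average_power_mw(energy_kwh, months):
    """Average continuous power in megawatts for a given energy total
    (kWh) consumed over a number of months."""
    hours = HOURS_PER_YEAR * months / 12.0
    return energy_kwh / hours / 1000.0  # kWh / h = kW, then kW -> MW


if __name__ == "__main__":
    print(f"{average_power_mw(71_000_000, 9):.1f} MW")  # prints "10.8 MW"
```

That works out to an average of just under 11 MW around the clock, so even a modest percentage saving from idle storage servers adds up quickly.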
The new data center will be rolled out in three phases, each measuring 16,000 sq. ft. in the Prineville facility. The first phase of cold storage servers should be up and running by Q4 2013. There is no estimate on the power savings, but it will be interesting to see how beneficial it will be--and whether other cloud service providers will adopt similar policies.
Also read: Amazon Glacier offers cheap long-term storage.