Inspur Readies Tianhe-2 Supercomputer With 54 Petaflop Theoretical Peak Performance
Subject: Systems | June 3, 2013 - 09:27 PM | Tim Verry
Tagged: Xeon Phi, tianhe-2, supercomputer, Ivy Bridge, HPC, China
A powerful new supercomputer constructed by Chinese company Inspur is currently in testing at the National University of Defense Technology. Called the Tianhe-2, the new supercomputer has 16,000 compute nodes and approximately 54 Petaflops of peak theoretical compute performance.
Destined for the National Supercomputer Center in Guangzhou, China, the open HPC platform will be used for education and research projects. The Tianhe-2 is composed of 125 racks with 128 compute nodes in each rack.
The compute nodes are broken down into two types: CPM and APU modules. One of each node type makes up a single compute board. The CPM module hosts four Intel Ivy Bridge processors, 128GB system memory, and a single Intel Xeon Phi accelerator card with 8GB of its own memory. Each APU module adds five Xeon Phi cards to every compute board. The compute boards (a CPM module + an APU module) contain two NICs that connect the various compute boards with Inspur's custom THExpress2 high bandwidth interconnects. Finally, the Tianhe-2 supercomputer will have access to 12.4 Petabytes of storage that is shared across all of the compute boards.
In all, the Tianhe-2 is powered by 32,000 Intel Ivy Bridge processors, 1.024 Petabytes of system memory (not counting Phi dedicated memory--which would make the total 1.404 PB), and 48,000 Intel Xeon Phi MIC (Many Integrated Cores) cards. That is a total of 3,120,000 processor cores (though keep in mind that number is primarily made up of the relatively simple individual Phi cores as there are 57 cores to each Phi card).
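For anyone who wants to check the core count, here is a minimal sketch that redoes the arithmetic using only the figures quoted above. The cores-per-CPU value is not stated in the article; it is derived here from the quoted total, so treat it as an inference rather than a published spec.

```python
# Figures quoted in the article.
RACKS = 125
NODES_PER_RACK = 128
CPUS = 32_000            # Intel Ivy Bridge processors
PHIS = 48_000            # Intel Xeon Phi cards
CORES_PER_PHI = 57
TOTAL_CORES = 3_120_000

nodes = RACKS * NODES_PER_RACK
phi_cores = PHIS * CORES_PER_PHI

# Derived, not quoted: whatever is left after the Phi cores must be CPU cores.
cpu_cores = TOTAL_CORES - phi_cores
cores_per_cpu = cpu_cores // CPUS

print(f"compute nodes:        {nodes:,}")
print(f"Xeon Phi cores:       {phi_cores:,}")
print(f"CPU cores (implied):  {cpu_cores:,}")
print(f"cores per CPU:        {cores_per_cpu}")
print(f"share of Phi cores:   {phi_cores / TOTAL_CORES:.1%}")
```

The Phi cards account for the large majority of the total, which is why the headline core count needs the caveat about simple cores.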
Inspur claims up to 3.432 TFlops of peak compute performance per compute node (which, for simplicity, the company defines as 2 Ivy Bridge chips, 64GB of memory, and 3 Xeon Phi cards, although the two compute modules that make up a board are not physically laid out that way) for a total theoretical potential compute power of 54,912 TFlops (or 54.912 Petaflops) across the entire supercomputer. In the latest Linpack benchmark run, researchers saw up to 63% efficiency in attaining peak performance -- 30.65 PFlops out of 49.19 PFlops peak/theoretical performance -- when only using 14,336 nodes with 50GB RAM each. Further testing and optimization should improve that number, and when all nodes are brought online the real world performance will naturally be higher than the current benchmarks. With that said, the Tianhe-2 is already besting Cray's TITAN, which is promising (though I hope Cray comes back next year and takes the crown again, heh).
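The peak and efficiency figures can be reproduced with a few lines of arithmetic. This is only a sketch built from the numbers quoted above; the variable names are mine, and the per-node peak is taken at face value from Inspur's claim.

```python
# Figures quoted in the article.
NODES_TOTAL = 16_000
NODES_TESTED = 14_336
TFLOPS_PER_NODE = 3.432
LINPACK_PFLOPS = 30.65
QUOTED_TESTED_PEAK_PFLOPS = 49.19

# Theoretical peak scales linearly with node count (1 PFlops = 1000 TFlops).
full_peak_pflops = NODES_TOTAL * TFLOPS_PER_NODE / 1000
tested_peak_pflops = NODES_TESTED * TFLOPS_PER_NODE / 1000

# Linpack efficiency is sustained performance divided by theoretical peak.
efficiency = LINPACK_PFLOPS / QUOTED_TESTED_PEAK_PFLOPS

print(f"full-system peak:   {full_peak_pflops:.3f} PFlops")
print(f"tested-subset peak: {tested_peak_pflops:.2f} PFlops")
print(f"Linpack efficiency: {efficiency:.1%}")
```

The tested-subset peak computed this way lands within rounding distance of the quoted 49.19 PFlops, and the efficiency works out to a little over 62%.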
In order to keep all of this hardware cool, Inspur is planning a custom liquid cooling system using chilled water. The Tianhe-2 will draw up to 17.6 MW of power under load. Once the liquid cooling system is implemented the supercomputer will draw 24MW while under load.
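To put those power numbers in context, here is a rough sketch of what they imply. It assumes the gap between the two quoted figures is entirely cooling overhead, which the article suggests but does not spell out, and it uses the theoretical peak rather than measured Linpack performance.

```python
# Figures quoted in the article.
COMPUTE_LOAD_MW = 17.6        # draw under load without the cooling system
TOTAL_WITH_COOLING_MW = 24.0  # draw under load with liquid cooling
PEAK_PFLOPS = 54.912          # theoretical peak of the full system

# Assumption: the difference between the two figures is cooling overhead.
cooling_mw = TOTAL_WITH_COOLING_MW - COMPUTE_LOAD_MW
overhead_ratio = TOTAL_WITH_COOLING_MW / COMPUTE_LOAD_MW

# PFlops -> GFlops is a factor of 1e6, and MW -> W is also 1e6,
# so the unit conversions cancel out in the ratio.
peak_gflops_per_watt = PEAK_PFLOPS / COMPUTE_LOAD_MW
peak_gflops_per_watt_cooled = PEAK_PFLOPS / TOTAL_WITH_COOLING_MW

print(f"cooling overhead:        {cooling_mw:.1f} MW")
print(f"total / compute ratio:   {overhead_ratio:.2f}")
print(f"peak GFlops/W (compute): {peak_gflops_per_watt:.2f}")
print(f"peak GFlops/W (cooled):  {peak_gflops_per_watt_cooled:.2f}")
```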
This is an impressive system, and an interesting take on a supercomputer architecture considering the rise in popularity of heterogeneous architectures that pair massive numbers of CPUs with graphics processing units (GPUs).
The Tianhe-2 supercomputer will be reconstructed at its permanent home at the National Supercomputer Center in Guangzhou, China once the testing phase is finished. It will be one of the top supercomputers in the world once it is fully online! HPC Wire has a nice article with slides and further details on the upcoming processing powerhouse that is worth a read if you are into this sort of HPC stuff.
Also read: Cray unveils the TITAN supercomputer.
NVIDIA's plans for Tegra and Tesla
Subject: General Tech | April 24, 2013 - 01:38 PM | Jeremy Hellstrom
Tagged: Steve Scott, nvidia, HPC, tesla, logan, tegra
The Register had a chance to sit down with Steve Scott, once CTO of Cray and now CTO of NVIDIA's Tesla projects, to discuss the future of their add-in cards as well as that of x86 in the server room. They discussed Tesla and why it is not receiving the same amount of attention at NVIDIA as Tegra is, as well as some of the fundamental differences in the chips both currently and going forward. NVIDIA plans to unite GPU and CPU onto both families of chips, likely with a custom interface as opposed to placing them on the same die, though both will continue to be designed for very different functions. A lot of the article focuses on Tegra, its memory bandwidth and most importantly its networking capabilities, as it seems NVIDIA is focused on the server room and providing hundreds or thousands of interconnected Tegra processors to compete directly with x86 offerings. Read on for the full interview.
"Jen-Hsun Huang, co-founder and CEO of Nvidia has been perfectly honest about the fact that the graphics chip maker didn't intend to get into the supercomputing business. Rather, it was founded by a bunch of gamers who wanted better graphics cards to play 3D games. Fast forward two decades, though, and the Nvidia Tesla GPU coprocessor and the CUDA programming environment have taken the supercomputer world by storm."
Here is some more Tech News from around the web:
- AMD pins future growth to embedded marketplace @ The Register
- AMD announces new embedded G-series SoC @ DigiTimes
- TSMC captures almost 50 percent of foundry market thanks to 28nm demand @ The Inquirer
- $45 BeagleBone Black Keeps Eyes on the Pi's @ Linux.com
- BlackBerry OS 10.1 leaks its secret goo over all the web @ The Register
- Samsung MV900F Wi-Fi 16.3MP Digital Camera Review @ ModSynergy
- i’m Watch: A Smartwatch Review @ TechwareLabs
IDF: Intel Announces Upcoming Haswell and Ivy Bridge-E Xeon Processors
Subject: General Tech | April 10, 2013 - 04:14 PM | Tim Verry
Tagged: xeon-ex, xeon-ep, xeon, server, Intel, HPC, haswell
Intel officially announced its next-generation Xeon processors at IDF Beijing today. The new lineup includes the Haswell-based Xeon E3 1200 V3 family on the low end, and the Ivy Bridge-EP Xeon E5 and Ivy Bridge-EX Xeon E7 aimed at the mid-range general purpose and high-end HPC markets respectively. Intel did not disclose pricing or details on the new chips (such as core counts, cache, clockspeeds, number of SKUs etc.). However, the x86 chip giant did state that the new chips are coming later this year as well as teasing a few tidbits of information on the new Xeon chips.
The upcoming Xeon E3 processors will be part of the Xeon E3 1200 V3 family. These chips will be based on Haswell and are limited to one socket per board. Thanks to the Haswell architecture, Intel has managed to reduce power consumption by approximately 25% and increase video transcoding performance by about 25%. There will be at least one Xeon E3 1200 V3 series chip with a 13W TDP, for example.
Intel is also releasing a new media software development kit (SDK) for Linux and Windows machines that will provide a common platform for developers. It has allowed Intel to maximize the use of both the CPU and GPU for HD video transcoding as well as increasing the number of simultaneous video transcodes over previous generations. The new Xeon E3 1200 V3 (Haswell) chips will be available sometime before the end of 2013.
The next-generation Xeon E5 chips will be based on the 22nm Ivy Bridge-EP architecture. They will be positioned at general purpose computing in data centers (and possibly high-end workstations), and will be limited to 2 sockets per motherboard. The new Xeon E5 processors will incorporate Intel Secure Key and OS Guard technologies. OS Guard is the evolution of the company's existing Intel Execute Disable Bit security technology. Intel is also including AES-NI (AES-New Instructions), to improve the hardware acceleration of AES encrypt/decrypt operations. These mid-range Xeon chips will be available in Q3 2013.
Finally, the top-end Xeon E7 processors will be based on the 22nm Ivy Bridge-EX architecture. The upcoming processors are intended for high performance server and supercomputing applications where scalability and performance are important. The Ivy Bridge-EX chips are compatible with motherboards that will have between 4 and 8 sockets and up to 12TB of RAM per node. Further, Intel has packed these processors with new RAS features, including Resilient System Technology and Resilient Memory Technology. The RAS features ensure stability and data integrity in calculations are maintained. Such features are important in scientific, real-time analytics, cloud computing, and banking applications, where performance and up-time are paramount and any errors could cost a company money. Intel has stated that the new Xeon E7 CPUs will be available in the fourth quarter of this year (Q4'13).
While I was hoping for more details as far as core count, clockspeeds, and pricing, the approximate release to market timeframe for the chips is known. Do you think you will be upgrading to the new Xeon chips later this year, or are your current processors fast enough for your server applications?
More information on the upcoming Xeon chips can be found in this Intel fact sheet (PDF).
GTC 2013: TYAN Launches New HPC Servers Powered by Kepler-based Tesla Cards
Subject: General Tech, Graphics Cards | March 19, 2013 - 06:52 PM | Tim Verry
Tagged: GTC 2013, tyan, HPC, servers, tesla, kepler, nvidia
Server platform manufacturer TYAN is showing off several of its latest servers aimed at the high performance computing (HPC) market. The new servers range in size from 2U to 4U chassis and hold up to 8 Kepler-based Tesla accelerator cards. The new product lineup consists of two motherboards and three bare-bones systems. The S7055 and S7056 are the motherboards, while the FT77-B7059, TA77-B7061, and FT48-B7055 are the bare-bones systems.
The TA77-B7061 is the smallest system, with support for two Intel Xeon E5-2600 processors and four Kepler-based Tesla accelerator cards. The FT48-B7055 has similar specifications but is housed in a 4U chassis. Finally, the FT77-B7059 is a 4U system with support for two Intel Xeon E5-2600 processors, and up to eight Tesla accelerator cards. The S7055 supports a maximum of 4 GPUs while the S7056 can support two Tesla cards, though these are bare boards so you will have to supply your own cards, processors, and RAM (of course).
According to TYAN, the new Kepler-based HPC systems will be available in Q2 2013, though there is no word on pricing yet.
Stay tuned to PC Perspective for further GTC 2013 Coverage!
Too good to be true; bad coding versus GPGPU compute power
Subject: General Tech | November 23, 2012 - 01:03 PM | Jeremy Hellstrom
Tagged: gpgpu, amd, nvidia, Intel, phi, tesla, firepro, HPC
The skeptics were right to question the huge improvements seen when using GPGPUs in a system for heavy parallel computing tasks. The cards do help a lot, but the 100x improvements that have been reported by some companies and universities had more to do with poorly optimized CPU code than with the processing power of GPGPUs. This news comes from someone you might not expect to burst this particular bubble: Sumit Gupta, the GM of NVIDIA's Tesla team. He might be trying to mitigate any possible disappointment from future customers who have already optimized their CPU code and won't see the huge improvements seen by academics and other current customers. The Inquirer does point out a balancing benefit: it is obviously much easier to optimize code in CUDA, OpenCL and other GPGPU languages than it is to code for multicored CPUs.
"Both AMD and Nvidia have been using real-world code examples and projects to promote the performance of their respective GPGPU accelerators for years, but now it seems some of the eye popping figures including speed ups of 100x or 200x were not down to just the computing power of GPGPUs. Sumit Gupta, GM of Nvidia's Tesla business told The INQUIRER that such figures were generally down to starting with unoptimised CPU."
Here is some more Tech News from around the web:
- Intel reportedly speeds up development of low-power processors @ DigiTimes
- Firefox and Opera squish big buffer overflow bugs @ The Register
- Hexing MAC address reveals Wifi passwords @ The Register
- Cisco Linksys EA6500 Smart Wi-Fi Router Review @ Legit Reviews
- Camera shootout: Samsung Galaxy S III vs S III mini @ Hardware.info
- Black Friday Tech Deals @ TechReviewSource
- Lawrence 'Empire Strikes Back' Kasdan to pen future Star Wars script @ The Register
- Win Corsair AX860i, AX760i, AX860 & AX760 power supplies @ Kitguru
AMD Launches Dual Tahiti FirePro S10000 Graphics Card
Subject: Graphics Cards | November 13, 2012 - 04:15 PM | Tim Verry
Tagged: tahiti, HPC, gpgpu, firepro s10000, firepro
On Monday, AMD launched its latest graphics card aimed at the server and workstation market. Called the AMD FirePro S10000 (for clarity, that’s FirePro S10,000), it is a dual GPU Tahiti graphics card that offers up some impressive performance numbers.
No, unfortunately, this is not the (at this point) mythical dual-7970 AMD HD 7990 graphics card. Rather, the FirePro S10,000 is essentially two Radeon 7950 GPUs on a single PCB along with 6 GB of GDDR5 memory. Specifications on the card include 3,584 stream processors, a GPU clock speed of 825 MHz, and 6 GB GDDR5 with a total of 480 GB/s of memory bandwidth. That is 1,792 stream processors and 3 GB of memory per GPU. Interestingly, this is a dual slot card with an active cooler. At 375W, a passive cooler is just not possible in a form factor necessary to fit into a server rack. Therefore, AMD has equipped the FirePro S10,000 GPGPU card with a triple fan cooler reminiscent of the setup PowerColor uses on its custom (2x7970) Devil 13, but not as large. The FirePro card has three red fans (shrouded by a black cover) over a heatpipe and aluminum fin heatsink. The card does include display outputs for workstation uses including one DVI and four mini DisplayPort ports.
AMD is claiming 1.48 TFLOPS in double precision work and 5.91 TFLOPS in single precision workloads. Those are impressive numbers, and the card even manages to beat NVIDIA’s new Tesla K20X with big Kepler GK110 and the company’s dual GPU GK104 Tesla K10 by notable margins. Additionally, the new FirePro S10000 manages to beat its FirePro S9000 predecessor handily. The S9000 in comparison is rated at 0.806 TFLOPS for double precision calculations and 3.23 TFLOPS on single precision work. The S9000 is a single GPU card equivalent to the Radeon 7950 on the consumer side of things with 1,792 shader cores. AMD has essentially taken two S9000 cards and put them on a single PCB, and managed to get almost twice the potential performance without needing twice the power.
Efficiency and calculations per watt were numbers that AMD did not dive too much into, but the company did share that the new FirePro S10000 achieves 3.94 GFLOPS/W. AMD compares this to NVIDIA’s (Fermi-based) Tesla M2090 at 2.96 GFLOPS/W. Unfortunately, NVIDIA has not shared a single GPU GFLOPS/W rating on its new K20X cards.
| | AMD S10000 | AMD S9000 | NVIDIA K20X | NVIDIA K10 |
| Double Precision | 1.48 TF | 0.806 TF | 1.31 TF | 0.19 TF |
| Single Precision | 5.91 TF | 3.23 TF | 3.95 TF | 4.58 TF |
| Architecture | Tahiti (x2) | Tahiti (x1) | GK110 | GK104 (x2) |
| TDP | 375W | 225W | 235W | 225W |
| Memory Bandwidth | 480 GB/s | 264 GB/s | 250 GB/s | 320 GB/s |
| Memory Capacity | 6 GB | 6 GB | 6 GB | 8 GB |
| Stream Processors | 3,584 | 1,792 | 2,688 | 3,072 |
| Core clock speed | 825 MHz | 900 MHz | 732 MHz | 745 MHz |
| MSRP | $3,599 | $2,499 | $3,199 | ~$2500 |
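For a rough apples-to-apples view of efficiency, the table's peak throughput figures can be divided by each card's TDP. This is only a back-of-the-envelope sketch: TDP is a design ceiling and not a measured power draw, so these are not vendor efficiency ratings, and the dictionary layout and function name are my own.

```python
# Peak throughput and TDP as listed in the table above.
cards = {
    "AMD S10000":  {"dp_tflops": 1.48,  "sp_tflops": 5.91, "tdp_w": 375},
    "AMD S9000":   {"dp_tflops": 0.806, "sp_tflops": 3.23, "tdp_w": 225},
    "NVIDIA K20X": {"dp_tflops": 1.31,  "sp_tflops": 3.95, "tdp_w": 235},
    "NVIDIA K10":  {"dp_tflops": 0.19,  "sp_tflops": 4.58, "tdp_w": 225},
}


def gflops_per_watt(tflops, watts):
    """Convert TFLOPS to GFLOPS and divide by power in watts."""
    return tflops * 1000 / watts


# Estimated efficiency per card, based on TDP rather than measured power.
dp_eff = {n: gflops_per_watt(c["dp_tflops"], c["tdp_w"]) for n, c in cards.items()}
sp_eff = {n: gflops_per_watt(c["sp_tflops"], c["tdp_w"]) for n, c in cards.items()}

for name in cards:
    print(f"{name:12s} DP {dp_eff[name]:5.2f}  SP {sp_eff[name]:5.2f}  GFLOPS/W")

# How the dual-GPU S10000 scales relative to the single-GPU S9000.
dp_scaling = cards["AMD S10000"]["dp_tflops"] / cards["AMD S9000"]["dp_tflops"]
power_scaling = cards["AMD S10000"]["tdp_w"] / cards["AMD S9000"]["tdp_w"]
print(f"S10000 vs S9000: {dp_scaling:.2f}x DP for {power_scaling:.2f}x the TDP")
```

The S10000 result comes out at roughly 3.95 GFLOPS/W, in line with the figure AMD shared.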
Other features of the AMD FirePro S10000 include support for OpenCL, Microsoft RemoteFX, Direct GPU pass-through, and (shared) virtualized graphics. AMD envisions businesses using these FirePro cards to provide GPU hardware acceleration for virtualized desktops and thin clients. With Xen Server, multiple users are able to tap into the hardware acceleration offered by the FirePro S10000 to speed up the desktop and any programs that support it.
Operating systems in particular have begun tapping into GPU acceleration to speed up the user interface and run things like the Aero desktop in Windows 7. High end software for workstations also has a high GPU acceleration adoption rate, so there are benefits to be had, and AMD is continuing to offer it with its latest FirePro card.
AMD is offering up a card that can be used for a mix of compute or graphics output, making it an interesting choice for workstations. The FirePro S10000’s major fault lies with a 375W TDP, and while the peak performance is respectable it is going to use more power while providing that compute muscle.
The cards are available now with an MSRP of $3,599. It is neat to finally see AMD come out with a dual GPU card with Tahiti chips, and it will be interesting to see what kind of design wins the company is able to get for its beastly FirePro S10000.
AMD's new FirePro S10000 sports two GPUs
Subject: General Tech, Graphics Cards | November 13, 2012 - 01:17 PM | Jeremy Hellstrom
Tagged: amd, Intel, firepro, firepro s10000, HPC, Xeon Phi, 3120A, 5110P, Knight's Corner
AMD's new Tahiti based FirePro S10000 sports a little more than just a GPU upgrade: it sports two GPU upgrades, as this is a dual GPU card. According to The Register it should run about $3,600 and need 375W to perform, numbers which make it a more efficient card than the S9000 even though it needs significantly more cash and power to run. It is a 2 slot card, a necessity in the server and workstation world, and while it does not support CrossFire it does support EyeFinity with its DVI port and four Mini DisplayPorts.
The Register also got some news about Xeon Phi, Intel's answer to the HPC cards on offer from AMD and NVIDIA. Knights Corner is the evolution of Larrabee into an actual product, in this case two 62 core cards though not all of the cores are active. The passively cooled 5110P has 60 cores running at 1.053GHz, while the 3120A has 57 cores clocked slightly higher at 1.1GHz and sports a fan. Both cards produce just over a teraflop of double precision floating point math, compared to the 1.48 teraflops offered by AMD's S10000 or the 1.3 offered by the Tesla K20X. Check out more on these coprocessors at The Register.
"With the FirePro S10000, not only is the GPU geared down to 825MHz, but the memory is similarly downshifted to 5GHz. The memory interface is 384-bit wide on each GPU, with two blocks of GDDR5 memory yielding a total of 6GB. (This could be a little skinny on the memory for some HPC workloads, given that the S9000 card has 6GB of memory for one Tahiti GPU.) Each GPU can access 240GB/sec of memory bandwidth linking to each 3GB chunk of GDDR5 memory.
Because the card is double-stuffed, it can deliver a very impressive 5.91 teraflops SP and 1.48 teraflops DP in peak floating point oomph."
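The bandwidth figures in that quote follow from the bus width and memory speed. Here is a small sketch of the calculation, assuming the quoted "5GHz" is the effective GDDR5 data rate of 5 Gbit/s per pin (the quote does not say so explicitly); the function name is mine.

```python
def memory_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak bandwidth in GB/s: bytes moved per transfer times transfers per second.

    data_rate_gbps is the effective per-pin data rate in Gbit/s
    (assumed here to be what the quote calls "5GHz").
    """
    return bus_width_bits / 8 * data_rate_gbps


per_gpu = memory_bandwidth_gb_s(384, 5.0)  # one Tahiti GPU, 384-bit interface
card_total = 2 * per_gpu                   # two GPUs on the S10000

print(f"per GPU:    {per_gpu:.0f} GB/s")
print(f"whole card: {card_total:.0f} GB/s")
```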
Here is some more Tech News from around the web:
- The TR Podcast 123: Incremental improvements
- Microsoft Makes Direct X 11.1 a Windows 8 Exclusive @ Slashdot
- Random Linux Commands to Make Google Talk, Fix Wifi, Find Duplicate Files, and More @ Linux.com
- Microsoft Surface RT may only achieve 60% of forecasted sales @ DigiTimes
- Windows chief Steven Sinofsky leaves Microsoft @ The Inquirer
- Fedora 'Spherical Cow' delayed by bugs, Secure Boot @ The Register
- Microsoft rolls out always-on Skype for Windows Phone 8 @ The Register
- Gaming in Windows 8 vs Windows 7: what's the difference in performance? @ Hardware.info
- Windows 7 vs Windows 8 – The Definitive Performance Guide @ hardCOREware
- How to Change the Start Screen Background in Windows 8 @ TechSpot
- TP-Link TL-WDR3600 and WDR4300 review: two shades of black @ Hardware.info
- Win 1 El'Druin ARPG Gaming Mouse, 2 Hellion Gaming Mice and 1 Aegis Gaming Pad @ NikKTech
NVIDIA Launches Tesla K20X Accelerator Card, Powers Titan Supercomputer
Subject: General Tech | November 12, 2012 - 06:29 AM | Tim Verry
Tagged: tesla, supercomputer, nvidia, k20x, HPC, CUDA, computing
Graphics card manufacturer NVIDIA launched a new Tesla K20X accelerator card today that supplants the existing K20 as the top of the line model. The new card cranks up the double and single precision floating point performance, beefs up the memory capacity and bandwidth, and brings some efficiency improvements to the supercomputer space.
While it is not yet clear how many CUDA cores the K20X has, NVIDIA has stated that it is using the GK110 GPU, and is running with 6GB of memory with 250 GB/s of bandwidth – a nice improvement over the K20’s 5GB at 208 GB/s. Both the new K20X and K20 accelerator cards are based on the company’s Kepler architecture, but NVIDIA has managed to wring out more performance from the K20X. The K20 is rated at 1.17 TFlops peak double precision and 3.52 TFlops peak single precision while the K20X is rated at 1.31 TFlops and 3.95 TFlops.
The K20X manages to score 1.22 TFlops in DGEMM, which puts it at almost three times faster than the previous generation Tesla M2090 accelerator based on the Fermi architecture.
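To see how big the step from K20 to K20X is, here is a quick sketch that turns the quoted specs into percentage gains and compares the DGEMM score with the K20X's peak double precision rating. All inputs come from the figures above; the dictionary keys are just labels I picked.

```python
# Specs quoted in the article.
k20 = {"dp_tflops": 1.17, "sp_tflops": 3.52, "mem_gb": 5, "bw_gb_s": 208}
k20x = {"dp_tflops": 1.31, "sp_tflops": 3.95, "mem_gb": 6, "bw_gb_s": 250}
DGEMM_TFLOPS = 1.22

# Percentage improvement of the K20X over the K20 for each spec.
gains = {key: (k20x[key] / k20[key] - 1) * 100 for key in k20}

# Fraction of the K20X's peak double precision rate reached in DGEMM.
dgemm_efficiency = DGEMM_TFLOPS / k20x["dp_tflops"]

for key, pct in gains.items():
    print(f"{key:10s} +{pct:.1f}%")
print(f"DGEMM reaches {dgemm_efficiency:.1%} of peak double precision")
```

The compute gains come out at around 12%, while memory capacity and bandwidth both grow by about 20%.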
Aside from pure performance, NVIDIA is also touting efficiency gains with the new K20X accelerator card. When two K20X cards are paired with a 2P Sandy Bridge server, NVIDIA claims to achieve 76% efficiency versus 61% efficiency with a 2P Sandy Bridge server equipped with two previous generation M2090 accelerator cards. Additionally, NVIDIA claims to have enabled the Titan supercomputer to reach the #1 spot on the top 500 green supercomputers thanks to its new cards with a rating of 2,120.16 MFLOPS/W (million floating point operations per second per watt).
NVIDIA claims to have already shipped 30 PFLOPS worth of GPU accelerated computing power. Interestingly, most of that computing power is housed in the recently unveiled Titan supercomputer. This supercomputer contains 18,688 Tesla K20X (Kepler GK110) GPUs and 18,688 16-core AMD Opteron 6274 processors, for a total of 299,008 CPU cores. It will consume 9 megawatts of power and is rated at a peak of 27 Petaflops and 17.59 Petaflops during a sustained Linpack benchmark. Further, when compared to Sandy Bridge processors, the K20 series offers up between 8.2 and 18.1 times more performance at several scientific applications.
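A few of Titan's numbers can be sanity-checked against each other. The sketch below works only from the quoted figures: dividing the 299,008 figure by 16 cores per Opteron gives the number of CPU packages, and dividing the Linpack result by the quoted MFLOPS/W rating gives the power draw that rating implies. That implied power is my own derivation and not a published measurement.

```python
# Figures quoted in the article.
K20X_GPUS = 18_688
OPTERON_CORES_TOTAL = 299_008
CORES_PER_OPTERON = 16
PEAK_PFLOPS = 27.0
LINPACK_PFLOPS = 17.59
MFLOPS_PER_WATT = 2_120.16

# Number of 16-core Opteron packages behind the quoted core count.
opteron_chips = OPTERON_CORES_TOTAL // CORES_PER_OPTERON

# Sustained Linpack performance as a fraction of theoretical peak.
linpack_efficiency = LINPACK_PFLOPS / PEAK_PFLOPS

# 1 PFLOPS = 1e9 MFLOPS; dividing by MFLOPS/W gives watts, then convert to MW.
implied_power_mw = LINPACK_PFLOPS * 1e9 / MFLOPS_PER_WATT / 1e6

print(f"Opteron packages:    {opteron_chips:,} (GPUs: {K20X_GPUS:,})")
print(f"Linpack efficiency:  {linpack_efficiency:.1%}")
print(f"implied Linpack draw: {implied_power_mw:.1f} MW")
```

The package count matches the GPU count one for one, and the implied draw sits a little under the 9 megawatt figure.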
While the Tesla cards undoubtedly use more power than CPUs, you need far fewer numbers of accelerator cards than processors to hit the same performance numbers. That is where NVIDIA is getting its power efficiency numbers from.
NVIDIA is aiming the accelerator cards at researchers and businesses doing 3D graphics, visual effects, high performance computing, climate modeling, molecular dynamics, earth science, simulations, fluid dynamics, and other such computationally intensive tasks. Using CUDA and the parallel nature of the GPU, the Tesla cards can achieve performance much higher than a CPU-only system can. NVIDIA has also engineered technologies to keep the GPU accelerators fed with data and to better parallelize workloads, which the company calls Hyper-Q and Dynamic Parallelism respectively.
It is interesting to see NVIDIA bring out a new flagship, especially another GK110 card. Systems using the K20 and the new K20X are available now with cards shipping this week and general availability later this month.
You can find the full press release below and a look at the GK110 GPU in our preview.
Anandtech also managed to get a look inside the Titan supercomputer at Oak Ridge National Laboratory, where you can see the Tesla K20X cards in action.
NVIDIA's Tesla K10 offers serious single-precision performance
Subject: General Tech | June 19, 2012 - 03:04 PM | Jeremy Hellstrom
Tagged: nvidia, tesla, K10, GK104, HPC
One of NVIDIA's line of Tesla HPC cards, the Tesla K10, has actually been seen in the wild. The new Tesla series is split between the GK104 based K10, which is specifically designed for single-precision tasks, and the GK110 based Tesla K20, which is optimized for double-precision tasks. The K10 is capable of 4.58 teraflops of single-precision performance thanks to a pair of GK104s with 8GB of GDDR5, whereas the K20 should in theory double Intel's Xeon Phi at 2 teraflops of double-precision performance, but that has yet to be demonstrated. The K10 that was demonstrated also showed off another of the benefits of NVIDIA's new architecture: even with two GPUs the card remains within a 225W thermal envelope, something that is incredibly important if you are building a cluster. The Register has gathered together some of the benchmarks and slides from NVIDIA's release, which you can see here.
"The Top 500 supercomputer ranking is based on the performance of machines running the Linpack Fortran matrix math benchmark using double-precision floating point math, but a lot of applications will do just fine with single-precision math. And it is for these workloads, graphics chip maker and supercomputing upstart Nvidia says, that it designed the new Tesla K10 server coprocessors."
Here is some more Tech News from around the web:
- Samsung reportedly raising 4GB DDR3 module prices @ DigiTimes
- Can't watch Flash vids in Firefox? It's not just you @ The Register
- TSMC reportedly seeing orders slow down @ DigiTimes
- How To Unlock Huawei 3G Modems @ TechARP
- How to Clean and Fine-Tune a Tape Deck @ Hardware Secrets
- Interview with MEDION's Sandro Fabris @ HardwareHeaven
- HuntKey Joint Contest @ NikKTech
Intel Introduces Xeon Phi: Larrabee Unleashed
Subject: Processors | June 19, 2012 - 11:46 AM | Josh Walrath
Tagged: Xeon Phi, xeon e5, nvidia, larrabee, knights corner, Intel, HPC, gpgpu, amd
The one positive thing for Intel’s competitors is that it seems their enthusiasm for massively parallel computing is justified. Intel just entered that ring with a unique architecture that will certainly help push high performance computing more towards true heterogeneous computing.




















