Tokyo Tech Goes Green with KFC (NVIDIA and Efficiency)

Subject: General Tech, Graphics Cards, Systems | November 21, 2013 - 09:47 PM |
Tagged: nvidia, tesla, supercomputing

GPUs are very efficient in terms of operations per watt. Their architecture is best suited for a gigantic bundle of similar calculations (such as a set of operations for each entry of a large blob of data). These are also the tasks which take up the most computation time, especially (and not surprisingly) in 3D graphics, where you need to do something to every pixel, fragment, vertex, etc. The same architecture is also very relevant for scientific calculations, financial and other "big data" services, weather prediction, and so forth.
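To make "the same operation for each entry of a large blob of data" concrete, here is a minimal sketch in Python. The `brighten` function and the pixel values are made up for illustration. Python runs this loop one element at a time, whereas a GPU would hand each element to its own thread, but the shape of the work is the same.

```python
# Hypothetical example: apply one identical operation to every pixel.
# Each result depends only on its own input, which is what makes this
# kind of workload easy to spread across thousands of GPU threads.

def brighten(pixel: int, amount: int = 40) -> int:
    """Add `amount` to an 8-bit pixel value, clamping at 255."""
    return min(255, pixel + amount)

pixels = [0, 64, 128, 200, 250]             # made-up grayscale values
brightened = [brighten(p) for p in pixels]  # same operation, every entry

print(brightened)  # [40, 104, 168, 240, 255]
```

Because no element needs to know about its neighbors, the order of evaluation does not matter, and that independence is what the hardware exploits.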

nvidia-submerge.png

Tokyo Tech KFC achieves over 4 GigaFLOPs per watt of power draw from 160 Tesla K20X GPUs in its cluster. That is about 25% more calculations per watt than the current leader of the Green500 (the CINECA Eurora system in Italy, with 3.208 GFLOPs/W).
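For anyone who wants to check the "about 25%" figure, here is a quick sketch of the arithmetic. It assumes exactly 4.0 GFLOPs/W for KFC, since the article only says "over 4", so the real margin would be a little higher. The function name is just for illustration.

```python
# Efficiency figures quoted above. 4.0 is an assumed floor for "over 4".
kfc_gflops_per_watt = 4.0
eurora_gflops_per_watt = 3.208

def percent_more(new: float, old: float) -> float:
    """Return how much larger `new` is than `old`, as a percentage."""
    return (new / old - 1.0) * 100.0

advantage = percent_more(kfc_gflops_per_watt, eurora_gflops_per_watt)
print(f"KFC does about {advantage:.1f}% more calculations per watt")
```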

One interesting trait: this supercomputer will be cooled by oil immersion. NVIDIA offers passively cooled Tesla cards which, according to my understanding of how this works, are very well suited to this fluid system. I am fairly certain that they remove all of the fans before dunking the servers (I had originally figured they would be left on).

By the way, was it intentional to name computers dunked in giant vats of heat-conducting oil, "KFC"?

Intel has done a similar test, which we reported on last September, submerging numerous servers for over a year. Another benefit of being green is that you are not nearly as concerned about air conditioning.

NVIDIA is actually taking it to the practical market with another nice supercomputer win.

Source: NVIDIA

The Titan's Overthrown. Tianhe-2 Supercomputer New #1

Subject: General Tech, Processors, Systems | June 26, 2013 - 10:27 PM |
Tagged: supercomputing, supercomputer, titan, Xeon Phi

The National Supercomputer Center in Guangzhou, China, will host the world's fastest supercomputer by the end of the year. The Tianhe-2 (in English, "Milky Way-2") is capable of nearly double the floating-point performance of Titan, albeit with slightly less performance per watt. The Tianhe-2 was developed by China's National University of Defense Technology.

tianhe-2-jack-dongarra-pdf-600x0.jpg

Photo Credit: Top500.org

Comparing the new fastest computer with the former: China's Milky Way-2 is able to achieve 33.8627 PetaFLOPs of calculations from 17.808 MW of electricity. The Titan, on the other hand, is able to crunch 17.590 PetaFLOPs with a draw of just 8.209 MW. As such, the new Milky Way-2 uses 12.7% more power per FLOP than Titan.
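The 12.7% figure falls straight out of the numbers above. A minimal sketch of the calculation follows; the helper name is mine, not anything from Top500. Conveniently, megawatts per PetaFLOP is numerically the same as watts per GigaFLOP, since both units scale by a factor of one million.

```python
def megawatts_per_petaflop(power_mw: float, petaflops: float) -> float:
    """Power cost of performance (numerically equal to watts per GFLOP)."""
    return power_mw / petaflops

# Figures quoted above.
tianhe2 = megawatts_per_petaflop(17.808, 33.8627)
titan = megawatts_per_petaflop(8.209, 17.590)

# How much more power Tianhe-2 spends per FLOP, relative to Titan.
extra_percent = (tianhe2 / titan - 1.0) * 100.0

print(f"Tianhe-2: {tianhe2:.3f} MW per PetaFLOP")
print(f"Titan:    {titan:.3f} MW per PetaFLOP")
print(f"Tianhe-2 uses {extra_percent:.1f}% more power per FLOP")
```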

Titan is famously based on the Kepler GPU architecture from NVIDIA, coupled with several 16-core AMD Opteron server processors clocked at 2.2 GHz. This concept of using accelerated hardware carried over into the design of Tianhe-2, which is based around Intel's Xeon Phi coprocessor. If you include the simplified co-processor cores of the Xeon Phi, the new champion is the sum of 3.12 million x86 cores and 1024 terabytes of memory.

... but will it run Crysis?

... if someone gets around to emulating DirectX in software, it very well could.

Source: Top500

Intel Hopes For Exaflop Capable Supercomputers Within 10 Years

Subject: Systems | June 21, 2011 - 03:52 AM |
Tagged: supercomputing, mic, larrabee, knights corner, Intel

Silicon Graphics International and Intel recently announced plans to reach exascale levels of computational power within ten years. Exascale computing refers to computers capable of delivering at least one exaflop (1,000 petaflops) of computational horsepower, or a quintillion calculations per second. To put that in perspective, today’s supercomputers are just now breaking into single-digit petaflop performance, with the fastest supercomputer delivering 8.16 petaflops. It is capable of this thanks to many thousands of eight-core CPUs, whereas other Top 500 supercomputers are starting to utilize a CPU and GPU combination in order to achieve petaflop performance.
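To get a feel for the size of that gap, here is a rough sketch of the unit conversion. The only input taken from the article is the 8.16 petaflop figure; the rest is standard SI prefixes.

```python
# SI prefixes: peta = 10^15, exa = 10^18 floating-point operations per second.
PETA = 10**15
EXA = 10**18

petaflops_per_exaflop = EXA // PETA   # 1000
fastest_petaflops = 8.16              # current Top 500 leader, per the article

# How many times faster than today's leader a one-exaflop machine would be.
speedup_needed = petaflops_per_exaflop / fastest_petaflops

print(f"1 exaflop = {petaflops_per_exaflop} petaflops")
print(f"That is roughly {speedup_needed:.0f}x today's fastest machine")
```

In other words, the ten-year goal is a machine over a hundred times faster than the current number one.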

The Aubrey Isle Silicon Inside Knights Corner

This partnering of a Central Processing Unit (CPU) with a GPU (or other accelerator) allows supercomputers to achieve much higher performance than with CPUs alone. Intel CPUs power close to 80% of the Top 500 supercomputers; however, the company has begun to realize that specialized accelerators are able to speed up highly parallel computing tasks. Specifically, Intel plans to combine Xeon processors with successors to its Knights Corner Many Integrated Core (MIC) accelerator, along with other advancements in data transfer and inter-core communication, to reach exascale performance levels. Knights Corner is an upcoming successor to the Knights Ferry and Larrabee processors.

Computer World quotes Eng Lim Goh, the CTO of SGI, in stating that “Accelerators such as graphics processors (GPUs) are currently being used with CPUs to execute more calculations per second. While some accelerators achieve desired results, many are not satisfied with the performance related to the time and cost spent porting applications to work with accelerators.”

Knights Corner will be able to run x86-based software and features 50 cores based on a 22nm manufacturing process.  Each core will run four threads at 1.2 GHz, have 8 MB of cache, and will be supported by 512-bit vector processing units.  Its predecessor, Knights Ferry, is based on 32 cores built on a 45nm process, and eight of them contained in a Xeon server are capable of 7.4 teraflops. Intel's MIC chip is aimed directly at NVIDIA’s CUDA and AMD’s OpenCL graphics processors, and is claimed to offer performance in addition to ease of use, as it is capable of running traditional x86-based software.

It looks like the CPU-only supercomputers will be seeing more competition from GPU and MIC accelerated supercomputers, and will eventually be replaced at the exascale level. AMD and NVIDIA are betting heavily on their OpenCL and CUDA programmable graphics cards, while Intel is going with a chip capable of running less specialized but widely used x86 software.  It remains to be seen which platform will be victorious; however, the increased competition should hasten the advancement of high performance computing power.  You can read more about Intel’s plan for Many Integrated Core accelerated supercomputing here.

Japanese Supercomputer Takes First Place Crown On Top 500 List

Subject: Systems | June 20, 2011 - 11:34 PM |
Tagged: supercomputing, petaflop

Residing in the Riken Advanced Institute For Computational Science in Kobe, a Japanese supercomputer capable of 8.16 petaflops of computational power has reclaimed the number one supercomputer spot on the Top 500 list. The last time Japan held the number one spot was in 2004 with their Earth Simulator. Dubbed the K Computer, the new Japanese machine has opened a wide gap over the now second place Chinese Tianhe 1A, which delivers less than a third of the computational power at 2.57 petaflops.

 

The K Computer Setup at Riken AICS.

What makes the new supercomputer especially interesting is that it uses only CPUs to deliver all 8.16 petaflops, and eschews any graphics processors or other accelerators. Specifically, the K Computer is composed of 68,544 eight-core SPARC64 VIIIfx processors, which amounts to 548,352 processing cores. When the supercomputer enters service at the Riken AICS, it will be capable of even more performance: it is expected to deliver more than 10 petaflops using 80,000 of the eight-core SPARC CPUs (640,000 cores).
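The core counts above are easy to verify, and dividing the Linpack score by the number of cores gives a rough sense of what each core contributes. A small sketch follows; the per-core figure is my own derived number, not one quoted by Riken or the Top 500 list.

```python
CORES_PER_CPU = 8            # SPARC64 VIIIfx is an eight-core part

benchmarked_cpus = 68_544    # configuration that posted 8.16 petaflops
planned_cpus = 80_000        # configuration planned for full service

benchmarked_cores = benchmarked_cpus * CORES_PER_CPU
planned_cores = planned_cpus * CORES_PER_CPU

# Derived figure (not from the article): average Linpack throughput per core.
linpack_petaflops = 8.16
gflops_per_core = linpack_petaflops * 1_000_000 / benchmarked_cores

print(f"Benchmarked cores: {benchmarked_cores:,}")
print(f"Planned cores:     {planned_cores:,}")
print(f"About {gflops_per_core:.1f} GFLOPs per core on Linpack")
```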

One of the K Computer's racks.

Unfortunately, this top level computational power comes at a price, specifically the amount of power required to run the machine. While running the Linpack benchmark, the machine drew 10 megawatts of power, which is slightly more than twice the average power consumption of the other top 10 systems at 4.3 megawatts.

If the CPU-only design is capable of delivering greater than 10 petaflops once the K Computer is put into operation, it will be a very noteworthy feat. On the other hand, the climbing power requirements are an issue, and the competition is unlikely to surpass the K Computer without further breakthroughs in power-efficient processor and memory designs. Erich Strohmaier, the head of the Future Technology Group of the Computational Research Division at Lawrence Berkeley National Laboratory, was quoted by Computer World as stating, "Even if it is not desirable, we can adapt to 10 MW for the very largest systems, but we cannot allow power consumption to grow much more." You can read more about the new system over at Computer World.