Samsung Mass Producing Second Generation "Aquabolt" HBM2: Better, Faster, and Stronger

Subject: Memory | January 12, 2018 - 05:46 PM |
Tagged: supercomputing, Samsung, HPC, HBM2, graphics cards, aquabolt

Samsung recently announced that it has begun mass production of its second-generation HBM2 memory, which it is calling “Aquabolt”. Samsung has refined the design of its 8GB HBM2 packages, allowing them to achieve an impressive data transfer rate of 2.4 Gbps per pin without needing more power than its first-generation 1.2V HBM2.


Reportedly, Samsung is using new TSV (through-silicon via) design techniques and adding additional thermal bumps between dies to improve clocks and thermal control. Each 8GB HBM2 “Aquabolt” package comprises eight 8Gb dies, each vertically interconnected using 5,000 TSVs, a huge number considering how small and tightly packed the dies are. Further, Samsung has added a new protective layer at the bottom of the stack to reinforce the package's physical strength. While the press release did not go into detail, it does mention that Samsung had to overcome challenges relating to “collateral clock skewing” caused by the sheer number of TSVs.

On the performance front, Samsung claims that Aquabolt offers a 50% increase in per-package performance versus its first-generation “Flarebolt” memory, which ran at 1.6 Gbps per pin and 1.2V. Interestingly, Aquabolt is also faster than Samsung's 2.0 Gbps per pin HBM2 product (which needed 1.35V) without requiring additional power. Samsung also compares Aquabolt to GDDR5, stating that it offers 9.6 times the bandwidth: a single package of HBM2 delivers 307 GB/s versus 32 GB/s for a GDDR5 chip. Thanks to the 2.4 Gbps per pin speed, Aquabolt offers 307 GB/s of bandwidth per package, and with four packages, products such as graphics cards can take advantage of 1.2 TB/s of bandwidth.
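The quoted figures line up with straightforward arithmetic, assuming the standard 1024-bit HBM2 interface per stack (the per-stack bus width is not stated in the press release, so that is an assumption here):

```python
# Sanity-check Samsung's quoted Aquabolt bandwidth figures.
# Assumes the standard 1024-bit HBM2 data bus per package (not stated in the release).
PINS_PER_STACK = 1024   # assumed HBM2 bus width per package, in bits
GBPS_PER_PIN = 2.4      # Aquabolt's quoted per-pin transfer rate

per_package_gbs = GBPS_PER_PIN * PINS_PER_STACK / 8   # bits -> bytes
four_package_tbs = per_package_gbs * 4 / 1000         # four stacks, GB/s -> TB/s
gddr5_ratio = per_package_gbs / 32                    # vs. a 32 GB/s GDDR5 chip

print(per_package_gbs)   # 307.2, matching the ~307 GB/s claim
print(four_package_tbs)  # ~1.23, i.e. the quoted 1.2 TB/s across four packages
print(gddr5_ratio)       # 9.6, matching the "9.6 times GDDR5" comparison
```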

This second-generation HBM2 memory is a decent step up in performance (HBM hit 128 GB/s and first-generation HBM2 hit 256 GB/s per package, or 512 GB/s and 1 TB/s respectively with four packages), but the interesting bit is that it is faster without needing more power. The increased bandwidth and data transfer speeds will be a boon to the HPC and supercomputing market and useful for working with massive databases, simulations, neural networks and AI training, and other “big data” tasks.

Aquabolt also looks promising for the mobile market, with future products succeeding the current mobile Vega GPU in Kaby Lake-G processors, Ryzen Mobile APUs, and eventually discrete Vega mobile graphics cards in line for a nice performance boost (it's likely too late for AMD to use this new HBM2 on those specific products, but future refreshes or generations may be able to take advantage of it). I'm sure it will also see usage in the SoCs used in Intel's and NVIDIA's driverless car projects as well.

Source: Samsung

Allied Control Showing Off Immersion Cooling at SC17

Subject: Cases and Cooling | November 20, 2017 - 10:09 PM |
Tagged: Supercomputing Conference, supercomputing, liquid cooling, immersion cooling, HPC, allied control, 3M

PC Gamer Hardware (formerly Maximum PC) spotted a cool immersion cooling system being shown off at the Supercomputing conference in Denver, Colorado earlier this month. Allied Control, which was recently acquired by BitFury (known for its Bitcoin mining ASICs), was at the show with a two-phase immersion cooling system that uses 3M's Novec fluid and a water-cooled condenser coil to submerge and cool high-end, densely packed hardware with no moving parts and no pesky oil residue.

Allied Control's two-phase immersion cooling system.

Nick Knupffer (@Nick_Knupffer) posted a video (embedded below) of the cooling system in action, cooling a high-end processor and five graphics cards. The components are submerged in a non-flammable, non-conductive fluid with a very low boiling point of 41°C. Interestingly, the heatsinks and fans are removed, allowing direct contact between the fluid and the chips (in this case there is a copper baseplate on the CPU, but bare ASICs can also be cooled). When the hardware is in use, heat is transferred to the liquid, which boils off into a vapor. The vapor rises to the surface and hits a condenser coil (which can be water cooled) that cools the gas until it condenses back into a liquid and falls back into the tank. The company has previously shown off an overclocked system with twenty 250W GPUs and two Xeons that was able to run flat out (the GPUs at 120% of TDP) on deep learning workloads, as well as mine Z-Cash when not working on HPC projects, all while keeping the hardware well under thermal limits and never throttling. Cnet also spotted a 10 GPU system being shown off at Computex (warning: autoplay video ad!).

According to 3M, two-phase immersion cooling is extremely efficient (many times more so than air or even water) and can enable up to 95% lower cooling energy costs versus conventional air cooling. Further, hardware can be packed much more tightly, at up to 100 kW per square meter versus 10 kW/sq. m with air, meaning immersion-cooled hardware can take up as little as a tenth of the floor space, and the heat produced can be reclaimed for datacenter building heating or other processes.
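3M's density figures imply the floor-space saving directly; a quick sketch of the arithmetic (using only the two quoted power densities):

```python
# Floor-space implication of 3M's quoted power densities.
immersion_kw_per_m2 = 100  # claimed density with two-phase immersion cooling
air_kw_per_m2 = 10         # quoted density with conventional air cooling

density_ratio = immersion_kw_per_m2 / air_kw_per_m2
floor_space_fraction = 1 / density_ratio  # same IT load, fraction of the floor area

print(density_ratio)         # 10.0 -> ten times the packing density
print(floor_space_fraction)  # 0.1 -> the same load in a tenth of the floor space
```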



Neat stuff for sure, even if it is still out of the range of home gaming PCs and mining rigs for now! Speaking of mining, BitFury plans to cool a massive 40+ MW ASIC mining farm in the Republic of Georgia using an Allied Control-designed immersion cooling system (see links below)!


Source: PC Gamer

Tokyo Tech Goes Green with KFC (NVIDIA and Efficiency)

Subject: General Tech, Graphics Cards, Systems | November 21, 2013 - 09:47 PM |
Tagged: nvidia, tesla, supercomputing

GPUs are very efficient in terms of operations per watt. Their architecture is best suited to gigantic bundles of similar calculations (such as a set of operations applied to each entry of a large blob of data). These are also the tasks that take up the most computation time, not surprisingly, in 3D graphics (where you need to do something to every pixel, fragment, vertex, etc.). The same is true of scientific calculations, financial and other "big data" services, weather prediction, and so forth.


Tokyo Tech's KFC achieves over 4 GFLOPs per watt of power draw from the 160 Tesla K20X GPUs in its cluster. That is about 25% more calculations per watt than the current leader of the Green500 (the CINECA Eurora System in Italy, at 3.208 GFLOPs/W).
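The "about 25%" figure follows from the two quoted efficiency numbers (taking the hedged "over 4 GFLOPs/W" at face value as 4.0):

```python
# Reproduce the efficiency comparison between Tokyo Tech's KFC cluster
# and the then-leader of the Green500 list.
kfc_gflops_per_watt = 4.0       # quoted as "over 4 GFLOPs per watt"
eurora_gflops_per_watt = 3.208  # CINECA Eurora System

improvement = kfc_gflops_per_watt / eurora_gflops_per_watt - 1
print(f"{improvement:.1%}")  # ~24.7%, i.e. "about 25% more calculations per watt"
```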

One interesting trait: this supercomputer will be cooled by oil immersion. NVIDIA offers passively cooled Tesla cards which, from my understanding of how this works, are well suited to this fluid system. I am fairly certain the fans are removed from the rest of the hardware before the servers are dunked (I had figured they would be left on).

By the way, was it intentional to name computers dunked in giant vats of heat-conducting oil, "KFC"?

Intel has done a similar test, which we reported on last September, submerging numerous servers for over a year. Another benefit of being green is that you are not nearly as concerned about air conditioning.

NVIDIA, meanwhile, is taking this efficiency to the practical market with another nice supercomputer win.


Source: NVIDIA

The Titan's Overthrown. Tianhe-2 Supercomputer New #1

Subject: General Tech, Processors, Systems | June 26, 2013 - 10:27 PM |
Tagged: supercomputing, supercomputer, titan, Xeon Phi

The National Supercomputer Center in Guangzhou, China, will host the world's fastest supercomputer by the end of the year. The Tianhe-2 (English: "Milky Way-2") is capable of nearly double the floating-point performance of Titan, albeit with slightly less performance per watt. The Tianhe-2 was developed by China's National University of Defense Technology.



Comparing the new fastest computer with the former: China's Milky Way-2 achieves 33.8627 PetaFLOPs of calculation from 17.808 MW of electricity. Titan, on the other hand, crunches 17.590 PetaFLOPs with a draw of just 8.209 MW. As such, the new Milky Way-2 uses about 12.7% more power per FLOP than Titan.
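The 12.7% figure checks out from the four numbers above:

```python
# Reproduce the "12.7% more power per FLOP" comparison from the quoted Top500 data.
tianhe2_pflops, tianhe2_mw = 33.8627, 17.808
titan_pflops, titan_mw = 17.590, 8.209

# MW needed per PetaFLOP of sustained Linpack performance
tianhe2_mw_per_pflop = tianhe2_mw / tianhe2_pflops
titan_mw_per_pflop = titan_mw / titan_pflops

extra_power = tianhe2_mw_per_pflop / titan_mw_per_pflop - 1
print(f"{extra_power:.1%}")  # ~12.7% more power per FLOP than Titan
```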

Titan is famously based on the Kepler GPU architecture from NVIDIA, coupled with 16-core AMD Opteron server processors clocked at 2.2 GHz. This concept of accelerated hardware carried over into the design of Tianhe-2, which is based around Intel's Xeon Phi coprocessor. Including the simplified coprocessor cores of the Xeon Phi, the new champion totals 3.12 million x86 cores and 1,024 terabytes of memory.

... but will it run Crysis?

... if someone gets around to emulating DirectX in software, it very well could.

Source: Top500

Intel Hopes For Exaflop Capable Supercomputers Within 10 Years

Subject: Systems | June 21, 2011 - 03:52 AM |
Tagged: supercomputing, mic, larrabee, knights corner, Intel

Silicon Graphics International and Intel recently announced plans to reach exascale levels of computational power within ten years. Exascale computing means computers capable of delivering 1,000+ petaflops (one exaflop is 1,000 petaflops), processing quintillions of calculations per second. To put that in perspective, today's supercomputers are just breaking into single-digit petaflop performance, with the fastest supercomputer delivering 8.16 petaflops. It achieves this thanks to many thousands of eight-core CPUs, whereas other Top 500 supercomputers are starting to use a CPU-and-GPU combination to reach petaflop performance.

The Aubrey Isle Silicon Inside Knights Corner

This partnering of a Central Processing Unit (CPU) with a GPU (or other accelerator) allows high performance supercomputers to achieve much higher performance than with CPUs alone. Intel CPUs power close to 80% of the Top 500 supercomputers; however, Intel has begun to realize that specialized accelerators can speed up highly parallel computing tasks. Specifically, Intel plans to combine Xeon processors with successors to its Knights Corner Many Integrated Core (MIC) accelerator, along with other data transfer and inter-core communication advancements, to reach exascale performance levels. Knights Corner is an upcoming successor to the Knights Ferry and Larrabee processors.

Computer World quotes Eng Lim Goh, the CTO of SGI, in stating that “Accelerators such as graphics processors (GPUs) are currently being used with CPUs to execute more calculations per second. While some accelerators achieve desired results, many are not satisfied with the performance related to the time and cost spent porting applications to work with accelerators.”

Knights Corner will be able to run x86-based software and features 50 cores built on a 22nm manufacturing process. Each core will run four threads at 1.2 GHz, have 8 MB of cache, and be supported by 512-bit vector processing units. Its predecessor, Knights Ferry, is based on 32 45nm cores; eight of them contained in a Xeon server are capable of 7.4 teraflops. The MIC chip is aimed directly at NVIDIA's CUDA and AMD's OpenCL graphics processors, and is claimed to offer performance as well as ease of use, since it can run traditional x86-based software.

It looks like CPU-only supercomputers will see more competition from GPU- and MIC-accelerated supercomputers, and will eventually be replaced at the exascale level. AMD and NVIDIA are betting heavily on their OpenCL- and CUDA-programmable graphics cards, while Intel is going with a chip that runs less specialized but widely used x86 software. It remains to be seen which platform will be victorious; however, the increased competition should hasten the advancement of high performance computing. You can read more about Intel's plan for Many Integrated Core accelerated supercomputing here.

Japanese Supercomputer Takes First Place Crown On Top 500 List

Subject: Systems | June 20, 2011 - 11:34 PM |
Tagged: supercomputing, petaflop

Residing at the Riken Advanced Institute for Computational Science in Kobe, a Japanese supercomputer capable of 8.16 petaflops of computational power has reclaimed the number one spot on the Top 500 list. The last time Japan held the number one spot was in 2004, with its Earth Simulator. Dubbed the K Computer, the new Japanese machine has handily widened the gap over the now second-place Chinese Tianhe-1A, which delivers less than a third of the computational power at 2.57 petaflops.


The K Computer Setup at Riken AICS.

What makes the new supercomputer especially interesting is that it uses only CPUs to deliver all 8.16 petaflops, eschewing graphics processors or other accelerators. Specifically, the K Computer comprises 68,544 eight-core SPARC64 VIIIfx processors, which amounts to 548,352 processing cores. When the supercomputer enters service at the Riken AICS, it will be capable of even more: it will deliver more than 10 petaflops using 80,000 of the eight-core SPARC CPUs (640,000 cores).
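The core counts follow directly from the processor counts, since each SPARC64 VIIIfx has eight cores:

```python
# Verify the K Computer's core counts from its SPARC64 VIIIfx CPU counts.
CORES_PER_CPU = 8

benchmark_cpus = 68_544  # CPUs used in the Top 500 Linpack run
final_cpus = 80_000      # CPUs planned for the machine in full service

print(benchmark_cpus * CORES_PER_CPU)  # 548352 cores in the ranked configuration
print(final_cpus * CORES_PER_CPU)      # 640000 cores once fully built out
```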

One of the K Computer's racks.

Unfortunately, this top-level computational power comes at a price: the amount of electricity required to run the machine. While running the Linpack benchmark, the machine drew 10 megawatts, slightly more than twice the 4.3-megawatt average of the other top-10 systems.

If the CPU-only design is capable of delivering greater than 10 petaflops once the K Computer is put into operation, it will be a very noteworthy feat. On the other hand, the climbing power requirements are an issue, and the competition is unlikely to surpass the K Computer without further breakthroughs in power-efficient processor and memory designs. Erich Strohmaier, the head of the Future Technology Group of the Computational Research Division at Lawrence Berkeley National Laboratory was quoted by Computer World as stating "Even if it is not desirable, we can adapt to 10 MW for the very largest systems, but we cannot allow power consumption to grow much more." You can read more about the new system over at Computer World.