Samsung and SK Hynix were in attendance at the Hot Chips Symposium in Cupertino, California to (among other things) talk about the future of High Bandwidth Memory (HBM). In fact, the companies are working on two new HBM products: HBM3 and an as-yet-unbranded "low cost HBM." HBM3 will replace HBM2 at the high end and is aimed at the HPC and "prosumer" markets, while the low cost HBM technology lowers the barrier to entry and is intended for mainstream consumer products.

As currently planned, HBM3 (Samsung refers to its implementation as Extreme HBM) features double the density per layer and at least double the bandwidth of the current HBM2 (which so far is only used in NVIDIA's planned Tesla P100). Specifically, the new memory technology offers 16Gb (~2GB) per layer, and eight (or more) layers can be stacked together using TSVs into a single chip. So far we have seen GPUs use four HBM chips on a single package, and if that holds true with HBM3 and interposer size limits, we may well see future graphics cards with 64GB of memory! Considering the HBM2-based Tesla will have 16GB and AMD's HBM-based Fury X cards had 4GB, HBM3 is a sizable jump!
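For the curious, the capacity figures work out as follows (a quick back-of-the-envelope sketch in Python; the eight-layer and four-stack counts are the assumptions stated above, not confirmed specifications):

```python
# Back-of-the-envelope HBM3 capacity math based on the figures above.
# Layer and stack counts are assumptions, not confirmed specifications.
GBIT_PER_LAYER = 16     # 16Gb per layer claimed for HBM3
LAYERS_PER_STACK = 8    # "eight (or more)" layers per stack
STACKS_PER_GPU = 4      # four stacks per package, as seen with HBM2

gb_per_stack = GBIT_PER_LAYER * LAYERS_PER_STACK / 8  # convert Gb to GB
total_gb = gb_per_stack * STACKS_PER_GPU
print(f"{gb_per_stack:.0f}GB per stack, {total_gb:.0f}GB per card")
# -> 16GB per stack, 64GB per card
```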

Capacity is not the only benefit, though. HBM3 doubles the bandwidth versus HBM2, at 512GB/s (or more) of peak bandwidth per stack! In the theoretical example of a graphics card with 64GB of HBM3 (four stacks), that works out to roughly 2 TB/s of theoretical peak bandwidth! Real world numbers will likely be lower, but a couple of terabytes per second of bandwidth is exciting because it opens a lot of possibilities for gaming, especially as developers push graphics further towards photorealism and resolutions keep increasing. HBM3 should be plenty for a while as far as keeping the GPU fed with data on the consumer and gaming side of things, though I'm sure the HPC market will still crave even more bandwidth.
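The aggregate bandwidth math is just as simple (again a sketch; the four-stack configuration is an assumption carried over from current designs):

```python
# Aggregate peak bandwidth for the hypothetical four-stack HBM3 card.
BW_PER_STACK_GBS = 512  # claimed peak bandwidth per HBM3 stack (GB/s)
STACKS = 4              # assumed stacks per package

total_gbs = BW_PER_STACK_GBS * STACKS
print(f"{total_gbs}GB/s (~{total_gbs / 1024:.0f}TB/s) peak")
# -> 2048GB/s (~2TB/s) peak
```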

Samsung further claims that HBM3 will operate at clocks similar to HBM2 (~500MHz) but will use "much less" core voltage (HBM2 runs at 1.2V).

Stacked HBM memory on an interposer surrounding a processor. Upcoming HBM technologies will allow memory stacks with double the number of layers.

HBM3 is perhaps the most interesting technologically; however, the "low cost HBM" is exciting in that it will enable HBM to be used in the systems and graphics cards most people purchase. There were fewer details available on this lower cost variant, but Samsung did share a few specifics. The low cost HBM will offer up to 200GB/s of peak bandwidth per stack while being much cheaper to produce than current HBM2. In order to reduce the cost of production, there is no buffer die or ECC support, and the number of Through-Silicon Via (TSV) connections has been reduced. To compensate for the lower number of TSVs, the per-pin speed has been increased to 3Gbps (versus 2Gbps on HBM2). Interestingly, Samsung would like low cost HBM to support traditional silicon as well as potentially cheaper organic interposers. According to NVIDIA, TSV formation is the most expensive part of interposer fabrication, so making reductions there (and partly making up for it with increased per-connection speeds) makes sense for a cost-conscious product. It is unclear whether organic interposers will win out here, but it is nice to see them get a mention, and they are an alternative worth looking into.
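To see why fewer TSVs plus faster pins can still land near HBM2-class numbers, here is a rough sketch of the peak bandwidth trade-off. The 1024-bit interface width for HBM2 is standard; the 512-bit width for low cost HBM is purely my assumption for illustration, since Samsung has not published a pin count:

```python
# Peak bandwidth = interface width (bits) x per-pin speed (Gbps) / 8.
# The 512-bit width for low cost HBM is an assumption for illustration.
def peak_gbs(bus_width_bits: int, pin_speed_gbps: float) -> float:
    """Return peak bandwidth in GB/s."""
    return bus_width_bits * pin_speed_gbps / 8

print(peak_gbs(1024, 2.0))  # HBM2: 1024 pins at 2Gbps -> 256.0 GB/s
print(peak_gbs(512, 3.0))   # low cost HBM (assumed width) -> 192.0 GB/s
```

With half the data pins running at 3Gbps, the assumed configuration lands at 192GB/s, in the neighborhood of the "up to 200GB/s" figure Samsung quoted.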

Both the high bandwidth and low cost memory technologies are still years away and the designs are subject to change, but so far both plans are looking rather promising. I am intrigued by the possibilities and hope to see new products take advantage of the increased performance (and, in the latter case, the lower cost). On the graphics front, HBM3 is way too far out for a Vega release, but it may arrive just in time for AMD to incorporate it into its high end Navi GPUs, and by 2020 the battle between GDDR and HBM in the mainstream should be heating up.

What are your thoughts on the proposed HBM technologies?