Samsung and SK Hynix Discuss The Future of High Bandwidth Memory (HBM) At Hot Chips 28

Subject: Memory | August 25, 2016 - 02:39 AM |
Tagged: TSV, SK Hynix, Samsung, hot chips, hbm3, hbm

Samsung and SK Hynix were in attendance at the Hot Chips Symposium in Cupertino, California to (among other things) talk about the future of High Bandwidth Memory (HBM). In fact, the companies are working on two new HBM products: HBM3 and an as-yet-unbranded "low-cost HBM." HBM3 will replace HBM2 at the high end and is aimed at the HPC and "prosumer" markets, while the low-cost HBM technology lowers the barrier to entry and is intended for mainstream consumer products.

As currently planned, HBM3 (Samsung refers to its implementation as Extreme HBM) features double the density per layer and at least double the bandwidth of current HBM2 (which so far is only used in NVIDIA's planned Tesla P100). Specifically, the new memory technology offers 16Gb (2GB) per layer, and eight or more layers can be stacked together using through-silicon vias (TSVs) into a single chip. So far we have seen GPUs use four HBM chips on a single package, and if that holds true with HBM3 and interposer size limits, we may well see future graphics cards with 64GB of memory! Considering the HBM2-based Tesla P100 will have 16GB and AMD's HBM-based Fury X cards had 4GB, HBM3 is a sizable jump!
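
For those who want to check the math, here is a quick back-of-the-envelope sketch in Python (the layer and stack counts are the projected figures above, not final specs):

```python
# Projected HBM3 capacity from the figures above (plans, not final specs)
GBIT_PER_LAYER = 16      # 16Gb per layer, double HBM2's 8Gb
LAYERS_PER_STACK = 8     # "eight or more" TSV-stacked layers
STACKS_PER_CARD = 4      # four stacks per package, as on current designs

gb_per_stack = GBIT_PER_LAYER * LAYERS_PER_STACK / 8  # gigabits -> gigabytes
total = gb_per_stack * STACKS_PER_CARD
print(f"{gb_per_stack:.0f}GB per stack, {total:.0f}GB per card")
# -> 16GB per stack, 64GB per card
```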

Capacity is not the only benefit, though. HBM3 doubles the bandwidth of HBM2, offering 512GB/s (or more) of peak bandwidth per stack! In the theoretical example of a graphics card with 64GB of HBM3 (four stacks), that works out to roughly 2TB/s of peak bandwidth! Real-world throughput will be lower, but a couple of terabytes per second of bandwidth opens a lot of possibilities for gaming, especially as developers push graphics further towards photorealism and resolutions keep increasing. HBM3 should be plenty for a while as far as keeping the GPU fed on the consumer and gaming side of things, though I'm sure the HPC market will still crave more bandwidth.
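
The bandwidth figure works out the same way (again just a sketch using the numbers quoted above):

```python
# Aggregate peak bandwidth scaling with stack count
BW_PER_STACK = 512   # GB/s per HBM3 stack (double HBM2's 256GB/s)
STACKS = 4           # four-stack configuration from the example above

total_bw = BW_PER_STACK * STACKS
print(f"{total_bw}GB/s aggregate, ~{total_bw / 1024:.0f}TB/s peak")
# -> 2048GB/s aggregate, ~2TB/s peak
```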

Samsung further claims that HBM3 will operate at clocks similar to HBM2's (~500MHz), but with "much less" core voltage (HBM2 runs at 1.2V).

HBM Four Stacked.jpg

Stacked HBM memory on an interposer surrounding a processor. Upcoming HBM technologies will allow memory stacks with double the number of layers.

HBM3 is perhaps the most interesting technologically; however, the "low-cost HBM" is exciting in that it will bring HBM to the systems and graphics cards most people purchase. Fewer details are available on this lower-cost variant, but Samsung did share a few specifics. The low-cost HBM will offer up to 200GB/s of peak bandwidth per stack while being much cheaper to produce than current HBM2. To reduce production costs, there is no buffer die or ECC support, and the number of through-silicon via (TSV) connections has been reduced. To compensate for the lower TSV count, the pin speed has been increased to 3Gbps (versus 2Gbps on HBM2). Interestingly, Samsung would like low-cost HBM to support traditional silicon as well as potentially cheaper organic interposers. According to NVIDIA, TSV formation is the most expensive part of interposer fabrication, so making reductions there (and partly making up for it with higher per-connection speeds) makes sense for a cost-conscious product. It is unclear whether organic interposers will win out here, but it is nice to see them get a mention, and they are an alternative worth looking into.
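
As a rough illustration of that trade-off, here is a sketch of the pin math (the implied bus width is my inference from the quoted figures; Samsung did not state it here):

```python
# Fewer TSVs at a higher per-pin rate can still reach the 200GB/s target
TARGET_GBS = 200         # low-cost HBM peak bandwidth per stack
PIN_SPEED_GBPS = 3       # 3Gbps per pin (versus 2Gbps on HBM2)

pins_needed = TARGET_GBS * 8 / PIN_SPEED_GBPS  # GB/s -> Gb/s, then per-pin
print(f"~{pins_needed:.0f} data pins needed")  # -> ~533 data pins

# For comparison, a full HBM2 stack drives a 1024-bit bus at 2Gbps:
print(f"HBM2: {1024 * 2 / 8:.0f}GB/s over 1024 data connections")  # -> 256GB/s
```

That is roughly half the data connections of a full HBM2 stack, which lines up with the cost story.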

Both the high-bandwidth and low-cost memory technologies are still years away and the designs are subject to change, but so far both plans are looking rather promising. I am intrigued by the possibilities and hope to see new products take advantage of the increased performance (and, in the latter case, lower cost). On the graphics front, HBM3 is way too far out to make it into Vega, but it may arrive just in time for AMD to incorporate it into its high-end Navi GPUs, and by 2020 the battle between GDDR and HBM in the mainstream should be heating up.

What are your thoughts on the proposed HBM technologies?

Source: Ars Technica

Another look at the OCZ RD400 NVMe SSD

Subject: Storage | May 27, 2016 - 02:42 PM |
Tagged: TSV, toshiba, ssd, revodrive, RD400, pcie, ocz, NVMe, M.2, HHHL, 512GB, 2280, 15nm

If you somehow felt that there was a test Al missed while reviewing the OCZ RD400 NVMe SSD, then you have a chance for a second look.  The SSD Review ran several benchmarks that we did not cover, and they display some data differently (latency, for example), but the end result is the same: this drive is up there with the Samsung 950 Pro and Intel 750 Series.  Read all about it here.

OCZ-RD400-SSD-Front-Side.jpg

"With specs that rival the Samsung 950 Pro, a capacity point that nips at the heels of the Intel 750's largest model, and competitive MSRPs, the OCZ RD400 is out for blood. Read on to learn more about this latest enthusiast class NVMe SSD and see how it competes with the best of the best!"


Subject: Storage
Manufacturer: Toshiba (OCZ)

Introduction, Specifications and Packaging

Introduction:

The OCZ RevoDrive has been around for a good long while; we looked at the first ever RevoDrive back in 2010. It was a bold move at the time, as PCIe SSDs were both rare and very expensive. OCZ's innovation was to implement a new VCA RAID controller that kept latencies low and scaled properly with increasing queue depth. OCZ got a lot of use out of this formula, later expanding to the RevoDrive 3 X2, which combined four parallel SSDs, and all the way to the enterprise Z-Drive R4, which pushed that out to eight RAIDed SSDs.

110911-140303-5.5.jpg

OCZ's RevoDrive lineup circa 2011.

The latter was a monster of an SSD in both physical size and storage capacity, and its performance was also impressive given that it launched five years ago. After being acquired by Toshiba, OCZ re-spun the old VCA-driven design one last time as the RevoDrive 350, but it was the same old formula and high-latency SandForce controllers (updated with in-house Toshiba flash). The RevoDrive line needed to ditch that dated tech and move into the world of NVMe, and today it has!

DSC00772.jpg

Here is the new 'Toshiba OCZ RD400', branded as such under the recent rebadging that took place on OCZ's site (the Trion 150 and Vertex 180 have also been relabeled as the TR150 and VT180). The new RD400 brings some significant changes over previous iterations of the line. The big one is that it is now a lean M.2 part, optionally available with an adapter card for those without a spare M.2 slot.

Read on for our full review of the new OCZ RD400!

Some dense reading for your morning about N3XT and nanotube-based processors

Subject: General Tech | December 17, 2015 - 12:35 PM |
Tagged: N3XT, nanotubes, TSV

The Achilles' heel of processing density is heat and how to radiate it away from the parts doing the work, which is why processors and memory tend to be very flat.  This has changed: 3D VNAND has become common in the marketplace thanks to reduced heat generation and a variety of arcane tricks, some of which Al explained last year.  Processors pose a more significant challenge; their TDP is much higher than that of flash, and hotspots are more common and have a much more drastic effect on performance.  They can also be more difficult to fabricate; there is quite a trick to baking the interior of the chip without overcooking the external layers.

Stanford University is working on what it calls Nano-Engineered Computing Systems Technology, or N3XT, which applies Through Silicon Vias to processors. If successful, this would allow a structure similar to current 3D VNAND on a processor, which would vastly increase processing density.  The lower temperatures required to fab carbon nanotube transistors may be just what the industry has needed.  Make sure your brain is turned on and read on at The Inquirer.

carbon-nanotubes-n3xt-tech-540x334.jpeg

"One way in which Stanford University is exploring this is by using carbon nanotube technology in high-rise chip architecture processes. Working alongside other universities, Stanford engineers have created this new technology, which it calls Nano-Engineered Computing Systems Technology, or N3XT."


Source: The Inquirer

Samsung Announces Mass Production of 128GB DDR4 Sticks

Subject: Memory | November 26, 2015 - 05:23 PM |
Tagged: TSV, Samsung, enterprise, ddr4

You may remember Allyn's article about TSV memory back from IDF 2014. Through this process, Samsung and others are able to stack dies of memory onto a single package, which can increase density and bandwidth. This is done by punching holes through the dies and connecting them down to the PCB. The first analogy that comes to mind is an elevator shaft, but I'm not sure how accurate that is.

tsv-side-on.JPG

Anyway, Samsung has been applying it to enterprise-class DDR4 memory, which leads to impressive capacities. Individual 64GB sticks were introduced in 2014; this year, that capacity doubles to 128GB. The dies are fabricated at 20nm and each holds 8Gb (1GB), with four of them TSV-stacked per package and 36 packages per stick.
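
Those numbers check out against the stick's capacity once you account for ECC (a back-of-the-envelope sketch; the ECC split is my inference, as the press release only lists the package counts):

```python
# Raw capacity of the 128GB TSV RDIMM from the quoted die/package counts
GB_PER_DIE = 1           # each 8Gb die is 1GB
DIES_PER_PACKAGE = 4     # four TSV-stacked dies per package
PACKAGES = 36

raw = GB_PER_DIE * DIES_PER_PACKAGE * PACKAGES
print(f"{raw}GB raw")                        # -> 144GB raw
# An ECC module carries 72 bits for every 64 data bits (a 9:8 ratio),
# so 144GB of raw DRAM yields exactly 128GB of data capacity:
print(f"{raw * 8 / 9:.0f}GB data capacity")  # -> 128GB
```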

At the end of their press release, Samsung also mentioned that they intend to expand their TSV technology into “HBM and consumer products.”

Source: Samsung

Podcast #318 - GTX 980 and R9 390X Rumors, Storage News from IDF, ADATA SP610 SSDs and more!

Subject: General Tech | September 18, 2014 - 01:59 PM |
Tagged: windows 9, video, TSV, supernova, raptr, r9 390x, podcast, p3700, nvidia, Intel, idf, GTX 980, evga, ECS, ddr4, amd

PC Perspective Podcast #318 - 09/18/2014

Join us this week as we discuss GTX 980 and R9 390X Rumors, Storage News from IDF, ADATA SP610 SSDs and more!

You can subscribe to us through iTunes and you can still access it directly through the RSS page HERE.

The URL for the podcast is: http://pcper.com/podcast - Share with your friends!

  • iTunes - Subscribe to the podcast directly through the iTunes Store
  • RSS - Subscribe through your regular RSS reader
  • MP3 - Direct download link to the MP3 file

Hosts: Ryan Shrout, Jeremy Hellstrom, Josh Walrath, and Allyn Malventano

Subscribe to the PC Perspective YouTube Channel for more videos, reviews and podcasts!!

 

IDF 2014: Through Silicon Via - Connecting memory dies without wires

Subject: Storage, Shows and Expos | September 10, 2014 - 03:34 PM |
Tagged: TSV, Through Silicon Via, memory, idf 2014, idf

If you're a general computer user, you might have never heard the term "Through Silicon Via". If you geek out on photos of chip dies and wafers, and how chips are assembled and packaged, you might have heard about it. Regardless of your current knowledge of TSV, it's about to be a thing that impacts all of you in the near future.

Let's go into a bit of background first. We're going to talk about how chips are packaged. Micron has an excellent video on the process here:

The part we are going to focus on appears at 1:31 in the above video:

die wiring.png

This is how chip dies are currently connected to the outside world. The dies are stacked (four high in the above pic) and a machine has to individually wire them to a substrate, which in turn communicates with the rest of the system. As you might imagine, things get more complex with this process as you stack more and more dies on top of each other:

chip stacking.png

16-layer die stack (pic courtesy NovaChips)

...so we have these microchips with extremely small features, but to connect them we are limited to a relatively bulky process (called package-on-package). Stacking these flat planes of storage is a tricky thing to do, and one would naturally want to limit how many of those wires you need to connect. The catch is that those wires also equate to available throughput from the device (i.e. one wire per bit of a data bus). So, just how can we improve this method and increase data bus widths, throughput, etc?
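
To make that "one wire per bit" point concrete before we get to the answer, here is a toy calculation (the numbers are illustrative, not from the article):

```python
# Peak transfer rate scales with bus width times per-pin signaling rate
def peak_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s for a given bus width and per-pin rate."""
    return bus_bits * gbps_per_pin / 8

# A narrow wire-bonded interface versus a wide TSV-enabled one (illustrative)
print(peak_gbs(8, 0.4))     # 8-bit bus at 0.4Gbps/pin  -> 0.4 GB/s
print(peak_gbs(1024, 1.0))  # 1024-bit bus at 1Gbps/pin -> 128.0 GB/s
```

Widening the bus is exactly what wire bonding makes impractical, and exactly what punching connections straight through the die enables.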

Before I answer that, let me lead up to it by showing how flash memory has just taken a leap in performance. Samsung has recently made the jump to VNAND:

vnand crop--.png

By stacking flash memory cells vertically within a die, Samsung was able to make many advances in flash memory, simply because they had more room within each die. Because of the complexity of the process, they also had to revert to an older (larger) feature size. That compromise meant the capacity of each die is similar to current 2D NAND tech, but the new process brings speed, longevity, and power-consumption advantages.

I showed you the VNAND example because it bears a striking resemblance to what is now happening in the area of die stacking and packaging. Imagine if you could stack dies by punching holes straight through them and making the connections directly through the bottom of each die. As it turns out, that's actually a thing:

tsv cross section.png

Read on for more info about TSV!

That RAM is stacked

Subject: General Tech | November 28, 2013 - 01:48 PM |
Tagged: DRAM, HMC, hybrid memory cubes, micron, TSV

Hybrid Memory Cubes (HMC) are DRAM stacked in layers atop a logic base layer that routes commands to the appropriate memory layer, and the technology is being developed by a team that includes Altera, ARM, IBM, SK Hynix, Micron, Open-Silicon, Samsung and Xilinx.  This is intended to give DRAM enhanced parallelization, which will help it keep up with today's multi-core processors.  Micron's example, which the Register takes a look at here, claims up to 10GB/sec (80Gb/sec) of bandwidth from each of the 16 vaults present on the chip, a vault being a vertical slice of the memory stack with its own controller in the logic layer.  Each vault alone compares favourably to the maximum theoretical speed of JEDEC DDR3-1333, which is just a hair over 10GB/s.  Read more here.
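
That comparison is easy to sanity-check (a sketch using the figures quoted above):

```python
# HMC aggregate bandwidth versus a single DDR3-1333 channel
VAULTS = 16
GBS_PER_VAULT = 10    # 10GB/sec (80Gb/sec) per vault, per Micron

print(f"HMC: {VAULTS * GBS_PER_VAULT}GB/s aggregate")  # -> 160GB/s

# DDR3-1333: 1333MT/s across a 64-bit (8-byte) channel
print(f"DDR3-1333: {1333 * 8 / 1000:.2f}GB/s")  # -> 10.66GB/s
```

So a single vault roughly matches an entire DDR3-1333 channel, and the cube as a whole offers sixteen times that.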

hmc_1.jpg

"Dratted multi-core CPUs. DRAM is running into a bandwidth problem. More powerful CPUs has meant that more cores are trying to access a server’s memory and the bandwidth is running out.

One solution is to stack DRAM in layers above a logic base layer and increase access speed to the resulting hybrid memory cubes (HMC), and Micron has done just that."


Source: The Register