Introduction, How PCM Works, Reading, Writing, and Tweaks
I’ve seen a bit of flawed logic floating around related to discussions about 3D XPoint technology. Some are directly comparing the cost per die to NAND flash (you can’t - 3D XPoint likely has fewer fab steps than NAND - especially when compared with 3D NAND). Others are repeating a bunch of terminology and element names without taking the time to actually explain how it works, and far too many folks out there can't even pronounce it correctly (it's spoken 'cross-point'). My plan is to address as much of the confusion as I can with this article, and I hope you walk away understanding how XPoint and its underlying technologies (most likely) work. While we do not have absolute confirmation of the precise material compositions, there is a significant amount of evidence pointing to one particular set of technologies. With Optane Memory now out in the wild and purchasable by folks wielding electron microscopes and mass spectrometers, I have seen enough additional information come across to assume XPoint is, in fact, PCM based.
XPoint memory. Note the shape of the cell/selector structure. This will be significant later.
While we were initially told at the XPoint announcement event Q&A that the technology was not phase change based, there is overwhelming evidence to the contrary, and it is likely that Intel did not want to let the cat out of the bag too early. The funny thing about that is that both Intel and Micron were briefing on PCM-based memory developments five years earlier, and nearly everything about those briefings lines up perfectly with what appears to have ended up in the XPoint that we have today.
Some die-level performance characteristics of various memory types. source
The above figures were sourced from a 2011 paper and may be a bit dated, but they do a good job putting some actual numbers with the die-level performance of the various solid state memory technologies. We can also see where the ~1000x speed and ~1000x endurance comparisons with XPoint to NAND Flash came from. Now, of course, those performance characteristics do not directly translate to the performance of a complete SSD package containing those dies. Controller overhead and management must take their respective cuts, as is shown with the performance of the first generation XPoint SSD we saw come out of Intel:
The ‘bridging the gap’ Latency Percentile graph from our Intel SSD DC P4800X review.
(The P4800X comes in at 10us above).
There have been a few very vocal folks out there chanting 'not good enough', without the basic understanding that the first publicly available iteration of a new technology never represents its ultimate performance capabilities. It took NAND flash decades to make it into usable SSDs, and another decade before climbing to the performance levels we enjoy today. Time will tell if this holds true for XPoint, but given Micron's demos and our own observed performance of Intel's P4800X and Optane Memory SSDs, I'd argue that it is most certainly off to a good start!
A 3D XPoint die, submitted for your viewing pleasure (click for larger version).
Subject: Storage | August 11, 2016 - 12:06 PM | Allyn Malventano
Tagged: FMS, FMS 2016, XPoint, micron, QuantX, nand, ram
Earlier this week, Micron launched their QuantX branding for XPoint devices, as well as giving us some good detail on expected IOPS performance of solutions containing these new parts:
Thanks to the very low latency of XPoint, the QuantX solution sees very high IOPS performance at a very low queue depth, and the random performance very quickly scales to fully saturate PCIe 3.0 x4 with only four queued commands. Micron's own 9100 MAX SSD (reviewed here), requires QD=256 (64x increase) just to come close to this level of performance! At that same presentation, a PCIe 3.0 x8 QuantX device was able to double that throughput at QD=8, but what are these things going to look like?
The real answer is just like modern day SSDs, but for the time being, we have the prototype unit pictured above. This is essentially an FPGA development board that Micron is using to prototype potential controller designs. Dedicated ASICs based on the final designs may be faster, but those take a while to ramp up volume production.
So there it is, in the flesh, nicely packaged and installed on a complete SSD. Sure it's a prototype, but Intel has promised we will see XPoint before the end of the year, and I'm excited to see this NAND-to-DRAM performance-gap-filling tech come to the masses!
Subject: Storage | August 9, 2016 - 03:33 PM | Allyn Malventano
Tagged: XPoint, QuantX, nand, micron
Micron just completed their keynote address at Flash Memory Summit, and as part of the presentation, we saw our first look at some raw scaled Queue Depth IOPS performance figures from devices utilizing XPoint memory:
These are the performance figures from an U.2 device with a PCIe 3.0 x4 link. Note the outstanding ramp up to full saturation of the bus at a QD of only 4. Slower flash devices require much more parallelism and a deeper queue to achieve sufficient IOPS throughput to saturate that same bus. That 'slow' device on the bottom there, I'm pretty certain, is Micron's own 9100 MAX, which was the fastest thing we had tested to date, and it's being just walked all over by this new XPoint prototype!
Ok, so that's damn fast, but what if you had an add in card with PCIe 3.0 x8?
Ok, now that's just insane! While the queue had to climb to ~8 to reach these figures, that's 1.8 MILLION IOPS from a single HHHL add in card. That's greater than 7 GB/s worth of 4KB random performance!
In addition to the crazy throughput and IOPS figures, we also see latencies running at 1/10th that of flash-based NVMe devices.
..so it appears that while the cell-level performance of XPoint boasts 1000x improvements over flash, once you implement it into an actual solution that must operate within the bounds of current systems (NVMe and PCIe 3.0), we currently get only a 10x improvement over NAND flash. Given how fast NAND already is, 10x is no small improvement, and XPoint still opens the door for further improvement as the technology and implementations mature over time.
More to follow as FMS continues!
Subject: Storage | August 9, 2016 - 01:09 PM | Allyn Malventano
Tagged: XPoint, UFS, QuantX, micron, FMS 2016, FMS
As you can see, UFS is not just for SD cards. These are going to be able to replace embedded memory in mobile devices, displacing the horror that is eMMC with something way faster. These devices are smaller than a penny, with a die size of just over 60 mm squared and boast a 32GB capacity.
One version of the UFS 2.1 devices also contains Micron's first packaged offering of LPDDR4X. This low power RAM offers an additional 20% power savings over existing LPDDR4.
Also up is an overdue branding of Micron's XPoint (spoken 'cross-point') products:
More to follow from FMS 2016. A few little birdies told me there will be some good stuff presented this morning (PST), so keep an eye out, folks!
Press blast for Micron's UFS goodness appears after the break.