Subject: Storage | March 27, 2017 - 12:16 PM | Allyn Malventano
Tagged: XPoint, Optane Memory, Optane, M.2, Intel, cache, 3D XPoint
We are just about to hit two years since Intel and Micron jointly launched 3D XPoint, and there have certainly been a lot of stories about it since. Intel officially launched the P4800X last week, and this week they are officially launching Optane Memory. The base-level information about Optane Memory is mostly unchanged; however, we do have a slide deck we are allowed to pick from to point out some of the things we can look forward to once the new tech starts hitting devices you can own.
Alright, so this is Optane Memory in a nutshell. Put some XPoint memory on an M.2 form factor device, leverage Intel's SRT caching tech, and you get a 16GB or 32GB cache laid over your system's primary HDD.
To help explain what Optane can do for typical desktop workloads, we first need to dig into Queue Depths a bit. Above are some examples of the typical QD at which various desktop applications run. This data comes from direct IO trace captures of systems in actual use. Now that we've established that the majority of desktop workloads operate at very low Queue Depths (<= 4), let's see where Optane performance falls relative to other storage technologies:
There's a bit to digest in this chart, but let me walk you through it. The tapering ranges show the percentage of IOs falling at the various Queue Depths, while the green, red, and orange lines ramping up to higher IOPS (right axis) show relative SSD performance at those same Queue Depths. The key to Optane's performance benefit here is that it ramps up to full performance at very low QDs, while the other NAND-based parts require significantly more parallel requests to achieve their full rated performance. This is what will ultimately lead to much snappier responsiveness for, well, just about anything hitting the storage. Fun fact - there is actually an HDD on that chart. It's the yellow line that you might have mistaken for the horizontal axis :).
As you can see, we have a few integrators on board already. Official support requires a 270 series motherboard and Kaby Lake CPU, but it is possible that motherboard makers could backport the required NVMe v1.1 and Intel RST 15.5 support to older systems.
For those curious whether caching is the only way power users will be able to go with Optane: it's not. Atop that pyramid sits an 'Intel Optane SSD', which should basically be a consumer version of the P4800X. It is sure to be an incredibly fast SSD, but that performance will most definitely come at a price!
We should be testing Optane Memory shortly and will publish results for this new tech as soon as we can!
Subject: Storage | March 19, 2017 - 12:21 PM | Allyn Malventano
Tagged: XPoint, SSD DC P4800X, Optane Memory, Optane, Intel, client, 750GB, 3D XPoint, 375GB, 1.5TB
Intel brought us out to their Folsom campus last week for some in-depth product briefings. Much of our briefing is still under embargo, but the portion that officially lifts this morning is the SSD DC P4800X:
MSRP for the 375GB model is estimated at $1520 (~$4/GB), which is rather spendy, but given that the product has shown it can effectively displace RAM in servers, we should be comparing its cost/GB with DRAM and not NAND. It should also be noted that this is nearly half the cost/GB of the X25-M at its launch. Capacities will go all the way up to 1.5TB, and U.2 form factor versions are also on the way.
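For anyone who wants to plug in their own comparison point (a DRAM price, or a launch price for an older SSD), here is a minimal sketch of the cost/GB arithmetic. Only the $1520 and 375GB figures come from above; the function names are our own, and any reference price you compare against is yours to supply.

```python
def cost_per_gb(price_usd: float, capacity_gb: float) -> float:
    """Dollars per (decimal) gigabyte of user-visible capacity."""
    return price_usd / capacity_gb


def cost_ratio(price_a: float, cap_a: float, price_b: float, cap_b: float) -> float:
    """How many times more (or less) product A costs per GB than product B."""
    return cost_per_gb(price_a, cap_a) / cost_per_gb(price_b, cap_b)


# Estimated MSRP and capacity of the 375GB P4800X, as quoted in the text.
p4800x = cost_per_gb(1520, 375)
print(f"P4800X 375GB: ${p4800x:.2f}/GB")  # P4800X 375GB: $4.05/GB
```

The ratio helper is there so the DRAM-vs-NAND framing above can be checked with whatever current prices you have on hand.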
For those wanting a bit more technical info, the P4800X uses a 7-channel controller, with the 375GB model having 4 dies per channel (28 total). Overprovisioning does not do for Optane what it did for NAND flash, as XPoint can be rewritten at the byte level and does not need to be programmed in (KB) pages and erased in larger (MB) blocks. The only extra space on Optane SSDs is for ECC, firmware, and a small spare area to map out any failed cells.
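To put a rough number on that "extra space", here is a back-of-the-envelope sketch. It assumes the P4800X uses the first-gen 128 Gbit (16 GB) XPoint dies described elsewhere on this page, and it treats the die's 16 GB and the drive's 375GB as the same unit; if the die is really 16 GiB, the raw total (and the reserved fraction) comes out somewhat higher.

```python
# Assumption: first-gen XPoint die = 128 Gbit = 16 GB, with GB and GiB not distinguished.
DIE_CAPACITY_GB = 16


def raw_capacity_gb(channels: int, dies_per_channel: int) -> int:
    """Total raw XPoint capacity across all controller channels."""
    return channels * dies_per_channel * DIE_CAPACITY_GB


raw = raw_capacity_gb(channels=7, dies_per_channel=4)  # 28 dies
user = 375                                             # advertised user capacity
reserved = raw - user                                  # ECC, firmware, spare area
print(raw, reserved, f"{reserved / raw:.1%}")          # 448 73 16.3%
```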
Those with a keen eye (and calculator) might have noted that the early TBW values only put the P4800X at 30 DWPD for a 3-year period. At the event, Intel confirmed that they anticipate the P4800X to qualify at that same 30 DWPD for a 5-year period by the time volume shipment occurs.
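For those reaching for that calculator: the leaked spec sheet rated the 375GB P4800X at 12.3 PBW. A minimal sketch of the conversion, assuming decimal units (1 PB = 1,000,000 GB) and 365-day years:

```python
def dwpd(pbw: float, capacity_gb: float, years: float) -> float:
    """Drive writes per day implied by a petabytes-written endurance rating."""
    total_gb_written = pbw * 1_000_000          # decimal: 1 PB = 1,000,000 GB
    full_drive_writes = total_gb_written / capacity_gb
    return full_drive_writes / (years * 365)


def pbw_required(dwpd_rating: float, capacity_gb: float, years: float) -> float:
    """Petabytes written needed to sustain a DWPD rating over a warranty period."""
    return dwpd_rating * capacity_gb * years * 365 / 1_000_000


print(f"{dwpd(12.3, 375, 3):.1f} DWPD over 3 years")          # 30.0 DWPD over 3 years
print(f"{pbw_required(30, 375, 5):.2f} PBW for 30 DWPD x 5y")  # 20.53 PBW for 30 DWPD x 5y
```

In other words, qualifying for 30 DWPD over five years means the 375GB part would need to be rated for roughly 20.5 PBW instead of 12.3.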
Subject: Storage | February 15, 2017 - 08:58 PM | Allyn Malventano
Tagged: XPoint, ssd, Optane, memory, Intel, cache
We now have an actual Optane landing page on the Intel site that discusses the first iteration of 'Intel Optane Memory', which appears to be the 8000p Series that we covered last October and saw as an option on some upcoming Lenovo laptops. The site does not cover the upcoming enterprise parts like the 375GB P4800X, but instead, focuses on the far smaller 16GB and 32GB 'System Accelerator' M.2 modules.
Despite using only two lanes of PCIe 3.0, these modules turn in some impressive performance, but capacities built from only one or two 16GB XPoint dies preclude an OS install. Instead, these will be used, presumably in combination with a newer version of Intel's Rapid Storage Technology driver, as a caching layer meant to accelerate an HDD:
While the random write performance and endurance of these parts blow any NAND-based SSD out of the water, the 2-lane bottleneck holds them back compared to high-end NVMe NAND SSDs, so we will likely see this first consumer iteration of Intel Optane Memory in OEM systems equipped with hard disks as their primary storage. A very quick 32GB caching layer should help speed things up considerably for the majority of typical buyers of these types of mobile and desktop systems, while still keeping the total cost below that of a decent-capacity NAND SSD as primary storage. Hey, if you can't get every vendor to switch to pure SSD, at least you can speed up that spinning rust a bit, right?
Subject: Storage | February 10, 2017 - 04:22 PM | Allyn Malventano
Tagged: Optane, XPoint, P4800X, 375GB
Over the past few hours, we have seen another Intel Optane SSD leak rise to the surface. While we previously saw a roadmap and specs for a mobile storage accelerator platform, this time we have some specs for an enterprise part:
The specs are certainly impressive. While they don't match the maximum theoretical figures we heard at the initial XPoint announcement, we do see an endurance rating of 30 DWPD (drive writes per day), which is impressive given that competing NAND products typically run in the single digits for that same metric. The 12.3 Petabytes Written (PBW) rating is even more impressive given that it is based on a capacity of only 375GB (compare with 2000+ GB enterprise parts that still do not match that figure).
Now I could rattle off the rest of the performance figures, but those are just numbers, and fortunately we have ways of showing these specs in a more practical manner:
Assuming the P4800X at least meets its stated specifications (very likely given Intel's track record there), and also with the understanding that XPoint products typically reach their maximum IOPS at Queue Depths far below 16, we can compare the theoretical figures for this new Optane part to the measured results from the two most recent NAND-based enterprise launches. To say the random performance leaves those parts in the dust is an understatement. 500,000+ IOPS is one thing, but doing so at lower QDs (where real-world enterprise usage actually sits) makes this even more of an embarrassment for NAND parts. The added latency of NAND means those parts need far higher, impractical QDs (256+) to reach their maximum ratings.
Intel research on typical Queue Depths seen in various enterprise workloads. Note that a lower latency device running the same workload will further 'shallow the queue', meaning even lower QD.
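The 'shallow the queue' effect falls out of Little's law: the average number of outstanding IOs equals the rate at which IOs arrive multiplied by how long each one takes. A minimal sketch, assuming a steady-state workload whose IO rate doesn't change when the device gets faster; the 50,000 IOPS workload and both latencies are hypothetical values, not measurements.

```python
def mean_queue_depth(iops: float, latency_us: float) -> float:
    """Little's law: mean outstanding IOs = arrival rate (IOPS) x mean latency."""
    return iops * latency_us / 1_000_000


workload_iops = 50_000  # hypothetical: the same workload served by both devices
for device, latency_us in [("slower device", 100), ("10x lower-latency device", 10)]:
    qd = mean_queue_depth(workload_iops, latency_us)
    print(f"{device}: average QD = {qd:.1f}")
# slower device: average QD = 5.0
# 10x lower-latency device: average QD = 0.5
```

Cutting latency by 10x cuts the average queue by 10x for the same work, which pushes real workloads even further toward the low-QD region where XPoint shines.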
Another big deal in the enterprise is QoS. High IOPS and low latency are great, but where the rubber meets the road here is consistency. Enterprise tests measure this in varying degrees of "9's", which exponentially approach 100% of all IO latencies seen during a test run. The plot method used below acts to 'zoom in' on the tail latency of these devices. While a given SSD might have very good average latency and IOPS, it's the outliers that lead to timeouts in time-critical applications, making tail latency an important item to detail.
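To show why averages hide the outliers that matter, here is a minimal sketch that computes the latency at each "number of nines" from a list of per-IO latencies. It uses the nearest-rank percentile definition (one common choice; some tools interpolate instead), and the 10,000-IO sample is made up purely for illustration.

```python
from statistics import mean


def latency_at_nines(latencies_us: list[float], nines: int) -> float:
    """Nearest-rank percentile at 1 - 10**-nines (2 -> 99%, 5 -> 99.999%).

    Integer arithmetic avoids float rounding: ceil(n * (1 - 10**-k)) == n - n // 10**k.
    """
    ordered = sorted(latencies_us)
    n = len(ordered)
    rank = n - n // 10**nines  # 1-based rank of the percentile sample
    return ordered[rank - 1]


# Hypothetical run of 10,000 IOs: mostly 10 us, plus a handful of slow outliers.
samples = [10.0] * 9990 + [100.0] * 9 + [1000.0]

print(f"average: {mean(samples):.2f} us")  # average: 10.18 us
for k in range(2, 6):
    print(f"{k} nines: {latency_at_nines(samples, k):.0f} us")
# 2 nines: 10 us, 3 nines: 10 us, 4 nines: 100 us, 5 nines: 1000 us
```

The average barely moves, yet one IO in ten thousand takes 100x longer than typical, which is exactly what a latency percentile plot is built to expose.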
I've taken some liberties in my approximations below the 99.999% point in these plots. Note that the spec sheet does claim typical latencies "<10us", which falls off to the left of the scale. Not only are the potential latencies great with Optane, the claimed consistency gains are even better. Translating what you see above, the highest percentile latency IOs of the P4800X should be 10x-100x (log scale above) faster than Intel's own SSD DC P3520. The P4800X should also easily beat the Micron 9100 MAX, even though the 9100 MAX turns in 5x the IOPS of the P3520 at QD16. These lower latencies also mean we will have to add another decade to the low end of our Latency Percentile plots when we test these new products.
Well, there you have it. The cost/GB will naturally be higher for these new XPoint parts, but the expected performance improvements should make it well worth the additional cost for those who need blistering fast yet persistent storage.
Subject: Memory | February 3, 2017 - 08:42 PM | Tim Verry
Tagged: XPoint, server, Optane, Intel Optane, Intel, big data
Last week Hexus reported that Intel has begun shipping Optane memory modules to its partners for testing. This year should see the launch of both these enterprise products designed for servers and tiny application accelerator M.2 solid state drives based on Intel and Micron's joint 3D XPoint memory venture. The modules that Intel is shipping are the former type of Optane memory and will be able to replace DDR4 DIMMs (RAM) with a memory solution that is not as fast but is cheaper and offers much larger capacities. The Optane modules are designed to slot into DDR4 memory slots on server boards. The benefit of such a product lies in big data and scientific workloads, where massive datasets can be held in primary memory and the processor(s) can access them at much lower latencies than if they had to reach out to mass storage on spinning rust or even SAS or PCI-E solid state drives. Holding all of the data being worked on in one pool of memory will also be cheaper with Optane, as it is allegedly priced closer to NAND than to RAM, and the cost of RAM adds up extremely quickly when you need many terabytes of it (or more!). Various technologies attempting to bring higher-capacity non-volatile and/or flash-based storage to the memory module form factor have been theorized or under development for years now, but it appears that Intel will be the first to roll out actual products.
It will likely be years before the technology trickles down to consumer desktops and notebooks, so slapping what would effectively be a cheap RAM disk into your PC is still a ways out. Consumers will get a small taste of Optane memory in the form of tiny storage drives that were rumored for a first-quarter 2017 release, following Intel's Kaby Lake Z270 motherboards. Previous leaks suggest that the Intel Optane Memory 8000P would come in 16 GB and 32 GB capacities in an M.2 form factor. With a single 128 Gbit (16 GB) die, Intel is able to hit speeds that current NAND flash based SSDs can only hit with multiple dies. Specifically, the 16GB Optane application accelerator drive is allegedly capable of 285,000 random 4K read IOPS, 70,000 random 4K write IOPS, sequential 128K reads of 1400 MB/s, and sequential 128K writes of 300 MB/s. The 32GB Optane drive is a bit faster at 300,000 random read IOPS, 120,000 random write IOPS, 1600 MB/s, and 500 MB/s respectively.
Unfortunately, I do not have any numbers on how fast the Optane memory that will slot into DDR4 slots will be, but seeing as two dies already max out the x2 PCI-E link used by the M.2 Optane SSD, a dual-sided memory module packed with rows of Optane dies on the significantly wider memory bus is very promising. Its performance should land closer to (but below) DDR4 and well above NAND flash, while still being non-volatile (it doesn't need constant power to retain data).
I am interested to see what the final numbers are for Intel's Optane RAM and Optane storage drives. The company has certainly dialed down the hype for the technology as it approached fruition, though that may have more to do with what they are able to do right now than with what the 3D XPoint memory technology itself is potentially capable of enabling. I look forward to what it will enable in the HPC market and eventually what will be possible for the desktop and gaming markets.
What are your thoughts on Intel and Micron's 3D XPoint memory and Intel's Optane implementation (Micron's implementation is QuantX)?
- IDF 2016: Intel To Demo Optane XPoint, Announces Optane Testbed for Enterprise Customers
- Intel Optane (XPoint) First Gen Product Specifications Leaked
- Intel Z270 Express and H270 Express Chipsets Support Kaby Lake, More PCI-E 3.0 Lanes
Subject: Storage | October 14, 2016 - 08:05 PM | Allyn Malventano
Tagged: XPoint, Optane, 8000p, Intel
Intel and Micron jointly launched XPoint technology over a year ago, and we've been waiting to see additional info ever since. We saw Micron demo a prototype at FMS 2016, and we also got a look at the actual prototype hardware. Intel's last demo was not so great (later demos were better), and we saw a roadmap leak a few months ago. Thanks to another leak, we now have specs for one of Intel's first Optane products:
Now, I know there is a bunch of rambling around the net already: "Why so small?!?!" What I think we are looking at is Stony Beach, Intel's 'Application Accelerator' seen here:
What further backs this theory is the PCIe 3.0 x2 link of that product in the above roadmap, which pairs nicely with the upper-end limit seen in the 32GB product: it is clearly hitting a bandwidth ceiling at 1.6 GB/s, the typical maximum seen on a PCIe 3.0 x2 link.
Now, with the capacity thing aside, there is another important point to bring up. First gen XPoint dies are 128 Gbit, which works out to 16 GB. That means the 16GB part is turning in those specs *WITH ONE DIE*. NAND-based SSDs can only reach these sorts of figures by spreading IOs across four, eight, or more dies operating in parallel. This is just one die, and it is nearly saturating two lanes of PCIe 3.0!
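Where does the ~1.6 GB/s ceiling come from? PCIe 3.0 signals at 8 GT/s per lane with 128b/130b encoding, which sets the raw payload rate; packet and protocol overhead then eat a further chunk. A minimal sketch of that arithmetic (the function name is ours, and the 1.6 GB/s figure is the observed ceiling quoted above):

```python
def pcie3_link_gbs(lanes: int) -> float:
    """Raw PCIe 3.0 payload rate in GB/s: 8 GT/s per lane with 128b/130b encoding."""
    gigatransfers = 8.0      # GT/s per lane; each transfer carries one bit per lane
    encoding = 128 / 130     # 128b/130b line-coding efficiency
    return lanes * gigatransfers * encoding / 8  # bits -> bytes


x2 = pcie3_link_gbs(2)
observed = 1.6  # GB/s ceiling of the 32GB part, from the text
print(f"x2 raw: {x2:.2f} GB/s, observed ceiling is {observed / x2:.0%} of that")
# x2 raw: 1.97 GB/s, observed ceiling is 81% of that
```

The remaining ~19% is roughly what packet headers and link-level protocol traffic cost in practice, which is why 1.6 GB/s is about as good as an x2 device gets.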
Another cool thing to note is that we don't typically get to know how well a single die of anything will perform. We always have to extrapolate backwards from the smaller capacities of SSDs, where the dies are the bottleneck instead of the interface to the host. Here we have the specs of one die of a product. Imagine what could be done with even wider interfaces and more dies!
XPoint fills the still relatively large performance gap between RAM and NAND, and does so while being non-volatile. There are good things on the horizon to be enabled by this technology, even if we first see it in smaller capacity products.
Subject: Storage | August 16, 2016 - 02:00 PM | Allyn Malventano
Tagged: XPoint, Testbed, Optane, Intel, IDF 2016, idf
IDF 2016 is up and running, and Intel will no doubt be announcing and presenting on a few items of interest. Of note for this Storage Editor are multiple announcements pertaining to upcoming Intel Optane technology products.
Optane is Intel’s branding of their joint XPoint venture with Micron. Intel launched this branding at last year's IDF, and while the base technology is as high as 1000x faster than NAND flash memory, full solutions wrapped around an NVMe capable controller have been shown to deliver roughly a 10x improvement over NAND. That’s still nothing to sneeze at, and XPoint settles nicely into the performance gap seen between NAND and DRAM.
Since modern M.2 NVMe SSDs are encroaching on the point of diminishing returns for consumer products, Intel’s initial Optane push will be into the enterprise sector. There are plenty of use cases for a persistent storage tier faster than NAND, but most enterprise software is not currently equipped to take full advantage of the gains seen from such a disruptive technology.
XPoint die. 128Gbit of storage at a ~20nm process.
In an effort to accelerate the development and adoption of 3D XPoint optimized software, Intel will be offering enterprise customers access to an Optane Testbed. This will allow for performance testing and tuning of customers’ software and applications ahead of the shipment of Optane hardware.
I did note something interesting in Micron's FMS 2016 presentation. QD=1 random performance appears to start at ~320,000 IOPS, while the Intel demo from a year ago (first photo in this post) showed a prototype running at only 76,600 IOPS. Going by that QD=1 example, it appears that performance climbs as controller technology matures to keep up with raw XPoint. Given that a NAND-based SSD only turns in 10-20k IOPS at that same queue depth, we're seeing something more along the lines of 16-32x performance gains with the Micron prototype. Those with a realistic understanding of how queues work will realize that gains this large at such low queue depths will have a significant impact on the real-world performance of these products.
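At QD=1 there is only ever one IO in flight, so IOPS and average latency are simply reciprocals of each other. A minimal sketch that turns the QD=1 figures above into per-IO latencies and speedup ratios; it treats each quoted IOPS number as a steady-state average and ignores any host-side overhead.

```python
def qd1_mean_latency_us(iops: float) -> float:
    """At QD=1 only one IO is outstanding, so mean latency is simply 1 / IOPS."""
    return 1_000_000 / iops


micron_xpoint = 320_000              # Micron FMS 2016 prototype at QD=1 (from the text)
intel_earlier = 76_600               # Intel prototype demo a year earlier, QD=1 (from the text)
nand_low, nand_high = 10_000, 20_000  # typical NAND SSD range at QD=1 (from the text)

print(f"Micron prototype: {qd1_mean_latency_us(micron_xpoint):.1f} us per IO")  # 3.1
print(f"Intel demo:       {qd1_mean_latency_us(intel_earlier):.1f} us per IO")  # 13.1
print(f"NAND SSD:         {qd1_mean_latency_us(nand_high):.0f}-"
      f"{qd1_mean_latency_us(nand_low):.0f} us per IO")                          # 50-100
print(f"Speedup vs NAND:  {micron_xpoint / nand_high:.0f}x-"
      f"{micron_xpoint / nand_low:.0f}x")                                        # 16x-32x
```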
The speed of 3D XPoint immediately shifts the bottleneck back to the controller, PCIe bus, and OS/software. True 1000x performance gains will not be realized until second generation XPoint DIMMs are directly linked to the CPU.
The raw die 1000x performance gains simply can't be fully realized when there is a storage stack in place (even an NVMe one). That's not to say XPoint will be slow, and based on what I've seen so far, I suspect XPoint haters will still end up burying their heads in the sand once we get a look at the performance results of production parts.
Leaked roadmap including upcoming Optane products
Intel is expected to show a demo of their own more recent Optane prototype, and we suspect similar performance gains there, as their controller tech has likely matured. We'll keep an eye out and fill you in once we've seen Intel's newer Optane goodness in action!
Subject: Storage | August 11, 2016 - 12:06 PM | Allyn Malventano
Tagged: FMS, FMS 2016, XPoint, micron, QuantX, nand, ram
Earlier this week, Micron launched their QuantX branding for XPoint devices, as well as giving us some good detail on expected IOPS performance of solutions containing these new parts:
Thanks to the very low latency of XPoint, the QuantX solution sees very high IOPS at a very low queue depth, and its random performance very quickly scales to fully saturate PCIe 3.0 x4 with only four queued commands. Micron's own 9100 MAX SSD (reviewed here) requires QD=256 (a 64x deeper queue) just to come close to this level of performance! At that same presentation, a PCIe 3.0 x8 QuantX device was able to double that throughput at QD=8, but what are these things going to look like?
The eventual answer is that they will look just like modern-day SSDs, but for the time being, we have the prototype unit pictured above. This is essentially an FPGA development board that Micron is using to prototype potential controller designs. Dedicated ASICs based on the final designs may be faster, but those take a while to ramp up to volume production.
So there it is, in the flesh, nicely packaged and installed on a complete SSD. Sure it's a prototype, but Intel has promised we will see XPoint before the end of the year, and I'm excited to see this NAND-to-DRAM performance-gap-filling tech come to the masses!
Subject: Storage | August 9, 2016 - 05:59 PM | Allyn Malventano
Tagged: XPoint, Worm, storage, ssd, RocksDB, Optane, nand, flash, facebook
At their FMS 2016 Keynote, Facebook gave us some details on the various storage technologies that fuel their massive operation:
In the four corners above, they covered the full spectrum of storing bits: from NVMe, to Lightning (huge racks of flash, or JBOF), to AVA (quad M.2 22110 NVMe SSDs), to the new kid on the block, WORM storage. WORM stands for Write Once Read Many, and as you might imagine, Facebook has lots of archival data that they would like to be able to read quickly, so this sort of storage fits the bill nicely. How do you pull off massive capacity in flash devices? QLC. Forget MLC or TLC; QLC stores four bits per cell, meaning there are 16 individual voltage states for each cell. This requires extremely precise writing techniques, and reads must compensate for cell drift over time. While this was a near impossibility with planar NAND, 3D NAND has more volume to store those electrons. This means one can trade the endurance gains of 3D NAND for higher bit density, ultimately enabling SSDs upwards of ~100TB in capacity. The catch is that they are rated for only ~150 write cycles. This is fine for archival storage with WORM workloads, and you still maintain NAND speeds when it comes to reading that data later on, meaning that decade-old Facebook post will appear in your browser just as quickly as the one you posted ten minutes ago.
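The jump from four bits to sixteen voltage states is just powers of two, but it helps to see how quickly the margins shrink. A minimal sketch; the "relative margin" column is a deliberately simplified model (our assumption, not a Facebook or vendor figure) that splits a fixed voltage window evenly between adjacent states.

```python
def voltage_states(bits_per_cell: int) -> int:
    """Each extra bit per cell doubles the number of charge levels to distinguish."""
    return 2 ** bits_per_cell


for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3), ("QLC", 4)]:
    states = voltage_states(bits)
    # Simplified model (assumption): a fixed voltage window divided evenly,
    # so the gap between neighboring states shrinks as 1 / (states - 1).
    margin = 1 / (states - 1)
    print(f"{name}: {bits} bits/cell, {states:2d} states, relative margin {margin:.3f}")

# Same number of cells, one more bit each: QLC holds 4/3 the data of TLC.
print(f"QLC vs TLC density: {4 / 3:.2f}x")
```

Going from TLC to QLC buys only a third more capacity per cell while roughly halving the gap between neighboring states, which is why precise writes, drift compensation, and reduced endurance come with the territory.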
Next up was a look at some preliminary Intel Optane SSD results using RocksDB. Compared to a P3600, the prototype Optane part offers impressive gains in Facebook's real-world workload. Throughput jumped by 3x, and latency reduced to 1/10th of its previous value. These are impressive gains given this fairly heavy mixed workload.
More to follow from FMS 2016!
Subject: Storage | August 9, 2016 - 03:33 PM | Allyn Malventano
Tagged: XPoint, QuantX, nand, micron
Micron just completed their keynote address at Flash Memory Summit, and as part of the presentation, we saw our first look at some raw scaled Queue Depth IOPS performance figures from devices utilizing XPoint memory:
These are the performance figures from a U.2 device with a PCIe 3.0 x4 link. Note the outstanding ramp up to full saturation of the bus at a QD of only 4. Slower flash devices require much more parallelism and a deeper queue to achieve sufficient IOPS throughput to saturate that same bus. That 'slow' device on the bottom there, I'm pretty certain, is Micron's own 9100 MAX, which was the fastest thing we had tested to date, and it's being just walked all over by this new XPoint prototype!
Ok, so that's damn fast, but what if you had an add in card with PCIe 3.0 x8?
Ok, now that's just insane! While the queue had to climb to ~8 to reach these figures, that's 1.8 MILLION IOPS from a single HHHL add in card. That's greater than 7 GB/s worth of 4KB random performance!
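Converting IOPS to bandwidth is just IOPS multiplied by transfer size. A minimal sketch, assuming "4KB" means 4096 bytes and reporting decimal GB/s; with 4000-byte transfers the result is still above 7 GB/s.

```python
def iops_to_gbs(iops: float, block_bytes: int = 4096) -> float:
    """Bandwidth in decimal GB/s for a given IOPS rate at a fixed transfer size."""
    return iops * block_bytes / 1e9


print(f"{iops_to_gbs(1_800_000):.2f} GB/s")        # 7.37 GB/s (4096-byte IOs)
print(f"{iops_to_gbs(1_800_000, 4000):.2f} GB/s")  # 7.20 GB/s (4000-byte IOs)
```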
In addition to the crazy throughput and IOPS figures, we also see latencies running at 1/10th that of flash-based NVMe devices.
So it appears that while the cell-level performance of XPoint boasts 1000x improvements over flash, once you implement it into an actual solution that must operate within the bounds of current systems (NVMe and PCIe 3.0), we currently get only a 10x improvement over NAND flash. Given how fast NAND already is, 10x is no small improvement, and XPoint still opens the door for further improvement as the technology and implementations mature over time.
More to follow as FMS continues!