Subject: General Tech, Memory, Storage | May 26, 2017 - 10:14 PM | Tim Verry
Tagged: XPoint, Intel, HPC, DIMM, 3D XPoint
Intel recently teased a bit of new information on its 3D XPoint DIMMs and launched its first public demonstration of the technology at the SAP Sapphire conference where SAP’s HANA in-memory data analytics software was shown working with the new “Intel persistent memory.” Slated to arrive in 2018, the new Intel DIMMs based on the 3D XPoint technology developed by Intel and Micron will work in systems alongside traditional DRAM to provide a pool of fast, low latency, and high density nonvolatile storage that is a middle ground between expensive DDR4 and cheaper NVMe SSDs and hard drives. When looking at the storage stack, the storage density increases along with latency as it gets further away from the CPU. The opposite is also true: as storage and memory get closer to the processor, bandwidth increases, latency decreases, and costs increase per unit of storage. Intel is hoping to bridge the gap between system DRAM and PCI-E and SATA storage.
According to Intel, system RAM offers up to 10 GB/s per channel and approximately 100 nanoseconds of latency. 3D XPoint DIMMs will offer 6 GB/s per channel and about 250 nanoseconds of latency. Below that are the 3D XPoint-based NVMe SSDs (e.g. Optane) on a PCI-E x4 bus, where they max out the bandwidth of the bus at ~3.2 GB/s with 10 microseconds of latency. Intel claims that non-XPoint NVMe NAND solid state drives have around 100 microseconds of latency, and of course, it gets worse from there when you go to NAND-based SSDs or even hard drives hanging off the SATA bus.
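To put those figures side by side, here is a minimal sketch that expresses each tier's latency as a multiple of DRAM latency. The numbers are the ones Intel quotes above; the tier labels, the `TIERS` list, and the `latency_vs_dram` helper are illustrative names of my own, not anything from Intel.

```python
# Latency (ns) and bandwidth (GB/s) per tier, as quoted by Intel above.
# Tier labels are illustrative, not official Intel terminology.
TIERS = [
    ("DDR4 DRAM", 100, 10.0),
    ("3D XPoint DIMM", 250, 6.0),
    ("3D XPoint NVMe SSD (PCI-E x4)", 10_000, 3.2),
    ("NAND NVMe SSD", 100_000, None),  # no bandwidth figure was quoted
]


def latency_vs_dram(tiers):
    """Return {tier name: latency as a multiple of the first (DRAM) tier}."""
    base_ns = tiers[0][1]
    return {name: latency_ns / base_ns for name, latency_ns, _ in tiers}


if __name__ == "__main__":
    for name, ratio in latency_vs_dram(TIERS).items():
        print(f"{name:32s} {ratio:8.1f}x DRAM latency")
```

Seen this way, the DIMM form factor keeps XPoint within a small multiple of DRAM latency, while the same media behind a PCI-E link lands two orders of magnitude further out.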
Intel’s new XPoint DIMMs have persistent storage and will offer more capacity than is possible and/or cost effective with DDR4 DRAM. In giving up some bandwidth and latency, enterprise users will be able to have a large pool of very fast storage for storing their databases and other latency and bandwidth sensitive workloads. Intel does note that there are security concerns with the XPoint DIMMs being nonvolatile in that an attacker with physical access could easily pull the DIMM and walk away with the data (it is at least theoretically possible to grab some data from RAM as well, but it will be much easier to grab the data from the XPoint sticks). Encryption and other security measures will need to be implemented to secure the data, both in use and at rest.
Interestingly, Intel is not positioning the XPoint DIMMs as a replacement for RAM, but instead as a supplement. RAM and XPoint DIMMs will be installed in different slots of the same system. The DDR4 RAM will be used for the OS and system critical applications, while the XPoint pool will hold the data that applications work on, much like a traditional RAM disk but without needing to load and save the data to a different medium for persistent storage, and with a lot more GBs for the money.
While XPoint is set to arrive next year along with Cascade Lake Xeons, it will likely be a couple of years before the technology takes off. Adopting it is going to require hardware and software support in workstations and servers, as well as developers willing to take advantage of it when writing their specialized applications. Fortunately, Intel started shipping the memory modules to its partners for testing earlier this year. It is an interesting technology and the DIMM solution and direct CPU interface will really let the 3D XPoint memory shine and reach its full potential. It will primarily be useful for the enterprise, scientific, and financial industries where there is a huge need for faster and lower latency storage that can accommodate massive (multiple terabyte+) data sets that continue to get larger and more complex. It is a technology that likely will not trickle down to consumers for a long time, but I will be ready when it does. In the meantime, I am eager to see what kinds of things it will enable the big data companies and researchers to do! Intel claims it will not only be useful for supporting massive in-memory databases and accelerating HPC workloads but also for things like virtualization, private clouds, and software defined storage.
What are your thoughts on this new memory tier and the future of XPoint?
Subject: Storage | April 24, 2017 - 05:20 PM | Jeremy Hellstrom
Tagged: XPoint, srt, rst, Optane Memory, Optane, Intel, hybrid, CrossPoint, cache, 32GB, 16GB
At $44 for 16GB or $77 for a 32GB module, Intel's Optane memory will cost you less in total than an M.2 SSD, though at a significantly higher price per gigabyte. The catch is that you need to have a Kaby Lake Core system to be able to utilize Optane, which means you are unlikely to be using a HDD. Al's tests show that Optane will also benefit a system using an SSD, reducing latency noticeably although not as significantly as with a HDD.
The Tech Report tested it differently, by sourcing a brand new desktop system with Kaby Lake Core APU that did not ship with an SSD. Once installed, the Optane drive enabled the system to outpace an affordable 480GB SSD in some scenarios; very impressive for a HDD. They also took a peek at the difference Optane makes when paired with the aforementioned affordable SSD in their full review.
"Intel's Optane Memory tech purports to offer most of the responsiveness of an SSD to systems whose primary storage device is a good old hard drive. We put a 32GB stick of Optane Memory to the test to see whether it lives up to Intel's claims."
Here are some more Storage reviews from around the web:
- Intel Optane Memory Review - 1.4GB/s Speed & 300K IOPS for $44 @ The SSD Review
- The Intel Optane Memory Module Review @ Hardware Canucks
- Kingston DCP1000 NVMe SSD Reaches 7GB/s @ Kitguru
- WD Blue 1,000 GiB SSD @ Hardware Secrets
- Synology DiskStation DS916+ 4-Bay NAS @ Kitguru
- Drobo 5N2 NAS @ Kitguru
- Kingston Ultimate GT 2TB Flash Drive @ The SSD Review
- Toshiba X300 6TB HDD @ Kitguru
Introduction, Specifications, and Requirements
Finally! Optane Memory sitting in our lab! Sure, it’s not the mighty P4800X we remotely tested over the past month, but this is right here, sitting on my desk. It’s shipping, too, meaning it could be sitting on your desk (or more importantly, in your PC) in just a matter of days.
The big deal about Optane is that it uses XPoint Memory, which has fast-as-lightning (faster, actually) response times of less than 10 microseconds. Compare this to the fastest modern NAND flash at ~90 microseconds, and the differences are going to add up fast. What’s wonderful about these response times is that they still hold true even when scaling an Optane product all the way down to just one or two dies of storage capacity. When you consider that managing fewer dies means less work for the controller, we can see latencies fall even further in some cases (as we will see later).
Subject: Storage | March 27, 2017 - 12:16 PM | Allyn Malventano
Tagged: XPoint, Optane Memory, Optane, M.2, Intel, cache, 3D XPoint
We are just about to hit two years since Intel and Micron jointly launched 3D XPoint, and there have certainly been a lot of stories about it since. Intel officially launched the P4800X last week, and this week they are officially launching Optane Memory. The base level information about Optane Memory is mostly unchanged, however, we do have a slide deck we are allowed to pick from to point out some of the things we can look forward to once the new tech starts hitting devices you can own.
Alright, so this is Optane Memory in a nutshell. Put some XPoint memory on an M.2 form factor device, leverage Intel's SRT caching tech, and you get a 16GB or 32GB cache laid over your system's primary HDD.
To help explain what good Optane can do for typical desktop workloads, first we need to dig into Queue Depths a bit. Above are some examples of the typical QD various desktop applications run at. This data is from direct IO trace captures of systems in actual use. Now that we've established that the majority of desktop workloads operate at very low Queue Depths (<= 4), let's see where Optane performance falls relative to other storage technologies:
There's a bit to digest in this chart, but let me walk you through it. The ranges tapering off show the percentage of IOs falling at the various Queue Depths, while the green, red, and orange lines ramping up to higher IOPS (right axis) show relative SSD performance at those same Queue Depths. The key to Optane's performance benefit here is that it can ramp up to full performance at very low QD's, while the other NAND-based parts require significantly more parallel requests to achieve full rated performance. This is what will ultimately lead to much snappier responsiveness for, well, just about anything hitting the storage. Fun fact - there is actually a HDD on that chart. It's the yellow line that you might have mistaken for the horizontal axis :).
As you can see, we have a few integrators on board already. Official support requires a 270 series motherboard and Kaby Lake CPU, but it is possible that motherboard makers could backport the required NVMe v1.1 and Intel RST 15.5 support into older systems.
For those curious whether caching is the only way power users will be able to go with Optane: it is not. Atop that pyramid there sits an 'Intel Optane SSD', which should basically be a consumer version of the P4800X. It is sure to be an incredibly fast SSD, but that performance will most definitely come at a price!
We should be testing Optane Memory shortly and will finally have some publishable results of this new tech as soon as we can!
Subject: Storage | March 19, 2017 - 12:21 PM | Allyn Malventano
Tagged: XPoint, SSD DC P4800X, Optane Memory, Optane, Intel, client, 750GB, 3D XPoint, 375GB, 1.5TB
Intel brought us out to their Folsom campus last week for some in-depth product briefings. Much of our briefing is still under embargo, but the portion that officially lifts this morning is the SSD DC P4800X:
MSRP for the 375GB model is estimated at $1520 ($4/GB), which is rather spendy, but given that the product has shown it can effectively displace RAM in servers, we should be comparing the cost/GB with DRAM and not NAND. It should also be noted that this is nearly half the cost/GB of the X25-M at its launch. Capacities will go all the way up to 1.5TB, and U.2 form factor versions are also on the way.
For those wanting a bit more technical info, the P4800X uses a 7-channel controller, with the 375GB model having 4 dies per channel (28 total). Overprovisioning does not do for Optane what it did for NAND flash, as XPoint can be rewritten at the byte level and does not need to be programmed in (KB) pages and erased in larger (MB) blocks. The only extra space on Optane SSDs is for ECC, firmware, and a small spare area to map out any failed cells.
Those with a keen eye (and calculator) might have noted that the early TBW values only put the P4800X at 30 DWPD for a 3-year period. At the event, Intel confirmed that they anticipate the P4800X to qualify at that same 30 DWPD for a 5-year period by the time volume shipment occurs.
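For anyone who wants to redo that calculator exercise, here is a rough sketch of the DWPD arithmetic. It assumes the 12.3 PBW endurance rating from the earlier leaked spec sheet, decimal units (1 PB = 10^15 bytes), and 365-day years. The five-year total is my own derivation from the 30 DWPD target, not a figure Intel has published.

```python
PB = 10**15  # decimal petabyte
GB = 10**9   # decimal gigabyte


def dwpd(bytes_written, capacity_bytes, years):
    """Drive writes per day implied by an endurance rating over a warranty period."""
    days = years * 365
    return bytes_written / (capacity_bytes * days)


def pbw_for(target_dwpd, capacity_bytes, years):
    """Total petabytes written needed to sustain `target_dwpd` for `years`."""
    return target_dwpd * capacity_bytes * years * 365 / PB


# 12.3 PBW on the 375GB model, spread over three years.
three_year = dwpd(12.3 * PB, 375 * GB, 3)

# Endurance the same drive would need to hold 30 DWPD for five years (derived).
five_year_pbw = pbw_for(30, 375 * GB, 5)

if __name__ == "__main__":
    print(f"12.3 PBW over 3 years: {three_year:.2f} DWPD")
    print(f"30 DWPD over 5 years:  {five_year_pbw:.2f} PBW")
```

The first result lands just under 30 DWPD, which is why the early rating only covers a three-year period.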
Subject: Storage | February 15, 2017 - 08:58 PM | Allyn Malventano
Tagged: XPoint, ssd, Optane, memory, Intel, cache
We now have an actual Optane landing page on the Intel site that discusses the first iteration of 'Intel Optane Memory', which appears to be the 8000p Series that we covered last October and saw as an option on some upcoming Lenovo laptops. The site does not cover the upcoming enterprise parts like the 375GB P4800X, but instead, focuses on the far smaller 16GB and 32GB 'System Accelerator' M.2 modules.
Despite using only two lanes of PCIe 3.0, these modules turn in some impressive performance, but the capacities when using only one or two (16GB each) XPoint dies preclude an OS install. Instead, these will be used, presumably in combination with a newer form of Intel's Rapid Storage Technology driver, as a caching layer meant as an HDD accelerator:
While the random write performance and endurance of these parts blow any NAND-based SSD out of the water, the 2-lane bottleneck holds them back compared to high-end NVMe NAND SSDs, so we will likely see this first consumer iteration of Intel Optane Memory in OEM systems equipped with hard disks as their primary storage. A very quick 32GB caching layer should help speed things up considerably for the majority of typical buyers of these types of mobile and desktop systems, while still keeping the total cost below that for a decent capacity NAND SSD as primary storage. Hey, if you can't get every vendor to switch to pure SSD, at least you can speed up that spinning rust a bit, right?
Subject: Storage | February 10, 2017 - 04:22 PM | Allyn Malventano
Tagged: Optane, XPoint, P4800X, 375GB
Over the past few hours, we have seen another Intel Optane SSD leak rise to the surface. While we previously saw a roadmap and specs for a mobile storage accelerator platform, this time we have some specs for an enterprise part:
The specs are certainly impressive. While they don't match the maximum theoretical figures we heard at the initial XPoint announcement, we do see an endurance rating of 30 DWPD (drive writes per day), which is impressive given competing NAND products typically run in the single digits for that same metric. The 12.3 PetaBytes Written (PBW) rating is even more impressive given the capacity point that rating is based on is only 375GB (compare with 2000+ GB of enterprise parts that still do not match that figure).
Now I could rattle off the rest of the performance figures, but those are just numbers, and fortunately we have ways of showing these specs in a more practical manner:
Assuming the P4800X at least meets its stated specifications (very likely given Intel's track record there), and also with the understanding that XPoint products typically reach their maximum IOPS at Queue Depths far below 16, we can compare the theoretical figures for this new Optane part to the measured results from the two most recent NAND-based enterprise launches. To say the random performance leaves those parts in the dust is an understatement. 500,000+ IOPS is one thing, but doing so at lower QD's (where real-world enterprise usage actually sits) just makes this more of an embarrassment to NAND parts. The added latency of NAND translates to far higher/impractical QD's (256+) to reach their maximum ratings.
Intel research on typical Queue Depths seen in various enterprise workloads. Note that a lower latency device running the same workload will further 'shallow the queue', meaning even lower QD.
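The 'shallow the queue' effect follows from Little's law: the average number of outstanding IOs equals the IO rate multiplied by the average latency. A minimal sketch, assuming a made-up workload of 50,000 IOPS and round-number latencies of 100 microseconds for NAND and 10 microseconds for XPoint (in line with the figures quoted elsewhere on this page):

```python
def average_queue_depth(iops, latency_s):
    """Little's law: mean outstanding IOs = arrival rate x mean time per IO."""
    return iops * latency_s


WORKLOAD_IOPS = 50_000  # hypothetical workload, chosen only for illustration

nand_qd = average_queue_depth(WORKLOAD_IOPS, 100e-6)   # ~100 us per IO
xpoint_qd = average_queue_depth(WORKLOAD_IOPS, 10e-6)  # ~10 us per IO

if __name__ == "__main__":
    print(f"NAND:   average QD ~{nand_qd:.1f}")
    print(f"XPoint: average QD ~{xpoint_qd:.1f}")
```

The same workload that keeps roughly five IOs in flight on the slower device averages less than one on the faster device, which is why peak IOPS ratings that need deep queues say little about how such a device behaves in practice.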
Another big deal in the enterprise is QoS. High IOPS and low latency are great, but where the rubber meets the road here is consistency. Enterprise tests measure this in varying degrees of "9's", which exponentially approach 100% of all IO latencies seen during a test run. The plot method used below acts to 'zoom in' on the tail latency of these devices. While a given SSD might have very good average latency and IOPS, it's the outliers that lead to timeouts in time-critical applications, making tail latency an important item to detail.
I've taken some liberties in my approximations below the 99.999% point in these plots. Note that the spec sheet does claim typical latencies "<10us", which falls off to the left of the scale. Not only are the potential latencies great with Optane, the claimed consistency gains are even better. Translating what you see above, the highest percentile latency IOs of the P4800X should be 10x-100x (log scale above) faster than Intel's own SSD DC P3520. The P4800X should also easily beat the Micron 9100 MAX, even despite its IOPS being 5x higher than the P3520 at QD16. These lower latencies also mean we will have to add another decade to the low end of our Latency Percentile plots when we test these new products.
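To make the "9's" idea concrete, here is a small sketch that pulls tail latencies out of a list of per-IO latencies using the nearest-rank percentile method. The sample data is entirely synthetic (made-up values, not measurements of any drive), and `percentile_latency` is a helper written for this example.

```python
import math


def percentile_latency(latencies_us, percentile):
    """Nearest-rank percentile: the smallest sample such that at least
    `percentile` percent of all samples are at or below it."""
    ordered = sorted(latencies_us)
    # round() guards against floating-point noise before taking the ceiling
    rank = math.ceil(round(percentile / 100 * len(ordered), 6))
    return ordered[max(rank, 1) - 1]


# Synthetic run: 99,985 fast IOs at 10 us plus 15 outliers at 2,000 us.
samples = [10] * 99_985 + [2_000] * 15

if __name__ == "__main__":
    print(f"mean latency: {sum(samples) / len(samples):.2f} us")
    for nines in (99, 99.9, 99.99, 99.999):
        value = percentile_latency(samples, nines)
        print(f"{nines}% of IOs complete within {value} us")
```

The average barely moves off 10 us, yet the 99.99% and 99.999% points jump to the outlier value. That is the behavior a latency percentile plot is meant to expose.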
Well, there you have it. The cost/GB will naturally be higher for these new XPoint parts, but the expected performance improvements should make it well worth the additional cost for those who need blistering fast yet persistent storage.
Subject: Memory | February 3, 2017 - 08:42 PM | Tim Verry
Tagged: XPoint, server, Optane, Intel Optane, Intel, big data
Last week Hexus reported that Intel has begun shipping Optane memory modules to its partners for testing. This year should see the launch of both these enterprise products designed for servers as well as tiny application accelerator M.2 solid state drives based on the Intel and Micron joint 3D memory venture. The modules that Intel is shipping are the former type of Optane memory and will be able to replace DDR4 DIMMs (RAM) with a memory solution that is not as fast but is cheaper and has much larger storage capacities. The Optane modules are designed to slot into DDR4 type memory slots on server boards. The benefit for such a product lies in big data and scientific workloads where massive datasets will be able to be held in primary memory and the processor(s) will be able to access the data sets at much lower latencies than if it had to reach out to mass storage on spinning rust or even SAS or PCI-E solid state drives. Being able to hold all the data being worked on in one pool of memory will be cheaper with Optane as well, as it is allegedly priced closer to NAND than RAM, and the cost of RAM adds up extremely quickly when you need many terabytes of it (or more!). Various technologies attempting to bring higher capacity non volatile and/or flash-based storage in memory module form have been theorized or in the works in various forms for years now, but it appears that Intel will be the first to roll out actual products.
It will likely be years before the technology trickles down to consumer desktops and notebooks, so slapping what would effectively be a cheap RAM disk into your PC is still a ways out. Consumers will get a small taste of the Optane memory in the form of tiny storage drives that were rumored for a first quarter 2017 release following its Kaby Lake Z270 motherboards. Previous leaks suggest that the Intel Optane Memory 8000P would come in 16 GB and 32 GB capacities in an M.2 form factor. With a single 128 Gbit (16 GB) die Intel is able to hit speeds that current NAND flash based SSDs can only hit with multiple dies. Specifically, the 16GB Optane application accelerator drive is allegedly capable of 285,000 random read 4K IOPS, 70,000 random write 4K IOPS, sequential 128K reads of 1400 MB/s, and sequential 128K writes of 300 MB/s. The 32GB Optane drive is a bit faster at 300,000 random read 4K IOPS, 120,000 random write 4K IOPS, 1600 MB/s, and 500 MB/s respectively.
Unfortunately, I do not have any numbers on how fast the Optane memory that will slot into the DDR4 slots will be, but seeing as two dies already max out the x2 PCI-E link they use in the M.2 Optane SSD, a dual sided memory module packed with rows of Optane dies on the significantly wider memory bus is very promising. It should lie somewhere closer to (but slower than) DDR4 but much faster than NAND flash while still being non volatile (it doesn't need constant power to retain the data).
I am interested to see what the final numbers are for Intel's Optane RAM and Optane storage drives. The company has certainly dialed down the hype for the technology as it approached fruition though that may be more to do with what they are able to do right now versus what the 3D XPoint memory technology itself is potentially capable of enabling. I look forward to what it will enable in the HPC market and eventually what will be possible for the desktop and gaming markets.
What are your thoughts on Intel and Micron's 3D XPoint memory and Intel's Optane implementation (Micron's implementation is QuantX)?
- IDF 2016: Intel To Demo Optane XPoint, Announces Optane Testbed for Enterprise Customers
- Intel Optane (XPoint) First Gen Product Specifications Leaked
- Intel Z270 Express and H270 Express Chipsets Support Kaby Lake, More PCI-E 3.0 Lanes
Subject: Storage | October 14, 2016 - 08:05 PM | Allyn Malventano
Tagged: XPoint, Optane, 8000p, Intel
Intel and Micron jointly launched XPoint technology over a year ago, and we've been waiting to see any additional info ever since. We saw Micron demo a prototype at FMS 2016, and we also saw the actual prototype. Intel's last demo was not so great (later demos were better), and we saw a roadmap leaked a few months ago. Thanks to another leak, we now have specs for one of Intel's first Optane products:
Now I know there is a bunch of rambling around the net already. "Why so small?!?!". What I think we are looking at is Stony Beach - Intel's 'Application Accelerator' seen here:
What further backs this theory is the PCIe 3.0 x2 link listed for that product in the above roadmap. That couples nicely with the upper end limits seen in the 32GB product, which is clearly hitting a bandwidth limit at 1.6 GB/s, the typical max seen on a x2 PCIe 3.0 link.
Now with the capacity thing aside, there is another important thing to bring up. First gen XPoint dies are 128 Gbit, which works out to 16 GB. That means the product specs for the 16GB part are turning in those specs *WITH ONE DIE*. NAND based SSDs can only reach these sorts of figures by spreading the IO's across four, eight, or more dies operating in parallel. This is just one die, and it is nearly saturating two lanes of PCIe 3.0!
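The arithmetic behind those two claims is quick to check. The sketch below converts the die size and compares the leaked 1400 MB/s sequential read figure for the 16GB part against a PCIe 3.0 x2 link. The raw link rate uses the standard PCIe 3.0 parameters (8 GT/s per lane, 128b/130b encoding); the ~1.6 GB/s practical ceiling is the typical x2 maximum cited above, and the function names are my own.

```python
def gbit_to_gbyte(gbit):
    """Convert gigabits to gigabytes (8 bits per byte)."""
    return gbit / 8


def pcie3_raw_gbs(lanes):
    """Raw PCIe 3.0 bandwidth in GB/s: 8 GT/s per lane with 128b/130b
    line encoding, before packet and protocol overhead."""
    bits_per_second = lanes * 8e9 * (128 / 130)
    return bits_per_second / 8 / 1e9


DIE_GBIT = 128          # first gen XPoint die
SEQ_READ_GBS = 1.4      # leaked 16GB (single die) sequential read spec
PRACTICAL_X2_GBS = 1.6  # typical real-world max on a x2 link, per the post

if __name__ == "__main__":
    raw = pcie3_raw_gbs(2)
    print(f"die capacity: {gbit_to_gbyte(DIE_GBIT):.0f} GB")
    print(f"x2 raw link:  {raw:.2f} GB/s")
    print(f"one die uses {SEQ_READ_GBS / raw:.0%} of raw, "
          f"{SEQ_READ_GBS / PRACTICAL_X2_GBS:.0%} of the practical ceiling")
```

Measured against the practical ceiling rather than the raw line rate, a single die is already most of the way to filling the link.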
Another cool thing to note is that we don't typically get to know how well a single die of anything will perform. We always have to extrapolate backwards from the smaller capacities of SSDs, where the dies are the bottleneck instead of the interface to the host. Here we have the specs of one die of a product. Imagine what could be done with even wider interfaces and more dies!
XPoint fills the still relatively large performance gap between RAM and NAND, and does so while being non-volatile. There are good things on the horizon to be enabled by this technology, even if we first see it in smaller capacity products.
Subject: Storage | August 16, 2016 - 02:00 PM | Allyn Malventano
Tagged: XPoint, Testbed, Optane, Intel, IDF 2016, idf
IDF 2016 is up and running, and Intel will no doubt be announcing and presenting on a few items of interest. Of note for this Storage Editor are multiple announcements pertaining to upcoming Intel Optane technology products.
Optane is Intel’s branding of their joint XPoint venture with Micron. Intel launched this branding at last year's IDF, and while the base technology is as much as 1000x faster than NAND flash memory, full solutions wrapped around an NVMe capable controller have been shown to sit at roughly a 10x improvement over NAND. That’s still nothing to sneeze at, and XPoint settles nicely into the performance gap seen between NAND and DRAM.
Since modern M.2 NVMe SSDs are encroaching on the point of diminishing returns for consumer products, Intel’s initial Optane push will be into the enterprise sector. There are plenty of use cases for a persistent storage tier faster than NAND, but most enterprise software is not currently equipped to take full advantage of the gains seen from such a disruptive technology.
XPoint die. 128Gbit of storage at a ~20nm process.
In an effort to accelerate the development and adoption of 3D XPoint optimized software, Intel will be offering enterprise customers access to an Optane Testbed. This will allow for performance testing and tuning of customers’ software and applications ahead of the shipment of Optane hardware.
I did note something interesting in Micron's FMS 2016 presentation. QD=1 random performance appears to start at ~320,000 IOPS, while the Intel demo from a year ago (first photo in this post) showed a prototype running at only 76,600 IOPS. Using that QD=1 example, it appears that as controller technology improves to handle the large performance gains of raw XPoint, so does performance. Given a NAND-based SSD only turns in 10-20k IOPS at that same queue depth, we're seeing something more along the lines of 16-32x performance gains with the Micron prototype. Those with a realistic understanding of how queues work will realize that the type of gains seen at such low queue depths will have a significant impact in real-world performance of these products.
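Those ratios are easy to reproduce, and QD=1 numbers have a useful side effect: with only one IO in flight at a time, the per-IO latency is simply the inverse of IOPS. A quick sketch using the figures quoted above (the helper names are illustrative):

```python
def speedup(new_iops, old_iops):
    """How many times faster `new_iops` is than `old_iops`."""
    return new_iops / old_iops


def qd1_latency_us(iops):
    """At queue depth 1 only one IO is outstanding, so latency = 1 / IOPS."""
    return 1e6 / iops


MICRON_PROTOTYPE = 320_000   # QD=1 IOPS, Micron FMS 2016 presentation
INTEL_EARLY_DEMO = 76_600    # QD=1 IOPS, Intel demo a year earlier
NAND_RANGE = (10_000, 20_000)  # typical NAND SSD at QD=1

if __name__ == "__main__":
    low, high = NAND_RANGE
    print(f"gain over NAND: {speedup(MICRON_PROTOTYPE, high):.0f}x "
          f"to {speedup(MICRON_PROTOTYPE, low):.0f}x")
    for label, iops in [("Micron prototype", MICRON_PROTOTYPE),
                        ("Intel early demo", INTEL_EARLY_DEMO),
                        ("NAND (fast end)", high),
                        ("NAND (slow end)", low)]:
        print(f"{label:18s} ~{qd1_latency_us(iops):.1f} us per IO")
```

Put in latency terms, the prototype is completing each IO in a few microseconds where NAND needs 50 to 100.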
The speed of 3D XPoint immediately shifts the bottleneck back to the controller, PCIe bus, and OS/software. True 1000x performance gains will not be realized until second generation XPoint DIMMs are directly linked to the CPU.
The raw die 1000x performance gains simply can't be fully realized when there is a storage stack in place (even an NVMe one). That's not to say XPoint will be slow, and based on what I've seen so far, I suspect XPoint haters will still end up burying their heads in the sand once we get a look at the performance results of production parts.
Leaked roadmap including upcoming Optane products
Intel is expected to show a demo of their own more recent Optane prototype, and we suspect similar performance gains there as their controller tech has likely matured. We'll keep an eye out and fill you in once we've seen Intel's newer Optane goodness in action!