Subject: General Tech | August 17, 2017 - 11:21 AM | Alex Lustenberg
Tagged: video, T5, Samsung, RX VEGA 64, qualcomm, podcast, PC-Q39, P4800X, NX500, NGSFF, micron, Lian Li, Intel, EK Supremacy EVO, EDSFF, corsair, amd
PC Perspective Podcast #463 - 08/17/17
Join us for AMD Threadripper, Intel Rumors, and more!
The URL for the podcast is: http://pcper.com/podcast - Share with your friends!
- iTunes - Subscribe to the podcast directly through the iTunes Store (audio only)
- Google Play - Subscribe to our audio podcast directly through Google Play!
- RSS - Subscribe through your regular RSS reader (audio only)
- MP3 - Direct download link to the MP3 file
Hosts: Allyn Malventano, Josh Walrath, Ken Addison, Sebastian Peak
Peanut Gallery: Alex Lustenberg
Program length: 1:37:18
Subject: Storage | August 14, 2017 - 08:09 AM | Allyn Malventano
Tagged: P4800X, XPoint, NVMe, HHHL, Optane, Intel, ssd, DC
We reviewed the Intel P4800X - Intel's first 3D XPoint SSD, back in April of this year. The one thing missing from that review was product pictures. Sure we had stock photos, but we did not have the product in hand due to the extremely limited number of samples and the need for Intel to be able to make more real-time updates to the hardware based on our feedback during the testing process (reviewers making hardware better FTW!). After the reviews were done, sample priority shifted to the software vendors who needed time to further develop their code bases to take better advantage of the very low latency that Optane can offer. One of those companies is VMware, and one of our friends from over there was able to get some tinker time with one of their samples.
Paul whipped up a few videos showing the installation process as well as timing a server boot directly from the P4800X (something we could not do in our review since we were testing on a remote server). I highly encourage those interested in the P4800X (and the upcoming consumer versions of the same) to check out the article on TinkerTry. I also recommend those wanting to know what Optane / XPoint is and how it works to check out our article here.
Introduction and Specifications
XPoint. Optane. QuantX. We've been hearing these terms thrown around for two years now. A form of 3D stackable non-volatile memory that promised 10x the density of DRAM and 1000x the speed and endurance of NAND. These were bold statements, and over the following months, we would see them misunderstood and misconstrued by many in the industry. These misconceptions were further amplified by some poor demo choices on the part of Intel (fortunately countered by some better choices made by Micron). Fortunately cooler heads prevailed as Jim Handy and other industry analysts helped explain that a 1000x improvement at the die level does not translate to the same improvement at the device level, especially when the first round of devices must comply with what will soon become a legacy method of connecting a persistent storage device to a PC.
Did I just suggest that PCIe 3.0 and the NVMe protocol - developed just for high-speed storage, is already legacy tech? Well, sorta.
That 'Future NVM' bar at the bottom of that chart there was a 2-year old prototype iteration of what is now Optane. Note that while NVMe was able to shrink down the yellow bar a bit, as you introduce faster and faster storage, the rest of the equation (meaning software, including the OS kernel) starts to have a larger and larger impact on limiting the ultimate speed of the device.
NAND Flash simplified schematic (via Wikipedia)
Before getting into the first retail product to push all of these links in the storage chain to the limit, let's explain how XPoint works and what makes it faster. Taking random writes as an example, NAND Flash (above) must program cells in pages and erase cells in blocks. As modern flash has increased in capacity, the sizes of those pages and blocks have scaled up roughly proportionally. At present day we are at pages >4KB and block sizes in the megabytes. When it comes to randomly writing to an already full section of flash, simply changing the contents of one byte on one page requires the clearing and rewriting of the entire block. The difference between what you wanted to write and what the flash had to rewrite to accomplish that operation is called the write amplification factor. It's something that must be dealt with when it comes to flash memory management, but for XPoint it is a completely different story:
XPoint is bit addressible. The 'cross' structure means you can select very small groups of data via Wordlines, with the ultimate selection resolving down to a single bit.
Since the programmed element effectively acts as a resistor, its output is read directly and quickly. Even better - none of that write amplification nonsense mentioned above applies here at all. There are no pages or blocks. If you want to write a byte, go ahead. Even better is that the bits can be changed regardless of their former state, meaning no erase or clear cycle must take place before writing - you just overwrite directly over what was previously stored. Is that 1000x faster / 1000x more write endurance than NAND thing starting to make more sense now?
Ok, with all of the background out of the way, let's get into the meat of the story. I present the P4800X:
Subject: Storage | February 10, 2017 - 04:22 PM | Allyn Malventano
Tagged: Optane, XPoint, P4800X, 375GB
Over the past few hours, we have seen another Intel Optane SSD leak rise to the surface. While we previously saw a roadmap and specs for a mobile storage accelerator platform, this time we have some specs for an enterprise part:
The specs are certainly impressive. While they don't match the maximum theoretical figures we heard at the initial XPoint announcement, we do see an endurance rating of 30 DWPD (drive writes per day), which is impressive given competing NAND products typically run in the single digits for that same metric. The 12.3 PetaBytes Written (PBW) rating is even more impressive given the capacity point that rating is based on is only 375GB (compare with 2000+ GB of enterprise parts that still do not match that figure).
Now I could rattle off the rest of the performance figures, but those are just numbers, and fortunately we have ways of showing these specs in a more practical manner:
Assuming the P4800X at least meets its stated specifications (very likely given Intel's track record there), and also with the understanding that XPoint products typically reach their maximum IOPS at Queue Depths far below 16, we can compare the theoretical figures for this new Optane part to the measured results from the two most recent NAND-based enterprise launches. To say the random performance makes leaves those parts in the dust is an understatement. 500,000+ IOPS is one thing, but doing so at lower QD's (where actual real-world enterprise usage actually sits) just makes this more of an embarrassment to NAND parts. The added latency of NAND translates to far higher/impractical QD's (256+) to reach their maximum ratings.
Intel research on typical Queue Depths seen in various enterprise workloads. Note that a lower latency device running the same workload will further 'shallow the queue', meaning even lower QD.
Another big deal in the enterprise is QoS. High IOPS and low latency are great, but where the rubber meets the road here is consistency. Enterprise tests measure this in varying degrees of "9's", which exponentially approach 100% of all IO latencies seen during a test run. The plot method used below acts to 'zoom in' on the tail latency of these devices. While a given SSD might have very good average latency and IOPS, it's the outliers that lead to timeouts in time-critical applications, making tail latency an important item to detail.
I've taken some liberties in my approximations below the 99.999% point in these plots. Note that the spec sheet does claim typical latencies "<10us", which falls off to the left of the scale. Not only are the potential latencies great with Optane, the claimed consistency gains are even better. Translating what you see above, the highest percentile latency IOs of the P4800X should be 10x-100x (log scale above) faster than Intel's own SSD DC P3520. The P4800X should also easily beat the Micron 9100 MAX, even despite its IOPS being 5x higher than the P3520 at QD16. These lower latencies also mean we will have to add another decade to the low end of our Latency Percentile plots when we test these new products.
Well, there you have it. The cost/GB will naturally be higher for these new XPoint parts, but the expected performance improvements should make it well worth the additional cost for those who need blistering fast yet persistent storage.