Subject: Storage | November 20, 2017 - 10:56 PM | Allyn Malventano
Tagged: Z-NAND, SZ985, slc, Samsung, P4800X, nand, Intel, flash
We haven't heard much about Samsung's 'XPoint Killer' Z-NAND since Flash Memory Summit 2017, but now we have a bit more to go on:
Yes, actual specs. In print. Not bad either, considering the Samsung SZ985 appears to offer a bus-saturating 3.2GB/s for reads and writes. The 30 DWPD figure matches Intel's P4800X, which is impressive given Samsung's part operates on flash derived from their V-NAND line (but operating in a different mode). The most important figures here are latency, so let's focus there for a bit:
While the SZ985 runs at ~1/3rd the latency of Samsung's own NAND SSDs, it has roughly double the latency of the P4800X. For the moment that is not as bad as it seems: it takes a fair amount of platform optimization to see the full performance benefits of Optane, and operating slightly higher on the latency spectrum helps blunt the impact of poorly optimized platforms:
Source: Shrout Research
As you can see above, operating at slightly higher latencies, while netting lower overall performance, does lessen the sting of platform-induced IRQ latency penalties.
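A rough way to see why: treat the platform penalty as a fixed number of microseconds added to every IO. The sketch below uses made-up figures (a 10 us device, a 20 us device, and a 5 us penalty) that are not taken from the Shrout Research data, and the function names are purely illustrative. It only shows the proportional effect at QD1.

```python
def qd1_iops(device_latency_us, platform_penalty_us=0.0):
    """IOPS at queue depth 1: one IO completes per (device + platform) latency."""
    return 1_000_000 / (device_latency_us + platform_penalty_us)


def throughput_lost(device_latency_us, platform_penalty_us):
    """Fraction of QD1 throughput lost to a fixed per-IO platform penalty."""
    ideal = qd1_iops(device_latency_us)
    actual = qd1_iops(device_latency_us, platform_penalty_us)
    return 1 - actual / ideal


# Hypothetical latencies, chosen only to show the trend.
for name, latency_us in [("faster device", 10), ("slower device", 20)]:
    lost = throughput_lost(latency_us, platform_penalty_us=5)
    print(f"{name} ({latency_us} us): {lost:.0%} of QD1 throughput lost")
```

The same fixed penalty costs the faster device a larger share of its throughput, which is the effect described above.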
Now to discuss costs. While we don't have any hard figures, we do have the above slide from FMS 2017, where Samsung stressed that they are trying to get the costs of Z-NAND down while keeping latencies as low as possible.
Image Source: ExtremeTech
Samsung backed up their performance claims with a Technology Brief (available here), which showed decent performance gains and cited use cases paralleling those we've seen used by Intel. The takeaway here is that Samsung *may* be able to compete with the Intel P4800X in a similar performance bracket - not matching the performance but perhaps beating it on cost. The big gotcha is that we have yet to see a single Samsung NVMe Enterprise SSD come through our labs for testing, or anywhere on the market for that matter, so take these sorts of announcements with a grain of salt until we see these products gain broader adoption/distribution.
Introduction and Specifications
Back in April, we finally got our mitts on some actual 3D XPoint to test, but there was a catch. We had to do so remotely. The initial round of XPoint testing done (by all review sites) was on a set of machines located on the Intel campus. Intel had their reasons for this unorthodox review method, but we were satisfied that everything was done above board. Intel even went as far as walking me over to the very server that we would be remoting into for testing. Despite this, there were still a few skeptics out there, and today we can put all of that to bed.
This is a 750GB Intel Optane SSD DC P4800X - in the flesh and this time on *our* turf. I'll be putting it through the same initial round of tests we conducted remotely back in April. I intend to follow up at a later date with additional testing depth, as well as evaluating kernel response times across Windows and Linux (IRQ, Polling, Hybrid Polling, etc), but for now, we're here to confirm the results on our own testbed as well as evaluate if the higher capacity point takes any sort of hit to performance. We may actually see a performance increase in some areas as Intel has had several months to further tune the P4800X.
This video is for the earlier 375GB model launch, but all points apply here
(except that the 900P has now already launched)
The baseline specs remain the same as they were back in April, with a few notable exceptions:
The endurance figure for the 375GB capacity has nearly doubled to 20.5 PBW (PetaBytes Written), with the 750GB capacity logically following suit at 41 PBW. These figures are based on a 30 DWPD (Drive Writes Per Day) rating sustained over a 5-year period. The original product brief is located here, but do note that it may be out of date.
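For those who want to check the math, the endurance figures fall straight out of capacity x DWPD x days. This is a minimal sketch that assumes decimal units (1 PB = 1,000,000 GB) and 365-day years; the function name is just for illustration.

```python
def petabytes_written(capacity_gb, dwpd, years):
    """Total rated writes in PB for a drive written `dwpd` times per day."""
    days = 365 * years
    return capacity_gb * dwpd * days / 1_000_000  # GB -> PB (decimal units)


for capacity_gb in (375, 750):
    pbw = petabytes_written(capacity_gb, dwpd=30, years=5)
    print(f"{capacity_gb}GB at 30 DWPD over 5 years: {pbw:.1f} PBW")
```

The two results round to the 20.5 PBW and 41 PBW figures quoted above.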
We now have official sequential throughput ratings: 2.0 GB/s writes and 2.4 GB/s reads.
We also have been provided detailed QoS figures and those will be noted as we cover the results throughout the review.
Subject: General Tech | August 17, 2017 - 11:21 AM | Alex Lustenberg
Tagged: video, T5, Samsung, RX VEGA 64, qualcomm, podcast, PC-Q39, P4800X, NX500, NGSFF, micron, Lian Li, Intel, EK Supremacy EVO, EDSFF, corsair, amd
PC Perspective Podcast #463 - 08/17/17
Join us for AMD Threadripper, Intel Rumors, and more!
The URL for the podcast is: http://pcper.com/podcast - Share with your friends!
- iTunes - Subscribe to the podcast directly through the iTunes Store (audio only)
- Google Play - Subscribe to our audio podcast directly through Google Play!
- RSS - Subscribe through your regular RSS reader (audio only)
- MP3 - Direct download link to the MP3 file
Hosts: Allyn Malventano, Josh Walrath, Ken Addison, Sebastian Peak
Peanut Gallery: Alex Lustenberg
Program length: 1:37:18
Subject: Storage | August 14, 2017 - 08:09 AM | Allyn Malventano
Tagged: P4800X, XPoint, NVMe, HHHL, Optane, Intel, ssd, DC
We reviewed the Intel P4800X, Intel's first 3D XPoint SSD, back in April of this year. The one thing missing from that review was product pictures. Sure, we had stock photos, but we did not have the product in hand due to the extremely limited number of samples and the need for Intel to be able to make more real-time updates to the hardware based on our feedback during the testing process (reviewers making hardware better FTW!). After the reviews were done, sample priority shifted to the software vendors who needed time to further develop their code bases to take better advantage of the very low latency that Optane can offer. One of those companies is VMware, and one of our friends from over there was able to get some tinker time with one of their samples.
Paul whipped up a few videos showing the installation process as well as timing a server boot directly from the P4800X (something we could not do in our review since we were testing on a remote server). I highly encourage those interested in the P4800X (and the upcoming consumer versions of the same) to check out the article on TinkerTry. I also recommend those wanting to know what Optane / XPoint is and how it works to check out our article here.
Introduction and Specifications
XPoint. Optane. QuantX. We've been hearing these terms thrown around for two years now, all describing a form of 3D stackable non-volatile memory that promised 10x the density of DRAM and 1000x the speed and endurance of NAND. These were bold statements, and over the following months, we would see them misunderstood and misconstrued by many in the industry. These misconceptions were further amplified by some poor demo choices on the part of Intel (fortunately countered by some better choices made by Micron). Cooler heads eventually prevailed as Jim Handy and other industry analysts helped explain that a 1000x improvement at the die level does not translate to the same improvement at the device level, especially when the first round of devices must comply with what will soon become a legacy method of connecting a persistent storage device to a PC.
Did I just suggest that PCIe 3.0 and the NVMe protocol, developed just for high-speed storage, are already legacy tech? Well, sorta.
That 'Future NVM' bar at the bottom of that chart there was a 2-year old prototype iteration of what is now Optane. Note that while NVMe was able to shrink down the yellow bar a bit, as you introduce faster and faster storage, the rest of the equation (meaning software, including the OS kernel) starts to have a larger and larger impact on limiting the ultimate speed of the device.
NAND Flash simplified schematic (via Wikipedia)
Before getting into the first retail product to push all of these links in the storage chain to the limit, let's explain how XPoint works and what makes it faster. Taking random writes as an example, NAND Flash (above) must program cells in pages and erase cells in blocks. As modern flash has increased in capacity, the sizes of those pages and blocks have scaled up roughly proportionally. At present day we are at pages >4KB and block sizes in the megabytes. When it comes to randomly writing to an already full section of flash, simply changing the contents of one byte on one page requires the clearing and rewriting of the entire block. The difference between what you wanted to write and what the flash had to rewrite to accomplish that operation is called the write amplification factor. It's something that must be dealt with when it comes to flash memory management, but for XPoint it is a completely different story:
XPoint is bit addressable. The 'cross' structure means you can select very small groups of data via Wordlines, with the ultimate selection resolving down to a single bit.
Since the programmed element effectively acts as a resistor, its output is read directly and quickly. Even better - none of that write amplification nonsense mentioned above applies here at all. There are no pages or blocks. If you want to write a byte, go ahead. Even better is that the bits can be changed regardless of their former state, meaning no erase or clear cycle must take place before writing - you just overwrite directly over what was previously stored. Is that 1000x faster / 1000x more write endurance than NAND thing starting to make more sense now?
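To put a rough number on the write amplification described above, here is a minimal sketch of the naive worst case: every touched unit of media gets rewritten in full. The 4 MiB block size is a hypothetical stand-in for "block sizes in the megabytes", and the function name is illustrative. Real SSD controllers remap writes and garbage-collect in the background, so measured write amplification is far lower than this.

```python
def naive_write_amplification(host_bytes, rewrite_unit_bytes):
    """Bytes physically rewritten divided by bytes the host asked to change.

    Assumes the write is aligned to the start of a rewrite unit and that
    every touched unit must be rewritten in full.
    """
    if host_bytes <= 0 or rewrite_unit_bytes <= 0:
        raise ValueError("sizes must be positive")
    units_touched = -(-host_bytes // rewrite_unit_bytes)  # ceiling division
    return units_touched * rewrite_unit_bytes / host_bytes


BLOCK = 4 * 1024 * 1024  # hypothetical 4 MiB NAND erase block

print(naive_write_amplification(1, BLOCK))     # change one byte in a full block
print(naive_write_amplification(4096, BLOCK))  # change 4 KiB in a full block
print(naive_write_amplification(1, 1))         # byte-granular overwrite in place
```

With a rewrite unit as small as the write itself, which is the idealized XPoint case, the ratio collapses to 1.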
Ok, with all of the background out of the way, let's get into the meat of the story. I present the P4800X:
Subject: Storage | February 10, 2017 - 04:22 PM | Allyn Malventano
Tagged: Optane, XPoint, P4800X, 375GB
Over the past few hours, we have seen another Intel Optane SSD leak rise to the surface. While we previously saw a roadmap and specs for a mobile storage accelerator platform, this time we have some specs for an enterprise part:
The specs are certainly impressive. While they don't match the maximum theoretical figures we heard at the initial XPoint announcement, we do see an endurance rating of 30 DWPD (drive writes per day), which is impressive given competing NAND products typically run in the single digits for that same metric. The 12.3 PetaBytes Written (PBW) rating is even more impressive given the capacity point that rating is based on is only 375GB (compare with 2000+ GB of enterprise parts that still do not match that figure).
Now I could rattle off the rest of the performance figures, but those are just numbers, and fortunately we have ways of showing these specs in a more practical manner:
Assuming the P4800X at least meets its stated specifications (very likely given Intel's track record there), and also with the understanding that XPoint products typically reach their maximum IOPS at Queue Depths far below 16, we can compare the theoretical figures for this new Optane part to the measured results from the two most recent NAND-based enterprise launches. To say the random performance leaves those parts in the dust is an understatement. 500,000+ IOPS is one thing, but doing so at lower QDs (where real-world enterprise usage actually sits) just makes this more of an embarrassment to NAND parts. The added latency of NAND translates to far higher/impractical QDs (256+) to reach their maximum ratings.
Intel research on typical Queue Depths seen in various enterprise workloads. Note that a lower latency device running the same workload will further 'shallow the queue', meaning even lower QD.
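The link between latency and queue depth follows from Little's Law: IOs in flight = IOPS x average latency. The sketch below uses round, illustrative latencies (10 us and 100 us) rather than measured values, and the function name is my own. Real devices also get slower per IO as the queue deepens, so this is a lower bound on the queue depth needed.

```python
def required_queue_depth(target_iops, latency_us):
    """Little's Law: average IOs in flight needed to sustain `target_iops`."""
    return target_iops * latency_us / 1_000_000  # us -> seconds


# Illustrative latencies only, not measurements of any specific SSD.
for label, latency_us in [("10 us device", 10), ("100 us device", 100)]:
    qd = required_queue_depth(500_000, latency_us)
    print(f"{label}: QD {qd:.0f} to sustain 500,000 IOPS")
```

Run the same formula the other way and a fixed workload at lower latency keeps fewer IOs in flight, which is the 'shallowing' effect noted in the caption.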
Another big deal in the enterprise is QoS. High IOPS and low latency are great, but where the rubber meets the road here is consistency. Enterprise tests measure this in varying degrees of "9's", which exponentially approach 100% of all IO latencies seen during a test run. The plot method used below acts to 'zoom in' on the tail latency of these devices. While a given SSD might have very good average latency and IOPS, it's the outliers that lead to timeouts in time-critical applications, making tail latency an important item to detail.
I've taken some liberties in my approximations below the 99.999% point in these plots. Note that the spec sheet does claim typical latencies "<10us", which falls off to the left of the scale. Not only are the potential latencies great with Optane, the claimed consistency gains are even better. Translating what you see above, the highest percentile latency IOs of the P4800X should be 10x-100x (log scale above) faster than Intel's own SSD DC P3520. The P4800X should also easily beat the Micron 9100 MAX, despite the 9100 MAX's IOPS being 5x higher than the P3520's at QD16. These lower latencies also mean we will have to add another decade to the low end of our Latency Percentile plots when we test these new products.
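For reference, the "9's" can be pulled from raw latency samples with a simple nearest-rank percentile. This is a minimal sketch, not the method used to build our plots; the function name and the synthetic 1..1000 us sample set are placeholders for real test data.

```python
def latency_at_nines(samples_us, nines):
    """Nearest-rank latency at 99% (nines=2), 99.9% (nines=3), and so on.

    Integer arithmetic avoids float rounding at the extreme percentiles.
    """
    if not samples_us:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_us)
    n = len(ordered)
    rank = n - n // 10 ** nines  # equals ceil(n * (1 - 10**-nines))
    return ordered[rank - 1]


# Synthetic data: 1000 IOs with latencies of 1..1000 us.
samples = list(range(1, 1001))
for nines in (2, 3, 5):
    print(f"{nines} nines: {latency_at_nines(samples, nines)} us")
```

Note that with only 1000 samples the five-nines value is just the single worst IO; a test run needs well over 100,000 IOs before that percentile means anything.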
Well, there you have it. The cost/GB will naturally be higher for these new XPoint parts, but the expected performance improvements should make it well worth the additional cost for those who need blistering fast yet persistent storage.