
Intel Optane SSD 905P 960GB NVMe HHHL SSD Review - Bigger XPoint

Subject: Storage
Manufacturer: Intel

PC Perspective Custom SSD Test Suite Introduction

Back in late 2016, we implemented a radically new test methodology. I'd grown tired of making excuses for benchmarks not meshing well with some SSD controllers, and that problem was amplified significantly by recent SLC+TLC hybrid SSDs that can be very picky about their workloads and how they are applied. The complexity of these caching methods has effectively flipped the SSD testing ecosystem on its head. The vast majority of benchmarking software and test methodologies out there were developed based on non-hybrid SLC, MLC, or TLC SSDs. All of those types were very consistent once a given workload had been applied to them for long enough to reach a steady-state condition. Once an SSD was properly prepared for testing, it would give you the same results all day long. Not so for these new hybrids. The dynamic nature of the various caching mechanisms at play wreaks havoc on modern tests. Even trace playback tests such as PCMark falter, as the playback of traces is typically done with idle gaps truncated to a smaller figure in the interest of accelerating the test. Caching SSDs rely on those same idle time gaps to flush their cache to higher capacity areas of their NAND. This mismatch has resulted in products like the Intel SSD 600p, which bombed nearly all of the ‘legacy’ benchmarks yet did just fine once tested with a more realistic, spaced out workload.

To solve this, I needed a way to issue IO's to the SSD the same way that real-world scenarios do, and it needed to be in such a way that did not saturate the cache of hybrid SSDs. The answer, as it turned out, was staring me in the face.


Latency Percentile made its debut in October of 2015 (ironically, with the 950 PRO review), and those results have proven to be a gold mine that continues to yield nuggets as we mine the data even further. Weighting the results allowed us to better visualize and demonstrate stutter performance even when those stutters were small enough to be lost in more common tests that employ 1-second averages. Merged with a steady pacing of the IO stream, it can provide true Quality of Service comparisons between competing enterprise SSDs, as well as high-resolution industry-standard QoS of saturated workloads. Sub-second IO burst throughput rates of simultaneous mixed workloads can be determined by additional number crunching. It is this last part that is the key to the new test methodology.

The primary goal of this new test suite is to get the most accurate sampling of real-world SSD performance possible. This meant evaluating across more dimensions than any modern benchmark is capable of. Several thousand sample points are obtained, spanning various read/write mixes, queue depths, and even varying amounts of additional data stored on the SSD. To better quantify real-world performance of SSDs employing an SLC cache, many of the samples are obtained with a new method of intermittently bursting IO requests. Each of those thousands of samples is accompanied by per-IO latency distribution data, and a Latency Percentile is calculated (for those counting, we’re up to millions of data points now). The Latency Percentiles are in turn used to derive the true instantaneous throughput and/or IOPS for each respective data point. The bursts are repeated multiple times per sample, but each completes in less than a second, so even the per-second logging employed by some of the finer review sites out there just won’t cut it.
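
To make that last step concrete, here is a minimal Python sketch of the idea (not the suite's actual tooling), assuming you have the per-IO completion latencies, in microseconds, captured from a single sub-second burst at a known queue depth:

import numpy as np

def latency_percentiles(latencies_us, percentiles=(50, 99, 99.9, 99.99)):
    # Summarize per-IO completion latencies (microseconds) from one burst.
    lat = np.asarray(latencies_us, dtype=np.float64)
    return {p: float(np.percentile(lat, p)) for p in percentiles}

def instantaneous_iops(latencies_us, queue_depth=1):
    # At a steady queue depth, each in-flight slot completes one IO per mean
    # service time, so burst IOPS is roughly queue_depth / mean latency.
    mean_latency_s = float(np.mean(latencies_us)) * 1e-6
    return queue_depth / mean_latency_s

def instantaneous_throughput_mbs(latencies_us, io_size_bytes=4096, queue_depth=1):
    # Convert burst IOPS into MB/s for a fixed IO size.
    return instantaneous_iops(latencies_us, queue_depth) * io_size_bytes / 1e6

# Hypothetical QD=1 burst of 4KB reads averaging ~12 microseconds per IO.
burst = np.random.normal(loc=12.0, scale=1.5, size=4000)
print(latency_percentiles(burst))
print(instantaneous_iops(burst), instantaneous_throughput_mbs(burst))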


Would you like some data with your data? Believe it or not, this is a portion of an intermediate calculation step - the Latency Percentile data has already been significantly reduced by this stage.

Each of the many additional dimensions of data obtained by the suite is tempered by a weighting system. Analyzing trace captures of live systems revealed *very* low Queue Depth (QD) under even the most demanding power-user scenarios, which means some of these more realistic values are not going to turn in the same high queue depth ‘max’ figures seen in saturation testing. I’ve looked all over, and nothing outside of benchmarks maxes out the queue. Ever. The vast majority of applications never exceed QD=1, and most are not even capable of multi-threaded disk IO. Games typically allocate a single thread for background level loads. For the vast majority of scenarios, the only way to exceed QD=1 is to have multiple applications hitting the disk at the same time, but even then it is less likely that those multiple processes will be completely saturating a read or write thread simultaneously, meaning the SSD is *still* not exceeding QD=1 most of the time. I pushed a slower SATA SSD relatively hard, launching multiple apps simultaneously, trying downloads while launching large games, etc. IO trace captures performed during these operations revealed >98% of all disk IO falling within QD=4, with the majority at QD=1. Reviews using the new suite will include a section with a simple set of results that should very closely match the true real-world performance of the tested devices.
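
As a rough illustration of how that weighting works, the sketch below blends per-QD results using an assumed distribution loosely based on the trace observations above (these are not the suite's exact weights):

# Illustrative queue depth weights: the vast majority of IO at QD=1 and >98%
# at or below QD=4. Assumed values for this sketch only.
QD_WEIGHTS = {1: 0.70, 2: 0.20, 4: 0.08, 8: 0.02}

def weighted_client_result(results_by_qd, weights=QD_WEIGHTS):
    # results_by_qd maps queue depth -> measured MB/s (or IOPS) at that depth.
    total = sum(weights[qd] for qd in results_by_qd if qd in weights)
    return sum(results_by_qd[qd] * weights[qd]
               for qd in results_by_qd if qd in weights) / total

# A drive that only shines at high QD still gets judged mostly on its QD=1 result.
print(weighted_client_result({1: 55.0, 2: 95.0, 4: 160.0, 8: 250.0}))  # ~75 MB/s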

While the above pertains to random accesses, bulk file copies are a different story. To increase throughput, file copy routines typically employ some form of threaded buffering, but it’s not the type of buffering that you might think. I’ve observed copy operations running at QD=8 or in some cases QD=16 to a slower destination drive. The catch is that instead of running at a constant 8 or 16 simultaneous IO’s as you would see with a saturation benchmark, the operations repeatedly fill and empty the queue, meaning the queue is filled, allowed to empty, and only then filled again. This is not the same as a saturation benchmark, which would constantly add requests to meet the maximum specified depth. The resulting speeds are therefore not what you would see at QD=8, but actually a mixture of all of the queue steps from one to eight.
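
A simple way to model that fill-and-drain behavior (an assumption for illustration, not a measurement from the suite) is to treat the effective copy speed as an average of the throughputs seen at each queue step from one up to the observed maximum:

def fill_and_drain_speed(mbps_by_qd, max_depth=8):
    # The queue is filled to max_depth, allowed to drain to empty, then filled
    # again, so IOs complete at every instantaneous depth from max_depth down
    # to 1. Averaging those steps approximates the resulting mixture.
    steps = [mbps_by_qd[qd] for qd in range(1, max_depth + 1)]
    return sum(steps) / len(steps)

# Hypothetical per-QD speeds for a copy to a slower destination drive.
mbps_by_qd = {1: 320, 2: 420, 3: 470, 4: 500, 5: 515, 6: 525, 7: 530, 8: 535}
print(fill_and_drain_speed(mbps_by_qd))  # ~477 MB/s, well below the steady QD=8 figure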

Conditioning

Some manufacturers achieve unrealistic ‘max IOPS’ figures by running tests that place a small file on an otherwise empty drive, essentially testing in what is referred to as fresh out of box (FOB) condition. This is entirely unrealistic, as even the relatively small number of files placed during an OS install is enough to drop performance considerably from the high figures seen with a FOB test.

On the flip side, when it comes to 4KB random tests, I disagree with tests that apply a random workload across the full span of the SSD. This is an enterprise-only workload that will never be seen in any sort of realistic client scenario. Even the heaviest power users are not going to hit every square inch of an SSD with random writes, and if they are, they should be investing in a datacenter SSD that is purpose-built for such a workload.


Calculation step showing full sweep of data taken at multiple amounts of fill.

So what’s the fairest preconditioning and testing scenario? I’ve spent the past several months working on that, and the conclusion I came to ended up matching Intel’s recommended client SSD conditioning pass, which is to completely fill the SSD sequentially, with the exception of an 8GB portion of the SSD meant solely for random access conditioning and tests. I add a bit of realism here by leaving ~16GB of space unallocated (even those with a full SSD will have *some* free space, after all). The randomly conditioned section only ever sees random, and the sequential section only ever sees sequential. This parallels the majority of real-world access. Registry hives, file tables, and other such areas typically see small random writes and small random reads. It’s fair to say that a given OS install ends up with ~8GB of such data. There are corner cases where files were randomly written and later sequentially read. BitTorrent is one example, but since those files are only laid down randomly on their first pass, background garbage collection should clean those up so that read performance will gradually shift towards sequential over time. Further, those writes are not as random as the more difficult workloads selected for our testing. I don't just fill the whole thing up right away, though - I pause a few times along the way and resample *everything*, as you can see above.
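
For those who like to see it spelled out, here is a rough Python sketch of that conditioning layout (the helper name and checkpoint spacing are illustrative, not the suite's actual code):

GB = 1000**3  # decimal gigabytes, as drive capacities are marketed

def conditioning_plan(capacity_bytes, random_region=8 * GB, unallocated=16 * GB,
                      fill_checkpoints=(0.25, 0.50, 0.75, 1.00)):
    # Leave ~16GB unallocated, reserve an 8GB span that only ever sees random
    # IO, and fill the rest sequentially -- pausing at intermediate fill levels
    # to re-run the full sample sweep.
    usable = capacity_bytes - unallocated
    random_span = (0, random_region)
    sequential_span = (random_region, usable)
    seq_len = sequential_span[1] - sequential_span[0]
    checkpoints = [sequential_span[0] + int(f * seq_len) for f in fill_checkpoints]
    return random_span, sequential_span, checkpoints

print(conditioning_plan(960 * GB))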


Comparison of Saturated vs. Burst workloads applied to the Intel 600p. Note the write speeds match the rated speed of 560 MB/s when employing the Burst workload.

SSDs employing relatively slower TLC flash coupled with a faster SLC cache present problems for testing. Prolonged saturation tests that attempt to push the drive at full speeds for more than a few seconds will quickly fill the cache and result in some odd behavior depending on the cache implementation. Some SSDs pass all writes directly to the SLC even if that cache is full, resulting in a stuttery game of musical chairs as the controller scrambles, flushing SLC to TLC while still trying to accept additional writes from the host system. More refined implementations can put the cache on hold once full and simply shift incoming writes directly to the TLC. Some more complicated methods throw all of that away and dynamically change the modes of empty flash blocks or pages to whichever mode they deem appropriate. This method looks good on paper, but we’ve frequently seen it falter under heavier writes, where SLC areas must be cleared so those blocks can be flipped over to the higher capacity (yet slower) TLC mode. The new suite and Burst workloads give these SSDs adequate idle time to empty their cache, just as they would have in a typical system. 
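
As an example of what a Burst workload can look like in practice, here is a minimal pacing loop built around fio (assuming fio is installed; the job parameters, file path, and idle interval are illustrative and not the suite's actual settings):

import subprocess
import time

BURSTS = 10
IDLE_SECONDS = 5.0  # idle gap between bursts lets an SLC cache flush to TLC

for i in range(BURSTS):
    start = time.time()
    subprocess.run([
        "fio", "--name=burst", "--filename=/path/to/testfile", "--size=8g",
        "--direct=1", "--rw=randwrite", "--bs=4k", "--iodepth=1",
        "--number_ios=2000",                  # small, fixed amount of work per burst
        "--output-format=json", "--output=burst_%d.json" % i,
    ], check=True)
    print("burst %d issued in %.3fs" % (i, time.time() - start))
    time.sleep(IDLE_SECONDS)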

Apologies for the wall of text. Now onto the show!


May 2, 2018 | 01:04 PM - Posted by Chaitanya (not verified)

Intel really needs to rethink Optane for desktop. Quite a schizophrenic approach to marketing this product to consumers.

May 2, 2018 | 11:30 PM - Posted by Paul A. Mitchell (not verified)

I have to disagree, if only because Intel did make a prior decision to limit its M.2 Optane SSDs to x2 PCIe 3.0 lanes. However, I believe I saw very recent reports that the newer M.2 Optane controller is smaller and also uses x4 PCIe 3.0 lanes. See, for example, Intel's "enterprise" M.2 Optanes, photographs of which have already started appearing on the Internet. As such, it's only a matter of time before future Intel M.2 Optanes come with larger capacities that are more compatible with desktop designs. Also, keep your eye on upcoming 2.5" U.2 Optane SSDs, because they will integrate quite naturally into the 2.5" bays available in billions of PC chassis. On that point, I was also very happy to see that Icy Dock is now manufacturing a 5.25" enclosure that houses 4 x 2.5" NVMe SSDs: https://www.newegg.com/Product/Product.aspx?Item=N82E16817994219&Tpk=N82...

May 2, 2018 | 11:35 PM - Posted by Paul A. Mitchell (not verified)

https://www.tomshardware.com/reviews/highpoint-ssd7120-raid,5509.html

"The performance with our four-drive Optane 900P array is spectacular: the array achieved over 11,000 MB/s at a queue depth (QD) of 16. At QD8, we measured sequential read performance at just over 8,000 MB/s."

May 2, 2018 | 11:41 PM - Posted by Paul A. Mitchell (not verified)

Photos of Enterprise M.2 Optane SSDs are here:

https://www.anandtech.com/show/12562/intel-previews-optane-enterprise-m2...

https://images.anandtech.com/doci/12562/imgp0710_678x452.jpg

May 2, 2018 | 01:28 PM - Posted by Anonymous### (not verified)

ahahahahahahah

LEDs before SPECs.

I bet this is marketed towards gaming intel consumers on sudoku watch. Which month of this year is this getting obsolete?

May 2, 2018 | 05:19 PM - Posted by Allyn Malventano

The LEDs are admittedly a bit silly, but this does remain the highest performing (all but sequential) and highest endurance client SSD available.

May 2, 2018 | 03:44 PM - Posted by Dark_wizzie

I'll wait until 960gb capacity is under $1000 used AND is in 2.5in form factor. So I'll be waiting a while.

May 2, 2018 | 03:54 PM - Posted by sircod

2.5in form factor for Optane? You mean with slow-ass SATA or non-existent U.2?

May 2, 2018 | 05:04 PM - Posted by Allyn Malventano

U.2 on desktop is a bit of an issue without cases that offer direct airflow across the bottom of the SSD (heatsink area). Drives that draw >10W in U.2 form factor will cook when left in stagnant air. That's if you even have the U.2 port in the first place. If not then you have to get creative with adapters...

May 2, 2018 | 10:12 PM - Posted by dstanding (not verified)

Well regarding the connector, it's pretty straightforward...either 8639 -> 8643, or 8639 -> 8643 -> M.2.

I can't really think of a platform on which NVMe would be reasonable which doesn't have at least an M.2 slot.

May 2, 2018 | 06:05 PM - Posted by dAvid (not verified)

can we have some real world testing please?

e.g. app loading, Windows boot times

May 3, 2018 | 10:10 AM - Posted by Anonymous2 (not verified)

Spoiler: it will be really fast, but will only "feel snappier".

May 3, 2018 | 05:42 PM - Posted by Allyn Malventano

You're at the point of diminishing returns over a 970 / 960 in boot times and most application loads (see the mixed burst read service time results for that).

May 4, 2018 | 08:25 AM - Posted by luckz (not verified)

According to 900P owner forum posts, it makes things like Regedit search or loading icons instant, and in those regards is a visible improvement over high-end NVMe NAND SSDs.

On the other hand, I have no idea how one would benchmark that beyond the classic 4k random QD1/QD2.

June 7, 2018 | 05:05 AM - Posted by Allyn Malventano

4K random read at low QD is exactly how you test for that, and the Optane parts crush those particular tests. It's just that most typical software hasn't caught up to the potential just yet.

May 2, 2018 | 07:45 PM - Posted by pdjblum

gave up thinking that these ultra fast drives, including the 970 and 960 pro and evos, are worth all that extra cash when i was able to buy the micron 2TB ssd with an endurance of 400TB for $318 ($0.159/GB) on amazon

i doubt i will notice the difference

here is the link for the pragmatic or poor or both:

https://www.amazon.com/dp/B01LB05YOO/?coliid=I122MGZXIHO222&colid=TB86CX...

hope this helps some of you

May 4, 2018 | 08:26 AM - Posted by luckz (not verified)

It was even on sale for $270 the other day (USA only Rakuten seller).

I have it, and on my ancient board it reaches pretty low 4K QD1/QD2 scores, maybe half of what a modern NVMe SSD would do on a modern board.

May 4, 2018 | 08:37 PM - Posted by pdjblum

maybe you are constrained by sata 2 given you have an old mobo?

that $270 is an insane price

the drive is bare bones, but i have read good things about it to date, and the endurance is solid

May 6, 2018 | 11:52 AM - Posted by luckz (not verified)

I do have SATA3, just not native.

If you check userbenchmark.com, the best 4K random read anyone got on the Micron 1100 is 26 MB/s (with an average of 20 MB/s). The average for a modern NVMe drive is 50 MB/s, while a 850 Evo manages 38 MB/s average. So it's a good drive for the price, but the competition is 50-100% faster where it matters.

May 2, 2018 | 09:55 PM - Posted by Takeshi7 (not verified)

Why did you only measure "burst" rates and not sustained random I/O? I bet the 905p would smoke the 970 in those cases. Especially if they are close to full.

This review seems very biased.

May 2, 2018 | 10:40 PM - Posted by Allyn Malventano

Well, if you care to read the performance focus page, you'll note that sustained *and* burst results are present, and for Optane SSDs (and non-caching NAND SSDs), the lines overlap, meaning sustained and burst performance are the same. Also, real client usage is not sustained. Does that give SLC caching NAND SSDs an advantage? Yes, but only compared with reviews that don't use realistic workloads, artificially *disadvantaging* those caching SSDs (sustained IO is not a realistic workload). We also don't test at the crazy high QD's that SSDs are typically rated at. Same reason.

Also, there's a whole page of this review dedicated to explaining why we test the way we do.

May 3, 2018 | 01:58 PM - Posted by Takeshi7 (not verified)

So basically you admit that you tested this drive in a way that it wasn't designed for. Intel's own material says it's meant for "high endurance" workloads. Tom's Hardware says "Intel bills the 905P as a workstation product designed to accelerate extended workloads."

burst workloads do not represent this use case, regardless of how "realistic" they are for the average consumer.

May 3, 2018 | 05:48 PM - Posted by Allyn Malventano

It's a high-end gamer-oriented SSD. With LEDs. The 900P (essentially the same product) ships with a free license to a game in the box. Intel's own documentation states it is for "desktop or client workstations". Additionally, workstation workloads operate at similar QD's to desktop, the difference being that workstations see those workloads at a higher frequency / for greater TBW, etc, and they do not see sustained operation at high QD's. Finally, with the burst and sustained results being equal, your assertion that I am testing it in a way it is not designed for is irrelevant (aside from also being false).

May 2, 2018 | 11:14 PM - Posted by Paul A. Mitchell (not verified)

Allyn's expert focus on latency needs to be appreciated together with the raw bandwidth that becomes available by using x16 PCIe slots, as opposed to connecting downstream of Intel's DMI 3.0 link.

From a research point of view, the availability of "bifurcated" x16 slots has now made possible options like the ASRock Ultra Quad M.2 card installed in an AMD Threadripper motherboard (and AICs like it).

(Hey, gals and guys, no need to "dangle the dongle"!)

Similar quad-M.2 add-in cards come withOUT an integrated RAID controller, because the RAID logic is performed directly by an available CPU core.

As such, designers can now choose to populate these add-in cards with M.2 SSDs and/or M.2-to-U.2 adapter cables.

So, picture this feasible setup: 4 x Samsung 970 EVO SSDs installed in an ASRock Ultra Quad M.2 AIC that is plugged into an x16 slot on the ASRock X399M micro-ATX motherboard.

Then, install 2 x Samsung 970 Pro SSDs in two of the three M.2 slots integrated on that same motherboard.

Lastly, sacrifice the third integrated M.2 slot by choosing instead a U.2 cable that connects directly to a U.2 Optane SSD.

If I had the money, I would be buying the required parts tomorrow. Alternatively, I would be shipping some of those parts directly to Allyn, so he could do his expert testing with other parts he already has in his lab.

The really good news is that ASRock Tech Support replied very promptly to my email request for the steps required to configure a RAID-0 array, using their Ultra Quad M.2 card and their X399M motherboard. I immediately forwarded ASRock's detailed instructions to Allyn.

Lastly, put all of the above in the visible future context of PCIe 4.0, which ups the transmission clock to 16 GHz.

What is truly amazing to me, about these recent developments, is that mass storage is now very close to performing at raw speeds comparable to DDR3 and DDR4 DRAM, and withOUT the volatility that comes with DRAM.

May 4, 2018 | 08:30 AM - Posted by luckz (not verified)

RAID never gives you only performance gains, so whether it makes any sense at all depends on the workload.

Why M.2 => U.2 => Optane instead of just PCIe HHHL-ing it in?

May 7, 2018 | 03:44 PM - Posted by Paul A. Mitchell (not verified)

Here's one possible answer:
https://www.tomshardware.com/reviews/highpoint-ssd7120-raid,5509.html

May 3, 2018 | 11:37 AM - Posted by ben gods (not verified)

Almost 20 watts, and at that price.

Gold award my azz, lol.

NOW I understand why that PCPerGoldPNG-300.png is looking more brownish than gold, round rim, it has a light tongue.

Is this s(h)ite giving gold awards to any mofo company who approves of your existence?

May 3, 2018 | 05:53 PM - Posted by Allyn Malventano

If you care about 20W for a high-end SSD in a desktop chassis then this product is not for you. High-end GPUs idle at the same power draw that this SSD consumes fully loaded. Also, we dropped it down to gold specifically due to the price and called it out for that in the article multiple times. Maybe read some of those words instead of being so fixated on the award pictures, mmmk?

May 3, 2018 | 06:35 PM - Posted by Paul A. Mitchell (not verified)

I would like to come to Allyn's defense, using my own use case as an example:

First of all, we have really enjoyed the prolonged productivity we have experienced by loading our 14GB website image into a ramdisk. Simple tasks like browsing and indexing are noticeably faster, and they cause no wear on quality DRAM that has a lifetime warranty (we use a Corsair matched quad that cost >$700 brand new).

Here's the rub: the ramdisk software that we chose comes with a feature that SAVES and RESTORES the ramdisk contents during shutdown and startup. As our ramdisk has grown, both the SAVE and RESTORE tasks have naturally required more and more time to complete. This would not be a big deal, except for those days when we are required to RESTART, for one reason or another.

Accordingly, any routine RESTART must first SAVE the ramdisk's contents (as enabled), then the same routine RESTART must then RESTORE the ramdisk's contents from non-volatile storage. Thus, that reading and writing take place TWICE during every routine RESTART.

(Yes, I am aware that we can always disable that ramdisk, to accelerate RESTARTS; but then, re-loading the ramdisk takes a whole lot longer, using that approach.)

The non-volatile storage which SAVEs our ramdisk is reading about 1,900 MB/second. By switching to 4 x Samsung 970 EVO in RAID-0, the same task should be reading about 10,000 MB/second, or FIVE TIMES the speed of our current (aging) workstation. Similarly, 32GB of new DDR4-3200 should read about FOUR TIMES faster than the 16GB of DDR2-800 now in that aging workstation.
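
A quick back-of-envelope check of those numbers in Python (both rates are the estimates given above, not new measurements):

ramdisk_gb = 14
current_mb_s = 1_900      # current storage, reading the saved ramdisk image
raid0_mb_s = 10_000       # estimated 4 x Samsung 970 EVO RAID-0 sequential read

current_restore_s = ramdisk_gb * 1000 / current_mb_s   # ~7.4 s
raid0_restore_s = ramdisk_gb * 1000 / raid0_mb_s       # ~1.4 s
print(current_restore_s, raid0_restore_s, raid0_mb_s / current_mb_s)  # ~5.3x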

(Hey, gals and guys, I am TOTALLY aware that our DDR2-800 is obsolete, but that workstation continues to function perfectly, so why fix what ain't broke? :) As soon as I can afford the large incremental cost, I'll be building a brand new Threadripper workstation.

Hope this helps.

May 4, 2018 | 08:37 AM - Posted by luckz (not verified)

Just make sure to have an offsite (incremental) backup too :D
With 4x RAID-0 you bump the data loss risk up quite a bit.

Otherwise seems sound. Depending on whether you even need that much CPU performance and that many lanes, a consumer Intel chipset (you don't seem to seek ECC?) with integrated graphics could also host a ton of NVMes via its PCIe slots. The i5 8400 is a fraction of the cost of a Threadripper.

As for synchronising the RAM disk to HDD, periodic use of a tool like https://www.nongnu.org/rdiff-backup/ is also an option.

May 4, 2018 | 01:19 PM - Posted by Paul A. Mitchell (not verified)

Thanks!

Historically speaking, we have had almost zero problems with several RAID-0 arrays built with 4 x 6G SSDs: to date, we prefer Samsung and SanDisk wired to an inexpensive add-in card.

To synchronize our ramdisk with our non-volatile storage, we use a simple XCOPY sequence:

xcopy E:\folder R:\folder /s/e/v/d
xcopy R:\folder E:\folder /s/e/v/d

That does the job (if you don't mind Command Prompt).

Then, we backup E:\folder with another batch file that copies updates over a LAN to older PCs. Those older PCs act as storage servers that we power up long enough to perform that task, then power them down. Those storage servers are also an informal experiment to measure just how long an obsolete PC will work, with proper care, maintenance and UPS input power.

p.s. I don't usually need much of the discussion about "random" and "sequential" workloads, for our purposes, because routine tasks like updating a COPERNIC index involve both modes of access. For that reason, we prefer to have both kinds of storage, so that sequential tasks like drive images can be done with fast sequential drives, and random tasks can be done with fast random drives.

Our ramdisk software from www.superspeed.com has been absolutely fantastic -- the effects on productivity have been huge. Plus, all that computing using DRAM has reduced wear on our other storage subsystems:

http://supremelaw.org/patents/SDC/RamDiskPlus.Review.pdf

May 4, 2018 | 10:30 PM - Posted by ConsumerGradeKitIsNotWorkstationReady (not verified)

Ha ha ha, Threadripper for a workstation is a joke compared to Epyc/SP3! Gamers really do not get what real workstations are all about, and it's not about some damn game running some crappy gaming graphics at some stupid FPS. Professional graphics workstation users want stability for their many-hours-long rendering workloads, and that's different from consumer/gaming SKUs like TR/X399 motherboards that are not really tested/certified and vetted for ECC memory usage.

Epyc is a real server/workstation-grade CPU/motherboard ecosystem, and Threadripper does not make the grade for real production workstation workloads.

Stop that madness, all you enthusiast websites with your affiliate-code kickback schemes with the consumer marketing divisions of these companies, trying to foist non-workstation-grade hardware for the extra revenue at the expense of the truth. Epyc is AMD's real server/workstation-grade branding, not any consumer Threadripper/Ryzen part that is not professionally certified/tested and vetted for system stability and error-free memory usage. Epyc is the TRUE workstation price/feature winner against Intel and against any other consumer/AMD gaming-oriented hardware that does not make the grade for actual professional workstation production workloads.

Threadripper even being mentioned in the same article as "workstation" is the very epitome of disingenuousness! Real professionals use real workstation hardware, and AMD's Epyc SKUs are more affordable than Intel's Xeons and the better price/feature deal even compared to Threadripper. AMD is not Intel, so AMD's real Epyc workstation/server-branded parts are so affordable that users are not forced to play at being professional while only able to afford Intel's non-workstation-grade consumer trash!

May 7, 2018 | 08:19 PM - Posted by Paul A. Mitchell (not verified)

If anyone is interested, ASRock replied to our query with simple instructions for doing a fresh install of Windows 10 to an ASRock Ultra Quad M.2 card installed in an AMD X399 motherboard. We uploaded that .pdf file to the Internet here:
http://supremelaw.org/systems/asrock/X399/

May 10, 2018 | 08:20 PM - Posted by Paul A. Mitchell (not verified)

FYI: comments on ASRock Ultra Quad M.2 AIC:

https://forums.servethehome.com/index.php?threads/quad-m-2-pcie-x16-nvme...

Reportedly, Intel's Enterprise M.2 Optane uses x4 PCIe lanes:
https://www.servethehome.com/new-intel-data-center-optane-m-2-ocp-summit...

"From talking to some of our hyper-scale data center contacts, we expect this new Optane m.2 drive to be PCIe x4 and significantly faster than the desktop drives. Perhaps given the DC P4510 and P4511 naming convention this will become the Intel DC P4801X or a new class of drives like a P4601X.

"Still, the continual march of the m.2 form factor in servers, even in the dense OCP server platforms, is ongoing. It is great to see that a proper Intel Optane DC drive is coming to the m.2 slot."
