Taking an Accurate Look at SSD Write Endurance

Last year, I posted a rebuttal to a paper describing the future of flash memory as ‘bleak’. The paper went to great (and convoluted) lengths to paint a tragic picture of flash memory endurance moving forward. Yesterday a newer paper hit Slashdot – this one doing just the opposite, going as far as to assume production flash memory handles up to 1 million erase cycles. You’d think that since I’m constantly pushing flash memory as a viable, reliable, and super-fast successor to Hard Disks (aka 'Spinning Rust'), I’d just sit back on this one and let it fly. After all, it helps my argument! Well, I can’t, because if there are errors published on a topic this important to me, journalistic integrity demands that I post an equal and opposite rebuttal – even if it works against my case.

First, I invite you to read through the paper in question. Once you’ve done so, I’m going to pick it apart. Unfortunately I’m crunched for time today, so I’ll reduce my dissertation to a few simple bulleted points:

  • Max data write speed did not take into account 8b/10b encoding, meaning 6Gb/sec = 600MB/sec, not 750MB/sec (see the sketch after this list).
  • The flash *page* size (8KB) and *block* size (2MB) chosen more closely resemble MLC parts, not SLC (see below for why this matters).
  • The paper makes no reference to Write Amplification.
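
To make that first bullet concrete, here's a minimal sketch (Python) of the throughput correction. The 6Gb/sec line rate and the 8b/10b encoding come from the bullet above; the rest is just unit conversion:

```python
# A minimal sketch of where 600MB/sec comes from: SATA 6Gb/sec with
# 8b/10b encoding puts 10 bits on the wire for every 8 bits of data.

SATA_LINE_RATE_BPS = 6e9      # 6 Gb/sec raw SATA line rate
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b: 8 data bits per 10 wire bits

# Naive figure (ignores encoding overhead): 6e9 / 8 bits = 750 MB/sec
naive_mb_per_sec = SATA_LINE_RATE_BPS / 8 / 1e6

# Corrected figure: 6e9 * 0.8 / 8 bits = 600 MB/sec
actual_mb_per_sec = SATA_LINE_RATE_BPS * ENCODING_EFFICIENCY / 8 / 1e6

print(f"without 8b/10b: {naive_mb_per_sec:.0f} MB/sec")  # 750
print(f"with 8b/10b:    {actual_mb_per_sec:.0f} MB/sec")  # 600
```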

Perhaps the most glaring and significant omission: all of the formulas, while internally correct, fail to consider the most important factor when dealing with flash memory writes – Write Amplification.

Before getting into it, I'll reference the excellent graphic that Anand put in his SSD Relapse piece:

SSD controllers combine smaller writes into larger ones in an attempt to speed up the effective write speed. This falls flat once all flash blocks have been written to at least once. From that point forward, the SSD must play musical chairs with the data on each and every small write. In a bad case, a single 4KB write turns into a 2MB block rewrite. For that example, Write Amplification would be a factor of 512 (2MB / 4KB), meaning the flash memory is cycled at 512x the rate calculated in the paper. Sure, that’s an extreme example, but the point is that by not referencing amplification at all, the paper assumes a factor of 1, which would only be the case if you wrote nothing but perfectly aligned 2MB blocks of data to the SSD. This is almost never the case, regardless of Operating System.
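
To see how badly an unstated Write Amplification factor skews the paper's formulas, here's a minimal sketch using the worst-case figures above. The rated cycle count is an assumed SLC-style rating, purely for illustration:

```python
# A minimal sketch of how Write Amplification (WA) scales the wear math.
# The 4KB host write / 2MB block figures are the worst case from above;
# the rated cycle count is an assumed SLC-style value, for illustration.

HOST_WRITE_BYTES = 4 * 1024    # a single 4KB host write
BLOCK_BYTES = 2 * 1024 * 1024  # the 2MB flash block it forces to rewrite

write_amplification = BLOCK_BYTES / HOST_WRITE_BYTES  # 512 in this worst case

RATED_PE_CYCLES = 100_000  # assumed SLC rating
effective_cycles = RATED_PE_CYCLES / write_amplification

print(f"WA factor:                  {write_amplification:.0f}")         # 512
print(f"endurance seen by the host: {effective_cycles:.0f} cycles")     # ~195
```

Real-world WA is nowhere near 512 on a decent controller, but even a modest factor of 5-10 divides the paper's endurance figures by the same amount.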

After posters on Slashdot called out the author on his assumed P/E cycle ratings, he went back and added two links to justify his figures. The problem is that the first link points to a 2005 data sheet for 90nm SLC flash. Samsung’s 90nm flash was 1Gb per die (128MB). Packages were available with up to 4 dies each, so even scaling up to a typical 16-chip SSD only gives you an 8GB drive (128MB x 4 x 16). Not very practical. That’s not to say 100k is an inaccurate figure for SLC endurance – it’s just a really bad reference to use. Here's a better one from the Flash Memory Summit a couple of years back:

The second link was a 2008 PR blast from Micron, based on pushing their 34nm process to its limits. “One Million Write Cycles” was nothing more than a tag line for an achievement accomplished in a lab under ideal conditions. That figure was never reached in anything you could actually buy in a SATA SSD. A better reference would be from that same presentation at the Summit:

This chart shows older, larger process nodes exceeding 1 million cycles (given sufficient additional error correction bits), but remember: to be practical for real-world use, the flash has to be available in a usable capacity, and that’s just not the case for the parts in the above chart.

At the end of the day, manufacturers must balance cost, capacity, and longevity. This forces a push towards smaller processes (for more capacity per cost), with the limit being how much endurance they are willing to give up along the way. In the end they choose based on what the customer needs. Enterprise use leans towards SLC or eMLC, as those customers are willing to spend more for the gain in endurance. Typical PC users get standard MLC and now even TLC, which are *good enough* for that application.

It's worth noting that most SSD failures are not due to burning out all of the available flash P/E cycles. The vast majority are infant mortality failures of the controller or even buggy firmware. I've never written enough to any single consumer SSD (in normal operation) to wear out all of the flash. The closest I've come to a flash-related failure was when an ioDrive failed during testing because excessive heat caused a solder pad to lift on one of the flash chips.
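
To put some rough numbers behind *good enough*, here's a minimal back-of-the-envelope sketch. All figures are assumed, typical-looking values, not measurements of any specific drive:

```python
# A minimal lifetime estimate for a consumer SSD. Every figure here is
# an assumed illustrative value: capacity, MLC cycle rating, a
# pessimistic real-world WA factor, and a heavy desktop write load.

capacity_gb     = 256     # consumer MLC SSD (assumed)
pe_cycles       = 3_000   # typical MLC rating (assumed)
write_amp       = 10      # pessimistic real-world WA (assumed)
host_gb_per_day = 20      # heavy desktop workload (assumed)

# Host data the drive can absorb before exhausting rated P/E cycles
total_host_writes_gb = capacity_gb * pe_cycles / write_amp

lifetime_years = total_host_writes_gb / host_gb_per_day / 365

print(f"host data before wear-out: {total_host_writes_gb:,.0f} GB")  # 76,800 GB
print(f"estimated lifetime:        {lifetime_years:.1f} years")      # ~10.5 years
```

Even with a pessimistic WA factor, the flash outlives the practical usefulness of the drive, which is exactly why controller and firmware failures dominate in practice.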

All of this said, I’d love to see the author revisit his well-structured paper – this time based on the corrected assumptions I’ve outlined above. *That* is the type of paper I would reference when attempting to make *accurate* arguments for SSD endurance.