
Triple M.2 Samsung 950 Pro Z170 PCIe NVMe RAID Tested - Why So Snappy?

Subject: Storage
Manufacturer: Gigabyte

Introduction

NVMe was a great thing to happen to SSDs. The per-IO reduction in latency and CPU overhead was more than welcome, as PCIe SSDs were previously using the antiquated AHCI protocol, a carryover from the SATA HDD days. With NVMe came additional required support in operating systems and UEFI BIOS implementations. We did some crazy experiments with arrays of these new devices, but we were initially limited by the lack of native hardware-level RAID support to tie multiple PCIe devices together. The launch of the Z170 chipset remedied this by adding the ability to tie as many as three PCIe SSDs into a chipset-configured array. The recent C600 server chipset also saw the addition of RSTe capability, expanding this functionality to enterprise devices like the Intel SSD P3608, which is actually a pair of SSDs on a single PCB.

Most Z170 motherboards have come with one or two M.2 slots, meaning that enthusiasts wanting to employ the 3x PCIe RAID made possible by this new chipset would have to get creative with the use of interposer / adapter boards (or use a combination of PCIe add-in card and U.2-connected Intel SSD 750s). With the Samsung 950 Pro available, as well as the slew of other M.2 SSDs we saw at CES 2016, it’s safe to say that U.2 is going to push back into the enterprise sector, leaving M.2 as the choice for consumer motherboards moving forward. It was therefore only a matter of time before a triple-M.2 motherboard was launched, and that just recently happened - Behold the Gigabyte Z170X-SOC Force!


This new motherboard sits at the high end of Gigabyte’s lineup, with a water-capable VRM cooler and other premium features. We will be passing this board on to Morry for a full review, but this piece will be focusing on one section in particular:


I have to hand it to Gigabyte for this functional and elegant design choice. The space between the required four full-length PCIe slots makes it look like it was chosen to fit M.2 SSDs in-between them. I should also note that it would be possible to use three U.2 adapters linked to three U.2 Intel SSD 750s, but native M.2 devices make for a significantly more compact and consumer-friendly package.


With the test system set up, let’s get right into it, shall we?

Read on for our look at triple M.2 in action!


February 1, 2016 | 10:46 AM - Posted by nathanddrews

That's nice. Real nice.

February 1, 2016 | 10:55 AM - Posted by Anonymous (not verified)

Brilliant analysis on page 4, I'll be checking into this setup. Simply brilliant.

February 1, 2016 | 11:07 AM - Posted by Patrick3D (not verified)

$400 for a motherboard?! That's what I spend to build an entire system. I just don't see the value in it.

February 1, 2016 | 01:16 PM - Posted by Anonymous (not verified)

Horses for courses. This motherboard was never meant to target the market sector that can do everything they need with a $400 system.
Many of us want or need significantly more power and are prepared to pay for it.

February 1, 2016 | 01:18 PM - Posted by Hyperstrike (not verified)

$400 for a motherboard?! That's what I spend to build an entire system. I just don't see the value in it.

Try actually gaming on a high-end machine, or a machine that is running something that actually benefits from low-latency, high-IOP performance.

Yeah, that's going to require you to spend more than $400 for a whole system.

You're essentially trying to compare a skateboard to a supercar.

February 1, 2016 | 02:34 PM - Posted by jimecherry

On that note I can assemble a baked potato for under 5 dollars. Doesn't mean it'll run Crysis on 7 VMs hooked up to 7 FreeSync monitors. It might be able to run Dota though ;}

February 1, 2016 | 03:57 PM - Posted by Allyn Malventano

Yeah those considering triple NVMe RAID are definitely in the serious power user category.

February 2, 2016 | 01:18 AM - Posted by Geek (not verified)

I'm not sure, since I can't find proper Xeon boards with enough channels. I've only been able to connect one 512GB 950 Pro to my system, with 25 256GB 850 Pro drives on as many dedicated SAS lanes as I could manage (not nearly enough). That's hosting the storage for my workstation, which is currently 44 Xeon cores with an additional 32 in transit now. The system has a little more than 800GB of RAM and uses dual 40Gb/sec InfiniBand as a host bus and a 10Gb/sec internet uplink.

Does this count as a power user? I didn't add any GPUs since I didn't have any need for them, but I'm considering adding a front-end device which hosts GPU as well.

I am using this configuration as a part-time data center for hosting labs for courses, but normally, I just use it for programming and compiling code and experimenting. I suspect I'll be up to around 2TB RAM and 200+ Xeon cores before 2017. My goal is to do it in a single rack with absolute resiliency. I have 4U sucked up with 52 3.5" hard drives though. They're big and ugly, but 400TB of SSD is still too expensive.

February 2, 2016 | 12:23 PM - Posted by Allyn Malventano

Oh that's certainly power user, but a different type of power user. Depending on the IOPS capabilities of the RAID cards you are using, this triple M.2 setup might be able to beat 25 SATA SSDs in some performance metrics.

February 5, 2016 | 03:00 AM - Posted by Anonymous (not verified)

This board is a high-end model. It looks like the main feature is a PLX chip which converts the 16 PCIe lanes from the CPU out to 32 PCIe lanes. This allows all 4 x16 slots to operate at x8 with 4 video cards installed. You still only get x16 bandwidth to the CPU though.

February 5, 2016 | 05:50 PM - Posted by MilanT (not verified)

How many GPUs can you use when all 3 m.2 slots are populated?

February 11, 2016 | 09:38 AM - Posted by Anonymous (not verified)

Well, PLX isn't that great; it's a stopgap until the Skylake-E parts are out, so I wouldn't suggest it. Without it you would have 3 of these at 3.0 x4, so you would have x4 available for your video card. With the fast switching from the PLX you could do x16 and have x4 left over, but in every instance I have dealt with PLX (mostly Z87 and Z97 boards) it is not really even close to true 3.0 x16 performance. It only shows any returns when you have more than 4 cards installed. Personal opinion from experience: stay away from it. If you need the lanes, go Haswell-E.

This of course was for shiz and giggles, to see what they could push it to, but if this were a "real" build I would only use 2 950 Pros and the GPU at x8 with a non-PLX board (if they have any with 2 M.2 slots).

The difference between x8 and x16 is CURRENTLY negligible for gaming; that might change with DX12.

February 1, 2016 | 11:58 AM - Posted by Jann5s

Thx guys, nice work!

@ Allyn, can you explain the difference between these two cases:
1) 2x 256GB 950 Pro in RAID 0
2) 1x 512GB 950 Pro

Both cases have the same number of memory chips to distribute the load over, but in the RAID case you have twice the controllers - is this the advantage?

February 1, 2016 | 03:55 PM - Posted by Allyn Malventano

Reads will see the same type of boost as with 2x 512's. Writes will see the same effect / proportion scaling up from one to two 256's, but since the 256GB model has lower write performance to start with, two 256s will not beat two 512s.

Even with the slower write speed of the 256GB model, a pair of 256s will still beat a single 512 in all but low QD (1-2) latency. Everything else will be better - higher sequential writes (~1.5x) and reads (2x), higher random performance at moderate QD, etc.
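To put rough numbers on that, here is a back-of-the-envelope sketch assuming ideal RAID-0 scaling and the approximate Samsung datasheet sequential figures for the two capacities (illustrative assumptions only, not measurements from this article):

```python
# Back-of-the-envelope RAID-0 scaling estimate (ideal striping, no overhead).
# Sequential figures are approximate 950 Pro datasheet numbers, used here
# purely as assumptions for illustration.
DRIVES = {
    "950 Pro 256GB": {"seq_read": 2200, "seq_write": 900},   # MB/s
    "950 Pro 512GB": {"seq_read": 2500, "seq_write": 1500},  # MB/s
}

def raid0_estimate(model, count):
    """Ideal RAID-0: throughput scales with member count (latency does not)."""
    return {k: v * count for k, v in DRIVES[model].items()}

print("2x 256GB RAID-0:", raid0_estimate("950 Pro 256GB", 2))
print("1x 512GB       :", raid0_estimate("950 Pro 512GB", 1))
```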

February 1, 2016 | 04:51 PM - Posted by Jann5s

Interesting, the real question is why though.

Isn't an SSD controller similar to a RAID controller?

I was thinking a single 512 would be able to distribute the load similarly to how 2x 256 would in RAID 0. I think that is true when only considering the memory chips. So where does the extra performance of the 2x 256 come from?

February 1, 2016 | 09:54 PM - Posted by Anonymous (not verified)

The SSD controller will have a fixed number of channels. The 512 GB model just has twice the amount of memory attached to each channel. I believe Intel SSD controllers use 18 channels. I am not sure how many the Samsung controller uses. They wouldn't want to set up the controller to use half the number of channels with the 256 GB model since it would be effectively half the performance. You are not distributing across individual flash die, you are distributing across the channels of the controller. Twice the amount of flash die doesn't mean twice the performance. Double the number of channels can double the bandwidth though, if there is no bottleneck elsewhere.
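A toy model of that point (all numbers invented purely for illustration): bandwidth is gated by the number of controller channels and any interface ceiling, while extra dies per channel mostly just give each channel more capacity to manage.

```python
# Toy model: device bandwidth is bounded by controller channels, not die count.
# All numbers below are invented for illustration only.
def device_bandwidth(channels, dies, per_channel_mbps=320, interface_cap_mbps=3200):
    # A channel needs at least one die attached to contribute anything.
    active_channels = min(channels, dies)
    return min(active_channels * per_channel_mbps, interface_cap_mbps)

print(device_bandwidth(channels=8, dies=8))    # hypothetical 256GB layout
print(device_bandwidth(channels=8, dies=16))   # hypothetical 512GB: more dies, same channels
```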

February 5, 2016 | 03:02 AM - Posted by Anonymous (not verified)

There does seem to be an effect on write with more flash die, even with the same number of channels. I don't know exactly how this works.

February 1, 2016 | 11:50 AM - Posted by Anonymous (not verified)

What are the *practical* applications for this other than RAID1 for drive redundancy?

NVMe's increase in bandwidth has little to no tangible benefit over SATA-based SSDs. The only real benefit is high IOPS for database interactions.

Can we get application load times (OS, productivity and games), processing time analysis on video rendering, file compression/decompression, file copy, etc.?

When can we expect real-world benchmarks?

February 1, 2016 | 01:57 PM - Posted by Allyn Malventano

RAID-1 is certainly doable for a pair of drives, and RAID-5 would be the more efficient choice for three (we talk about that part on page 3).

The effect of the reduced latency is faster response with a lot going on on the system (heavy loading, multiple apps hitting the array simultaneously). There is no existing consistent test using actual simultaneous launching of applications, so the closest we can come is with the testing we are conducting here. The reader will have to decide, based on their particular demand on their storage and how high they will be filling the queue (this can be monitored in Windows), whether the reduction in latency is of benefit to them.

I do have an extension of this testing that will also evaluate as the SSD is filled and TRIMmed, but for now the setup was random access to an 8GB span of a full SSD / array.
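For readers who want to see how deep their own queue actually gets during real work, the figure Allyn mentions is exposed by Windows' PhysicalDisk performance counters. Below is a minimal sketch using the stock typeperf tool; the counter path assumes an English-locale Windows install.

```python
# Sample the physical-disk queue length via Windows' built-in typeperf tool.
# Counter path assumes an English-locale Windows; adjust for other locales.
import subprocess

COUNTER = r"\PhysicalDisk(_Total)\Current Disk Queue Length"

def sample_queue_depth(samples=30, interval_s=1):
    cmd = ["typeperf", COUNTER, "-si", str(interval_s), "-sc", str(samples)]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    depths = []
    for line in out.splitlines():
        parts = [p.strip('" ') for p in line.split(",")]
        # Data rows look like: "timestamp","value"
        if len(parts) == 2 and parts[1].replace(".", "", 1).isdigit():
            depths.append(float(parts[1]))
    return depths

if __name__ == "__main__":
    d = sample_queue_depth()
    print(f"max QD seen: {max(d):.0f}, mean: {sum(d) / len(d):.2f}")
```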

February 1, 2016 | 03:56 PM - Posted by mindthegap (not verified)

For someone building a computer that'll mainly be used for gaming plus the usual everyday use scenarios, would the addition of a 950 pro provide a noticeably faster experience compared to a SATA ssd such as the 850 Evo or Pro?

February 2, 2016 | 10:41 AM - Posted by Anonymous (not verified)

Other than the increased performance in synthetic benchmarks, you won't see any discernible and tangible differences. I bought the 512GB 950 Pro NVMe drive to replace my 500GB 850 EVO as my boot drive, and Windows 10 and all my applications load just as fast. Games such as BF4, SWBF, WoWS w/ tons of mods, and anything in my Steam library load about 0.2~0.8 seconds faster with the 950 Pro.

Is it worth the extra cost? YMMV, but for me it was not.

I eventually put that 950 Pro to the test against a 480GB Seagate 600 in my web and database server and found it to be worth it there, with much lower latency on DB queries and the ability to have more concurrent connections.

I think what Allyn and Ryan need to say outright, and not assume that readers will just figure out, is that we are up against the law of diminishing returns. Meaning, as SSDs get faster and faster with different NAND types, controllers and protocols like NVMe, we, as consumers, will start seeing less and less benefit. So what if you can shave a few fractions of a second off loading your OS or an application? I am waiting to see what XPoint has to offer since it is magnitudes faster than current SSD technology. Perhaps it will usher in a new performance benchmark. Or be a victim of diminishing returns...

February 1, 2016 | 09:02 PM - Posted by Martin Sleeman (not verified)

To echo the other contributors, will this shave a few seconds off load times?

February 2, 2016 | 10:41 AM - Posted by Anonymous (not verified)

Perhaps a few fractions of a second.

February 1, 2016 | 12:33 PM - Posted by Jann5s

why does one ssd appear to be upside-down?

February 1, 2016 | 12:50 PM - Posted by Ryan Shrout

That was our first sample unit, didn't have the full retail sticker on it.

February 1, 2016 | 01:03 PM - Posted by Ed (not verified)

The space between the required four full length PCIe slots makes it look like it was chosen to fir M.2 SSDs in-between them

Typo there, I think u mean "fit"

February 1, 2016 | 01:23 PM - Posted by Ryan Shrout

Fixed, thanks!

February 1, 2016 | 01:46 PM - Posted by Jemma (not verified)

You messed up your graphs - you labeled the x-axis as nanoseconds, when it should be microseconds. Is the 6ns RAID overhead meant to actually be 6 microseconds?

February 1, 2016 | 01:58 PM - Posted by Allyn Malventano

Crap, you're right! Corrections incoming! Thanks for the catch!

February 1, 2016 | 04:52 PM - Posted by Jemma (not verified)

Cool, thanks! Looks real good.

February 1, 2016 | 01:48 PM - Posted by funandjam

Clearly not enough PCIe and m.2 ports.

Joking aside, it seems kinda weird to be able to save physical space on a board that big which would most likely be put in a roomy case. Maybe because it is such an expensive MB meant for high end enthusiasts looking for options on builds and/or mods?
Otherwise, very interesting review, great write up!

February 1, 2016 | 06:26 PM - Posted by Scuba Steve (not verified)

One of the biggest benefits to M.2 in desktop computers IMO is that the mobo delivers the power. If you only use M.2 storage, that's one less power cable you need, and less clutter.

February 1, 2016 | 01:49 PM - Posted by Brad (not verified)

Perhaps this is a silly question, but are the log scales for the graphs on page 4 labeled accurately? The scales jump from nano-scale (1e-9) to milli-scale (1e-3), but shouldn't the micro-scale (1e-6) be included in-between? If true, this would make the latency time reduction of running 3 drives in RAID only 1 order of magnitude instead of the 2-3 orders shown above.

Regardless, excellent analysis guys!

February 1, 2016 | 02:07 PM - Posted by Allyn Malventano

Yup, thanks for the catch, updating things now!

February 1, 2016 | 02:03 PM - Posted by Anonymous (not verified)

Why is this presented as new? The ASRock FATAL1TY Z170 PROFESSIONAL GAMING I7 also has 3 M.2 slots.

February 1, 2016 | 02:08 PM - Posted by Allyn Malventano

This was the first board we could get in that was capable of triple M.2.

February 1, 2016 | 02:13 PM - Posted by Scott Brickey (not verified)

would be curious to see how QD and latency compare when running in RAID1 or RAID5 configurations.

February 1, 2016 | 02:17 PM - Posted by Allyn Malventano

Roughly:

RAID-1 (2 SSDs): Reads are similar to RAID-0. Writes are similar to single SSD.

RAID-5 (3 SSDs): Reads are ~ 2-3 SSD RAID-0 figures, writes are ~ 1-2 SSD RAID-0 figures.

Be advised there is additional CPU overhead in RAID-5 due to parity calcs.

February 1, 2016 | 07:00 PM - Posted by Anonymous (not verified)

Re: RAID1 reads are similar to RAID0... I'm surprised that it's not closer to single SSD... sounds like no integrity / validation among both disks?

February 2, 2016 | 01:13 AM - Posted by Anonymous (not verified)

A common misconception. RAID 1 typically does not read data from both drives and then compare to see if they are the same; it will divide the reads across both drives and use the sector CRCs to ensure data integrity, and only on a failed check will it switch to reading the other drive for the bad sector(s).

February 2, 2016 | 12:28 PM - Posted by Allyn Malventano

RAID-1 typically reads back data in 'performance' mode, meaning it stripes across the drives as if they were in RAID-0. No error checking happens here, but you can tell RST to 'Verify' the array, which will scrub both drives front to back and compare data.
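A minimal sketch of that 'performance mode' behavior, under the assumption of a simple round-robin policy: reads alternate across the mirror members as if striped, so each member services roughly half of the read requests, while writes must always hit both.

```python
# Toy RAID-1 'performance mode': reads round-robin across mirror members,
# so per-member read load is roughly halved. Writes always hit every member.
from itertools import cycle

class Raid1:
    def __init__(self, members=("ssd0", "ssd1")):
        self.members = members
        self._next_reader = cycle(members)
        self.reads = {m: 0 for m in members}

    def read(self, lba):
        target = next(self._next_reader)   # alternate members for reads
        self.reads[target] += 1
        return target

    def write(self, lba):
        return list(self.members)          # mirrored: every member is written

array = Raid1()
for lba in range(1000):
    array.read(lba)
print(array.reads)   # roughly 500 reads per member
```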

February 1, 2016 | 03:44 PM - Posted by Anonymous Coward (not verified)

Those are some pretty impressive iops numbers. This seems an ideal setup for a high core count, write intensive OLTP database system.

I'm wondering about the DMI bottleneck though. I understand why putting the SSDs behind the chipset allows for UEFI-level RAID configuration. However, say that you don't want to use Intel RST, but instead rely on Linux MD-RAID or Solaris' ZFS; then it would be better to have the M.2s wired directly to the CPU, no? Then again, the question then becomes *where* you're going to get data to and from at a sufficient pace to keep that SSD array busy enough on a consumer-level system like Z170.

Interesting article. Thank you very much for taking the time to document and share your findings.

February 1, 2016 | 03:51 PM - Posted by Allyn Malventano

Remember we were only writing randomly to an 8GB span of sequentially filled SSDs here. OLTP would randomly write to a much larger span of the SSD (if not all of it), so to get good sustained random write performance you will need enterprise SSDs which can better handle sustained workloads to 100% of the volume.

(The latency principles still apply though).

February 1, 2016 | 03:52 PM - Posted by zMeul (not verified)

the simplest explanation would be comparing it to multithreading

February 1, 2016 | 06:30 PM - Posted by Jimmy (not verified)

Moar IOPS = Lower Latency. Simple math.

February 2, 2016 | 12:30 PM - Posted by Allyn Malventano

This totally does *not* apply when a queue is involved. For example, the OCZ R4 hit very high IOPS, but used SandForce SSD controllers in a RAID to get there, so individual IO latency was far higher than what we are seeing here.

May 14, 2016 | 09:34 PM - Posted by Ed (not verified)

Allyn: You put in a ton of work on this. Thank you for sharing! My wife has been griping her computer is slow and I always get the "good" stuff for myself, which is absolutely true, lol. So I was looking at all the latest technology, and was really wondering about the RAID 5 aspect with the 3 each M.2 connectors. You answered the questions I had. I have been using RAID 5 exclusively for many years, and my wife's old computer (~8 yrs) has a 1.5 TB raid 5 C: drive, which always has to rebuild if the system locks down, which can take a day or more. Raid 5 still works, of course, but slows down considerably when rebuilding. So, I am a little paranoid about using raid 5 for the C: drive. I use a single SSD C: drive and a 3TB raid 5 D: on my own computer. Your comments about loading the system using a GPT external USB drive are crucial. I obviously am rusty on the latest bios settings terminology, but I have built my own computers for the last 20 years, one every 5 years with the latest stuff, so there is always a learning curve since I do it so seldom and technology changes.
Your article helps a Lot. Thank You!

February 1, 2016 | 06:35 PM - Posted by ryanbush81

Great write-up! It was hard to get through some of the technical details but Allyn promised the next page was going to be amazing. I was expecting a free computer offer or something. For real though amazing details. Really excited about my next build!

February 1, 2016 | 07:19 PM - Posted by djotter

Great write up and interesting findings. But could future videos have higher depth of field? Only the background TV is in focus.

February 2, 2016 | 12:32 PM - Posted by Allyn Malventano

We were trying a new camera for the video and we might not have had all settings tweaked properly.

February 1, 2016 | 08:43 PM - Posted by Josh Durston (not verified)

Wow, amazing storage review!
I'm trying to decide whether to go 850 or 950 mSATA single 250GB. Three-way RAID 950s is a different universe of performance. Love the new latency visuals. Ryan should let Allyn keep this setup (make that a Patreon threshold).

February 2, 2016 | 12:33 PM - Posted by Allyn Malventano

No Patreon needed on this one. After putting weeks of development work into creating this testing, I'll be using it on all storage reviews moving forward.

(that said, please consider contributing anyway!)

February 1, 2016 | 08:51 PM - Posted by StephenSM (not verified)

Perhaps slightly out of context for this article, but can anyone comment on how this config would affect an SLI installation? I believe that 3 M.2s and multiple graphics cards will take up more PCIe lanes than are available in the Skylake architecture.

So basically your SLI cards would be forced to run slower? Which would have priority for the PCIe lanes, or is it all multiplexed somehow?

February 1, 2016 | 09:32 PM - Posted by Anonymous (not verified)

This uses PCIe from the chipset. You will lose all of the SATA ports off the chipset to do this. This will take PCIe lanes 15 to 26 from the chipset. The lower PCIe links are still available for USB, network, other controllers, and probably the last PCIe slot. The graphics cards would be running off the CPU PCIe lanes connected to the x16 slots. I don't know if this board supports 3-way CrossFire by using an x4 from the chipset. That would run into bandwidth limitations due to the link between the CPU and the chipset. It isn't really relevant anyway. I am not sure what applications you would be running at home to really stress this set-up at all. What would you be running to stress this set-up and your graphics system at the same time?

February 1, 2016 | 10:46 PM - Posted by Anonymous (not verified)

Actually, it is only 20 PCIe lanes from the chipset. The HSIO lanes 1 to 6 are USB3 only, while some of the 20 PCIe lanes can be switched to SATA. Using 3 x4 m.2 takes 12 lanes, leaving 8 lanes for other controllers or slots.
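The lane accounting described in these two comments, written out as a quick sketch (figures are as stated above, not independently verified against the datasheet):

```python
# Rough Z170 chipset lane budget, per the figures quoted in the comments above.
CHIPSET_PCIE_LANES = 20   # HSIO lanes usable as PCIe on Z170 (as stated above)
M2_SLOTS = 3
LANES_PER_M2 = 4

used = M2_SLOTS * LANES_PER_M2
remaining = CHIPSET_PCIE_LANES - used
print(f"M.2 usage: {used} lanes; left for SATA/LAN/USB/other slots: {remaining}")
# The CPU's own 16 lanes feed the x16 graphics slots and are not affected.
```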

February 2, 2016 | 12:36 PM - Posted by Allyn Malventano

It may speed up some more complex game loads, but where this would really shine would be the home user that has other disk-heavy processes taking place *while* gaming on that same system.

February 1, 2016 | 10:31 PM - Posted by Anonymous (not verified)

"The end result result of this is a RAID of SSDs gives you a much greater chance of IOs being serviced as rapidly as possible, which accounts for that 'snappier' feeling experienced by veterans of SSD RAID."

You like writing "result" apparently.

February 2, 2016 | 12:37 PM - Posted by Allyn Malventano

Hah! I *do* like results! (fixed)

February 2, 2016 | 01:36 AM - Posted by Hakuren

While interesting as an exercise (I know, only 3 ports), I find it nothing short of sacrilege to suggest a RAID 5 setup on flash-based drives. If it were a drive-pool kind of setup then fine, but it's not. NVMe does great in RAID 1 or 10. There is no point wearing NAND with unnecessary parity writes (just like classic SSDs). Basically, if you value your SSDs, all parity-based RAID levels are out of the window. Even in enterprise environments, SSD parity arrays are rarely encountered - and that's with SSDs which cost 10-30x more than consumer-grade drives. RAID 1 or 10 is simply much more convenient and easier & faster to recover. Time is money.

February 2, 2016 | 01:42 AM - Posted by Jan (not verified)

Wearing out SSDs is not that big an issue anymore; these V-NAND chips have excellent durability.

February 2, 2016 | 04:27 AM - Posted by lauri (not verified)

Nice review, but I want to see more real-life tests rather than graphs and charts.

February 2, 2016 | 07:25 AM - Posted by bigboy678 (not verified)

I too would be curious about some real life tests. (windows boot and shutdown, app startup, etc)

February 2, 2016 | 12:43 PM - Posted by Allyn Malventano

Boot of a 'clean' fresh install is essentially the same (or in some cases it takes a second or two longer, due to the way some BIOSes initialize NVMe devices during boot). Where the speed difference would be more apparent is with a 'well used' OS that has had a lot of other apps / startup processes / cruft generated over time. The additional SSDs would keep latency lower during the increased load seen during that boot. Still, we are talking a few seconds, and that only happens while booting, which is a rare event (and why we don't focus on that aspect).

February 2, 2016 | 09:42 AM - Posted by Anonymous (not verified)

Why can't M.2 slots be at right angles to the motherboard? This would save space and allow better airflow. I could possibly see myself getting two 120GB M.2 drives in RAID 0 rather than a single 240GB. It would be interesting to see results for Windows software RAID also, which has been flawless in my system (Win7).

February 5, 2016 | 03:11 AM - Posted by Anonymous (not verified)

I think there were a few boards that did that; maybe some ASUS board, if I remember right. It is a little more expensive since you need a metal bracket to support it.

February 2, 2016 | 10:44 AM - Posted by Anonymous (not verified)

For RAID 5, can you test with an LSI controller? I'm sure that the onboard RAID controller will be the bottleneck calculating the parity.

February 2, 2016 | 12:39 PM - Posted by Allyn Malventano

LSI RAID controllers handle SAS / SATA. These are PCIe NVMe.

February 2, 2016 | 01:07 PM - Posted by Oscar Castillo

The ASRock Z170 OC Formula and Extreme 7 both also have triple M.2 and are around half the price. They've been out for some time now. You should check them out to see if they have the same or a worse RAID implementation.

February 5, 2016 | 03:17 AM - Posted by Anonymous (not verified)

The RAID implementation is in the Z170 chipset, so it should be exactly the same. It does have some hardware acceleration, but it doesn't seem to have hardware parity calculations. It would be cool if they could make a PCIe x16 RAID card that can handle 4 of these SSDs. No home user needs such a thing though.

This specific board is probably really expensive since it has a PLX chip to convert the x16 PCIe connection from the CPU out to x32. This allows for 4-way SLI with x8 PCIe to all 4 slots.

February 3, 2016 | 10:42 AM - Posted by Anonymous (not verified)

The only "wise" usage to explain the price is in a rack used for real time financial transaction (read write databases)

February 3, 2016 | 12:27 PM - Posted by Allyn Malventano

Not really, as that type of use would be full span random writes, which would slow these consumer SSDs down considerably. You need enterprise optimized firmware for that.

February 3, 2016 | 06:16 PM - Posted by Ibrahim (not verified)

In the video, you mention that "advanced" users could see up to QD=8. But everything I've read says most users don't go past QD=2.

What exactly makes users go to higher queue depths? Like, is downloading a file + gaming going to increase your QD?

February 3, 2016 | 10:51 PM - Posted by Allyn Malventano

Each individual thread (program) that hits the storage will add *at least* one to the QD figure. Apps can individually ask for multiple sectors at the same time, or can 'ask ahead', which builds the queue. A simple Windows file copy can run at QD=4 with nothing else going on. QD can spike past 64 on a powerful, multi-core system during boot, where dozens of other apps and services are simultaneously launching. Note: SATA devices can't exceed QD=32, so if the OS climbs higher, the queue backs up into the OS itself, and no additional benefit will come from a SATA SSD (since it can't see further ahead than the next 32 requests).
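A small sketch of that last point: anything the OS queues beyond what the device can accept simply waits in the OS, so a SATA device (capped at 32 outstanding commands in its single AHCI queue) cannot 'see ahead' the way an NVMe device with much deeper queues can. The queue limits used here are the protocol maximums, purely for illustration.

```python
# Illustrative only: how much of an OS-side queue the device can actually see.
def visible_to_device(os_queue_depth, device_queue_limit):
    in_device = min(os_queue_depth, device_queue_limit)
    backed_up_in_os = os_queue_depth - in_device
    return in_device, backed_up_in_os

for os_qd in (4, 32, 64, 128):
    sata = visible_to_device(os_qd, 32)         # AHCI/SATA: one queue, 32 deep
    nvme = visible_to_device(os_qd, 64 * 1024)  # NVMe: up to 64K entries per queue
    print(f"OS QD {os_qd:3}: SATA sees {sata[0]} (backlog {sata[1]}), "
          f"NVMe sees {nvme[0]} (backlog {nvme[1]})")
```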

February 4, 2016 | 04:50 PM - Posted by chizow (not verified)

Great job as usual Allyn, excellent info and methodology. I have one of the 950 pros and had to use an Asus Hyper X4 riser card on my X99 because the native M.2 are worthless 10G slots and that's when it dawned on me that I wouldn't be able to do RAID with M.2 because the rest of my PCIe slots are occupied.

What we REALLY need are PCIe x16 riser cards that can support up to FOUR M.2 2280 cards for either RAID or JBOD, but the most important thing is it consolidates space and slots which is a problem now due to the way M.2 are routed with HSIO since they need to work individually or in tandem.

Do you have any info on this from industry OEMs? An x16 riser card that could take 4x M.2 would be awesome!

February 5, 2016 | 11:20 AM - Posted by MRFS (not verified)

> An x16 riser card that could take 4x M.2 would be awesome!

There certainly is a lot of engineering elegance to be had
with four M.2 @ x4 PCIe 3.0 lanes = x16 PCIe 3.0 lanes.

However, a PLX-type chip is required because
PCI-Express does not generally allow multiple
discrete devices in a single PCIe expansion slot.

HP and Dell have already developed same,
but the HP version requires an HP workstation.
For photos and discussion:

Google "Cheap NVMe performance from HP"

We published a WANT AD for same several months ago,
and one storage expert confirmed that h/w RAID
controllers are "works in progress" but
he was limited by an NDA and couldn't say
much more. To locate our WANT AD:

Google "NVMe RAID controller"

How about motherboards that replace SATA-Express ports
with 4 x U.2 ports? There's certainly enough room.

A factor to consider is the upstream bandwidth
of the DMI 3.0 link = 4.0 Gb/s (basically
x4 PCIe 3.0 lanes @ 8 GHz / 8.125 bits per byte).

As such, the upstream bandwidth of a single
NVMe M.2 connector is exactly the same
as the upstream bandwidth of the DMI 3.0 link.

It should be very interesting when Optane
(Intel 3D XPoint) non-volatile memory
becomes available in the M.2 form factor:
that development should create lots of pressure
to increase the upstream bandwidth to satisfy
that extra demand.

At the moment, barring any major changes in
Intel's latest chipsets, RST and RSTe
will only work DOWNSTREAM of the DMI 3.0 link:
RST does NOT work with the x16 lanes controlled
directly by any Intel CPUs, as far as I know.

Allyn, if you're reading this, could you
possibly confirm or update any of the above, please?
I would like to refine my understanding of these
issues, so as not to mislead anyone else.

GREAT REVIEW, once again!

Do keep up the brilliant work, Allyn.

You be the best, man :)

MRFS

February 5, 2016 | 05:36 PM - Posted by Allyn Malventano

To RAID NVMe devices, the only current games in town are:

  • Z170 RST (bootable - has chipset support but bottlenecked by DMI 3.0)
  • RSTe (C600 / X99 - driver only, not bootable, no bottleneck but additional CPU overhead).

There is an HP Z Turbo Drive that supports up to 4x M.2, but it is currently not clear (and unlikely) that it is performing hardware RAID of NVMe devices.
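For reference, the DMI 3.0 ceiling mentioned in the first bullet works out as follows (DMI 3.0 is electrically equivalent to four PCIe 3.0 lanes at 8 GT/s with 128b/130b encoding, which is where the '8.125 bits per byte' figure quoted above comes from):

```python
# DMI 3.0 ceiling: 4 lanes x 8 GT/s with 128b/130b line encoding.
LANES = 4
TRANSFERS_PER_S = 8e9     # 8 GT/s per lane
ENCODING = 128 / 130      # 128b/130b coding efficiency

bytes_per_s = LANES * TRANSFERS_PER_S * ENCODING / 8
print(f"DMI 3.0 ceiling ~= {bytes_per_s / 1e9:.2f} GB/s")   # ~3.94 GB/s
# Three 950 Pros in RAID-0 can exceed this on reads, hence the bottleneck.
```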

February 5, 2016 | 07:41 PM - Posted by MRFS (not verified)

FYI: here are the 3 add-in cards that
reportedly support 4 x M.2 SSDs
and an x16 edge connector:

HP Reveals New Z Turbo Drive Quad Pro
http://www.storagereview.com/hp_reveals_new_z_turbo_drive_quad_pro

The Dell 4x m.2 PCIe x16 version of the HP Z Turbo Quad Pro
http://www.servethehome.com/the-dell-4x-m-2-pcie-x16-version-of-the-hp-z...

Kingston Unveils E1000 NVMe Enterprise SSD At CES 2016
http://www.tomsitpro.com/articles/kingston-e1000-ssd-nvme-liqid,1-3098.html

Thanks again, Allyn.

MRFS

February 5, 2016 | 11:23 AM - Posted by MRFS (not verified)

correction:
DMI 3.0 link = 4.0 Gb/s
should be
DMI 3.0 link = ~4.0 GB/s (32 Gb/s / 8.125 bits per byte)

sorry for the typo

MRFS

February 5, 2016 | 11:25 AM - Posted by MRFS (not verified)

Kingston has one also:

http://www.tomsitpro.com/articles/kingston-e1000-ssd-nvme-liqid,1-3098.html

February 5, 2016 | 03:19 PM - Posted by MRFS (not verified)

Allyn, One more thing:

Is it possible to install a high-performance SAS RAID
controller in the primary PCIe 3.0 slot, and
have it communicate directly with the CPU
rather than via the DMI 3.0 link?

I for one would be very interested in seeing
if such a configuration can circumvent
the 4.0 GB/s bandwidth ceiling of the DMI 3.0 link.

In theory, a PCIe 3.0 x8 edge connector
should have twice the bandwidth ceiling of DMI 3.0
e.g. x8 PCIe 3.0 lanes @ 8G / 8.125 = ~8 GB/s.

If Intel's RSTe only works downstream of the DMI 3.0 link,
it seems that a 12 Gb/s SAS controller with 8 x 12G ports
should exceed the DMI ceiling e.g. by configuring a
RAID-0 with 12G SAS SSDs e.g. Toshiba PX04SL SSDs:

http://www.storagereview.com/toshiba_releases_new_readintensive_sas_ssd

Maybe Toshiba will lend PCPER some samples
so you can do a scaling experiment.

Now, what SAS RAID controller would be best,
Areca? LSI? ATTO? And, will such a controller
work in the primary PCIe 3.0 expansion slot?

Here's an Avago model at Newegg:

http://www.newegg.com/Product/Product.aspx?Item=N82E16816118217&Tpk=N82E...

MRFS

February 5, 2016 | 05:41 PM - Posted by Allyn Malventano

Yes, a PCIe x8 RAID card can exceed DMI 3.0 (4 lane) bandwidth, but you are adding a bunch of latency and a lower maximum cap on the ultimate IOPS that the RAID controller can handle. Intel RST (SATA) actually beats most add-in RAID cards as far as IOPS scalability goes. It would also not be able to communicate to the host via NVMe, so there would be the same sort of IO overhead seen with SATA. It's basically the long / expensive way to reach those high figures.

February 5, 2016 | 07:35 PM - Posted by MRFS (not verified)

Many thanks for your very prompt replies above.

Keep up the great work, Allyn.

MRFS

February 5, 2016 | 07:47 PM - Posted by MRFS (not verified)

Re: HP Z Turbo Quad Pro reportedly has a "BIOS lock":

http://www.tomsitpro.com/articles/hp-reveals-turboz-quad-pro,1-3022.html

"Thanks to a BIOS lock, the device is supported only on the HP Z440, Z640, and Z840 Workstations, and cannot be used in any other OEM workstation solution."

February 6, 2016 | 05:37 PM - Posted by Anonymous (not verified)

Would the Windows 10 Pro USB flash drive work for OS installation with this configuration (is it GPT formatted)?

February 7, 2016 | 08:16 PM - Posted by MRFS (not verified)

FOUND! PLX Heaven:

http://www.servethehome.com/wp-content/uploads/2015/08/One-Stop-Systems-...

How about a workstation motherboard with
multiple U.2 ports, like this:

http://www.servethehome.com/wp-content/uploads/2015/08/A-Serial-Cables-A...

Here's the full article:

http://www.onestopsystems.com/blog-post/avago-and-plx-%E2%80%93-future-pcie

Avago + Malventano = exabytes per nanosecond! :)

MRFS

February 9, 2016 | 09:17 PM - Posted by MRFS (not verified)

Here's an add-on card with an x16 edge connector
and four U.2 ports:

http://www.serialcables.com/downloads/PCI-HBx16-I.pdf

I doubt that it supports hardware RAID, however:
note where it says "requires no additional software".

The company is called Serial Cables:

http://www.serialcables.com/

Here's the spec page for the
PCIe Gen3 Switch Board mentioned above:

http://www.serialcables.com/products.asp?cat=351&tier=264

Allyn, if you're still reading this, does that
Switch-based Host Adapter appear very similar to the two
made by Supermicro?

It sure would be nice to have a workstation motherboard
with four U.2 ports, just like that Host Adapter.

MRFS

February 9, 2016 | 09:24 PM - Posted by MRFS (not verified)

... replacing 2 x SATA-Express ports,
like this:

http://supremelaw.org/systems/nvme/4xU.2.and.SATA-E.jpg

I tried to maintain the same scale:
if so, there's plenty of room for 3 more U.2 ports
if we remove the SATA-Express ports.

MRFS

February 12, 2016 | 02:35 PM - Posted by takeallmymoney (not verified)

So, a quick question:
With three M.2 SSDs installed, will my graphics card run at x8?
Also, to clarify: will I still have two SATA ports available for any optical drive, HDD, or SSD I add later on?

Thanks a bunch.

February 14, 2016 | 08:43 AM - Posted by takeallmymoney (not verified)

Got my answer from the comments of the Video.

February 18, 2016 | 10:43 AM - Posted by MRFS (not verified)

FYI: Icy Dock sent me this email ad yesterday;
because the source is obvious, I'm sharing this
email message under the "fair use" doctrine:

http://supremelaw.org/systems/icydock/ICY.DOCK.MB998SP-B.in.HP.Z840.htm

Those Areca controllers are pricey:

http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&DEPA=0&Order=B...

I couldn't find motherboard details for that
HP workstation: the Areca they used could not
have been downstream of the DMI link -- NOT
with the performance numbers they reported.

-OR-

Are those numbers merely an effect of the
large on-board cache?

MRFS

February 27, 2016 | 08:38 AM - Posted by IthacaDon (not verified)

I plan to build a high-end gaming computer for iRacing.com use. It seems to me that 3 M.2s in RAID 0 are not going to increase my FPS, though it may help with load times from the SSDs to memory for tracks and cars.

Also, isn't RAID 0 a bit unstable? If there is any sort of memory error on either SSD, won't that lock up the OS?

Even 2 SSDs in RAID 0 don't appear to add a significant advantage for my system.

Great discussion, article and video. I learned a great deal.

Thanks!

March 25, 2016 | 11:27 AM - Posted by Randy O. (not verified)

Excellent coverage of triple NVMe RAID.
I'm looking for a microATX board with dual M.2 slots for RAID.
Any available, to your knowledge?

Thx in advance

April 19, 2016 | 04:24 AM - Posted by Dusan (not verified)

Could this be used for a VOD server? I was thinking of 3 PCIe SSD cards with 1.5TB of total space (RAID 5), so it would need about 2TB of PCIe SSD storage... And it would also need 2 gigabit Ethernet cards...

May 1, 2016 | 01:44 PM - Posted by Anonymous (not verified)

I wonder how this will fare against the upcoming Intel Optane SSD.

Will it still stand a good chance?

May 7, 2016 | 11:29 AM - Posted by Francis Houle (not verified)

I have a question for you guys. I bought 3 950 Pro M.2 SSDs. I want the best performance in my PC, so which motherboard and processor should I buy? What is the best option for me? I plan to install one or two graphics cards in SLI.
Your article is a few months old, so maybe there are better products available.

May 23, 2016 | 02:37 PM - Posted by Mark D Blair (not verified)

I worry about full data loss running on RAID0 if one of the SSDs fails.

Can you do RAID5 with the 3 m.2 slots populated, and do you have any performance #s?

Thanks :)

May 23, 2016 | 02:44 PM - Posted by Mark D Blair (not verified)

Nevermind,

I answered my own question by reading the full article :P

Thanks again :)

June 15, 2016 | 09:57 AM - Posted by D1RTYD1Z619

Hi Allyn
So in the video around 9:20 are you saying that the "snappiness" of adding more SSDs is similar to hyper threading for Intel processors? The workload gets spread out across all drives.

June 18, 2016 | 01:51 PM - Posted by Archie177

I set about building a PC for video production and the top priority was storage speed in order to capture up to raw 4K. In my search I found this great review and it convinced me to go with this motherboard and (2) 950 SSDs. I put all my faith in RAID 0 because I back up regularly and archive to RAID 1. My downfall was that I also used the RAID 0 as a boot drive. I am now re-installing Windows for the 3rd time, but I learned my lesson this time and am using another SATA drive for my boot device.

The problem I discovered is the BIOS will decide to reset CSM to enabled which in turn disables Intel RST and breaks the RAID.

The first time was my doing, when I installed a video card that did not support UEFI. The second time was after not using the PC for a week. After booting up, my RAID was marked as failed. Checking the BIOS, I found it had again reset CSM to enabled, for what reason I do not know. The only option was to delete the RAID and create a new volume. This does not recover the drive contents, but I found a utility that let me recover the partitions and get my data; it still was not bootable until a re-install.

Does anyone know of a way to lock down that CSM so it will not change on its own?

June 25, 2016 | 03:32 AM - Posted by dayoldy (not verified)

Archie177

Do you also do video editing? Timeline scrubbing and rendering can really use lots of I/O in both cache and source disks...

What do you think of the idea of configuring your system so that you're booting from a regular SATA SSD, pointing your NLE software cache to a RAID 0 of 2x M.2 950s, employing the third M.2 950 to hold all of the project source files (raw video, audio, and media), and finally, having 2x HDDs in RAID 0 to catch the transcoded video files?

Or... this is more straightforward: one m.2 for cache, one for source, and one for the target.

dayoldy.

July 2, 2016 | 09:22 PM - Posted by Archie177

I do video editing also. After my multiple RAID failures, I reconfigured my system to use one SATA as the boot drive, two M.2 as RAID0 for capturing and temporary working drives, then another SATA to move completed work to. I lost trust in the M.2 drives as RAID0 (more so in the BIOS), so nothing important stays on those drives prior to shutting down.

I did discover something interesting while working with different configurations. When I first installed the two M.2 drives, I put them in two adjacent slots. This disabled a majority of my SATA ports. I did some research about the shared hardware on the motherboard, then moved the drives to the outside M.2 slots. This gave me use of faster SATA ports.

June 27, 2016 | 09:53 AM - Posted by Marcus100868 (not verified)

I am using this motherboard with 3 950 Pros in RAID 0. Can I use 2 Samsung 850 Pros in RAID 0 on the SATA ports as well?

Marcus100868

June 28, 2016 | 08:36 PM - Posted by j. banchero (not verified)

Hi Allyn,

I'm trying to find the best (value) high-power setup for 3D content creation. I'm thinking the i7-5820K is the best value, and I want application startup and speed as fast as possible, so it sounds like a couple of 512GB 950 Pros in RAID 0 would be the best option (I'm also considering a RAM disk). I'll be using a GTX 1080 as soon as the price settles down :).

Is this motherboard the best option for 2x 950Pros in RAID0? Am I gaining significant performance in application startup/speed with such a setup or is a single 950 Pro adequate?

Thanks!
j.

June 29, 2016 | 01:23 AM - Posted by Onkar (not verified)

The ASRock Z170 OC Formula has 3 M.2 slots and has everything except 4-way SLI, and it will even have 4 SATA3 ports left over after all the M.2s are populated, since it has 2 ASMedia controllers (2 more than this board). If you want WiFi, it also has a place for those laptop WiFi cards (neither board comes with WiFi by default). The biggest factor is that it goes for $200~250, half of what this costs, and it even supports TridentZ 4300MHz RAM, which this doesn't, so I don't see the benefit of this board over ASRock's.

June 30, 2016 | 01:50 AM - Posted by j. banchero (not verified)

Thanks!! I'll check out the Asrock. Sounds like exactly what I'm after.

Do you know if setting up 2 512GB Samsung 950 Pros in RAID 0 is pretty straightforward with the ASRock? I'm not very knowledgeable about PCIe lanes and all that, so I'm hoping I just plug in the drives, configure the BIOS, and I'm done!

Thanks much for the info!

Best,
j.

August 24, 2016 | 10:28 AM - Posted by VirtuallyFirstPerson (not verified)

Where I see this being useful is in a VMware lab: using the 3x M.2 drives as the caching tier of an all-flash vSAN nested on one box with extreme performance would be incredible.
