
Intel CPU-attached NVMe RAID Uncovered on ASUS Z370 Motherboards

Subject: Storage
Manufacturer: ASUS

Is it a usable feature?

EDIT: We've received some clarification from Intel on this feature:

"The feature is actually apart of RST. While this is a CPU-attached storage feature, it is not VROC. VROC is a CPU-attached PCIe Storage component of the enterprise version of the product, Intel RSTe. VROC requires the new HW feature Intel Volume Management Device (Intel VMD) which is not available on the Z370 Chipset.

The Intel Rapid Storage Technology for CPU-attached Intel PCIe Storage feature is supported with select Intel chipsets and requires system manufacturer integration. Please contact the system manufacturer for a list of their supported platforms."

While this doesn't change how the feature works, or our testing, we wanted to clarify this point and have removed all references to VROC on Z370 in this review.

While updating our CPU testbeds for some upcoming testing, we came across an odd listing on the UEFI updates page for our ASUS ROG STRIX Z370-E motherboard.


From the notes, it appeared that the release from late April of this year enables VROC for the Z370 platform. Taking a look at the rest of ASUS' Z370 lineup, it appears that all of its models received a similar UEFI update mentioning VROC. EDIT: As it turns out, while these patch notes call this feature "VROC", it is officially known as "Intel Rapid Storage Technology for CPU-attached Intel PCIe Storage" and is slightly different from VROC on other Intel platforms.

While we are familiar with VROC as a CPU-attached RAID technology for NVMe devices on the Intel X299 and Xeon Scalable platforms, it has never been mentioned as an available option for the enthusiast-grade Z-series chipsets. Could this be a preview of a feature Intel plans to bring to the upcoming Z390 chipset?

Potential advantages of a CPU-attached RAID mode on the Z370 platform mostly revolve around throughput. While the chipset RAID mode on Z370 supports three drives, total throughput is limited to just under 4 GB/s by the DMI 3.0 link between the processor and the chipset.
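
That ceiling follows directly from the link itself: DMI 3.0 is electrically equivalent to a PCIe 3.0 x4 connection. Here is a quick back-of-the-envelope sketch of that math in Python, using published PCIe 3.0 signaling figures:

# DMI 3.0 is effectively a PCIe 3.0 x4 link.
GT_PER_SEC = 8.0         # PCIe 3.0 signaling rate per lane (GT/s)
ENCODING = 128 / 130     # PCIe 3.0 128b/130b line encoding
LANES = 4                # DMI 3.0 link width

gbps_per_lane = GT_PER_SEC * ENCODING / 8    # ~0.985 GB/s usable per lane
dmi_ceiling = gbps_per_lane * LANES          # ~3.94 GB/s before protocol overhead
print(f"DMI 3.0 ceiling: {dmi_ceiling:.2f} GB/s")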

As we've seen AMD do on its X470 platform, CPU-attached RAID should scale as long as you have CPU-connected PCI Express lanes available that aren't being used by another device, such as a GPU or network card.

First, some limitations.

Chief among them: it's difficult to connect multiple NVMe devices to the CPU rather than the chipset on most Z370 motherboards. Since the platform natively supports NVMe RAID through the Z370 chipset, all of the M.2 slots on our Strix Z370-E are wired through the chipset rather than directly to the CPU's PCIe lanes.


To work around this, we turned to the ASUS Hyper M.2 X16 card, which uses PCIe bifurcation to run four M.2 devices from a single PCIe x16 slot. Luckily, ASUS has built support for bifurcation, and for this Hyper M.2 card, into the UEFI of the Strix Z370-E.


Aiming to simplify the setup, we are using the integrated UHD 630 graphics of the i7-8700K and running the Hyper M.2 card in the primary PCIe slot, usually occupied by a discrete GPU.


Next is a limitation that will seem familiar to anyone who has followed the VROC saga on X299 for the past year. As far as we can tell, CPU-attached NVMe RAID in its current state on Z370 will only work with Intel SSDs. In this case, we are using 32GB Optane Memory drives, but it should also be compatible with the higher capacity Optane 800P drives (and the newly announced Intel 905P M.2), as well as Intel's NVMe NAND offerings.

Lastly, the current implementation of CPU-attached NVMe RAID on Z370-based motherboards seems to be limited to two drives, as opposed to three drives for the chipset-based NVMe RAID. For most typical consumers, who would use a discrete GPU on a platform like this, it's mostly a moot point. In that scenario, you would run your GPU at PCIe x8 and the two SSDs at up to x4 each in a RAID configuration.
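
That two-drive ceiling lines up with the bifurcation modes the CPU actually supports: Coffee Lake can split its 16 lanes as x16, x8/x8, or x8/x4/x4, but never x4/x4/x4/x4 (a commenter expands on this below). A purely illustrative sketch of the lane budget:

# Illustrative only: the bifurcation modes available for Coffee Lake's 16 CPU lanes.
# No mode yields four x4 segments, so a quad-M.2 riser can never expose all four
# of its drives directly off the CPU, and with a GPU holding an x8 segment only
# two x4 segments remain for SSDs.
BIFURCATION_MODES = {
    "x16":      [16],
    "x8/x8":    [8, 8],
    "x8/x4/x4": [8, 4, 4],
}

for name, segments in BIFURCATION_MODES.items():
    print(f"{name:>9}: up to {len(segments)} devices, widths {segments}")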

Unlike VROC on X299, though, there are no hardware "dongles" required to authenticate support and unlock additional features. Everything is available with the new UEFI updates.

Setup

The setup process for CPU-attached NVMe RAID on Z370 will be familiar to anyone who has ever set up a RAID volume, and it is the same as the process for setting up chipset-based RAID on the Strix Z370-E.


From the Intel Rapid Storage Technology menu in the UEFI interface, we can see both 32GB Optane Memory drives, as well as our 512GB Samsung SSD 850 Pro boot drive.


In the Create Array menu, we can select our two 32GB Optane drives and set the RAID level (RAID 0 and 1 are both supported), stripe size, and other applicable settings.


Now we can see our 54.5GB RAID 0 volume has been created from the two NVMe SSDs connected through the CPU's PCI-express lanes.

Performance Comparison

One of the potential downsides we've seen with CPU-attached NVMe RAID, as with AMD's RAIDXpert2 software, is an increase in disk latency. Whereas the Intel chipset contains some fixed-function hardware to help with RAID calculations, CPU-attached NVMe RAID depends solely on the CPU and operating system for all RAID functions.

To evaluate CPU-attached NVMe RAID performance on the Z370 platform, we compared it to chipset-based NVMe RAID on the same Z370 platform, as well as AMD's CPU-attached RAID solution on an X470 motherboard, using the same 32GB Optane Memory modules.

Random Read Latency


Here we run into an unexpected result. Both the single-drive and dual-drive (RAID disabled and RAID enabled) results show lower latencies for the CPU-attached setup. While it makes sense for a single drive connected directly to the CPU to be faster, we expected some additional latency for a RAID setup over the chipset-based option.

Here, CPU-attached NVMe RAID shows 10% lower latency than NVMe RAID through the Z370 chipset.

While single drive latency for X470 is very low, when another drive is added in a RAID 0 configuration, latencies almost double compared to CPU-attached NVMe RAID.

Random Read IOPS


Similarly, 4K random IOPS results show a 13% performance advantage for CPU-attached NVMe RAID over the chipset-based NVMe RAID option on Z370.

While single drive IOPS for the X470 solution look great, adding a second drive and enabling RAID 0 results in a 32% performance decrease compared to CPU-attached NVMe RAID on the Z370 platform.

Sequential Read


In sequential transfers, however, we see nearly identical performance from all configurations, as shown by the data points stacked on top of each other in this chart.

Unfortunately, given the distinct lack of Intel NVMe M.2 devices with a full x4 interface (the 760p fits the bill, but we don't have two drives of matching capacity), we are left unable to show much of an advantage for CPU-attached NVMe RAID on Z370. While latencies are a bit better, Optane is the only technology with low enough latency to take advantage of such a small difference.

We would urge Intel to open this feature up to drives from all vendors so that we can see the advantage of two x4 PCIe SSDs in RAID that isn't bottlenecked by the chipset's DMI link.
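
For rough context on why the sequential chart is flat: each 32GB Optane Memory module is a PCIe 3.0 x2 device, so even a two-drive RAID 0 only has four lanes' worth of link bandwidth behind it, which is exactly what the DMI link itself provides. A quick sketch of that arithmetic:

# Rough arithmetic: why two x2 Optane Memory modules cannot expose a DMI bottleneck.
PCIE3_LANE_GBPS = 8 * (128 / 130) / 8    # ~0.985 GB/s usable per PCIe 3.0 lane

per_drive_link = 2 * PCIE3_LANE_GBPS     # each 32GB Optane Memory module is x2
raid0_link = 2 * per_drive_link          # two modules in RAID 0
dmi_link = 4 * PCIE3_LANE_GBPS           # DMI 3.0 is effectively PCIe 3.0 x4

print(f"RAID 0 link ceiling: {raid0_link:.2f} GB/s")
print(f"DMI 3.0 ceiling    : {dmi_link:.2f} GB/s")
# Even at the link limit there is no headroom for the CPU-attached path to pull ahead,
# and the modules' actual sequential throughput sits well below their link ceiling.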


Considering some of the bugs we've come across along the way, the feature doesn't quite seem ready for primetime. Mostly, this looks like a reactionary move to AMD's recent release of CPU-attached RAID for the X470 platform.

However, if motherboard manufacturers build their upcoming Z390 platforms around the idea of CPU-attached NVMe RAID instead of chipset-based NVMe RAID, by routing the onboard M.2 slots directly to the CPU, then this could become an exciting proposition for enthusiast users.


June 8, 2018 | 12:15 PM - Posted by psuedonymous

Nice sleuthing!

"Lastly, the current implementation of VROC on Z370-based motherboards seems to be limited to two drives, as opposed to 3 drives for the chipset-based NVMe RAID."

This is likely a physical limitation of the CPU: Coffee Lake (as with Kaby Lake, Skylake, and all the way back to at least Haswell) has the following possible PCIe lane assignments:
16x (+reverse)
8x/8x (+reverse)
8x/4x/4x (+reverse)

There is no capability for x4/x4/x4/x4, so x4/x4/x4 would rely on the card presenting itself as x8/x4/x4 with the correct lane alignments, or on manually placing the M.2 drives such that lanes 4-7 (11-8 if reversed) from the CPU are unpopulated. That's tricky without knowing the pinouts the board is using and how the M.2 slots on the riser card are arranged relative to the card edge (which at least can be verified with a continuity meter).

This lane assignment issue is probably the barrier to 'unlocking' VROC on existing boards. If Intel produce a new spec for Z390 (or for 4xx series and beyond) which specifies which lanes should be assigned to which slots, then CPU-connected drives would be a more viable option for a consumer unwilling to probe LGA pins or try trial-and-error drive layouts.

June 8, 2018 | 12:37 PM - Posted by Paul A. Mitchell (not verified)

ASUS should begin to offer x4/x4/x4/x4 functionality with their DIMM.2 socket. Being positioned directly adjacent to other standard DIMM slots, that socket really wants to drive 4 x NVMe M.2 SSDs in all modern RAID modes.

Likewise, the primary x16 PCIe socket should support the same "4x4" functionality (to borrow ASRock's nomenclature).

We've had multiple high-performance video cards for several years. It's about time that the full potential of PCI-Express be made available as standard features.

This policy will become much more important when PCIe 4.0's 16 GHz clock becomes standard fare.

June 11, 2018 | 08:22 AM - Posted by psuedonymous

'DIMM.2' (and similar risers like the one on the ASRock X299 ITX) are cost-saving measures, re-using an existing physical slot for a different interface. x4/x4/x4/x4 would require a PLX chip to split the lanes rather than relying on the CPU's bifurcation ability, adding a massive extra chunk of cost (both in terms of the PLX chip itself, and all the extra QC needed when a PLX chip is involved).

June 8, 2018 | 12:22 PM - Posted by Paul A. Mitchell (not verified)

Thanks, Ken.

I'm watching this "4x4" technology very closely, chiefly because of the engineering elegance that obtains with 4 x NVMe SSDs @ x4 PCIe 3.0 lanes.

With the proliferation of multi-core CPUs, an idle CPU core can now do the same or similar work previously done by a dedicated IOP on a hardware RAID card.

Last April, I also itemized some of the reasons why I believe the ASRock 4x4 card is better designed, here:

https://forums.servethehome.com/index.php?threads/quad-m-2-pcie-x16-nvme...

Many thanks to pcper.com for keeping us up-to-date with progress in this area of high-performance storage.

June 8, 2018 | 03:56 PM - Posted by GadgetBlues (not verified)

The most interesting part of this is that it means Coffee Lake contains the Intel Volume Management Device (VMD) hardware.

When Intel launched VROC, they pointed out that VMD was a hardware feature that was only present in Scalable Xeon (socket 3647) and Core i9 Skylake-X (socket 2066). Kaby Lake-X on socket 2066 was specifically not supported because those CPUs (now discontinued) did not contain a VMD.

I just searched through the Intel 8th-gen (Coffee Lake) datasheets as well as the Specification Update, and there is absolutely no mention of VMD. So in keeping with the general vague nature of VROC since its soft-launch, VMD support on Coffee Lake is undocumented.

June 9, 2018 | 03:22 PM - Posted by Allyn Malventano

The way it works on this platform is very likely *not* via VMD. The SSDs present themselves to the OS individually even with an array created (if the driver is not installed). The RST driver does the merging, and there is an apparent UEFI handoff that occurs during boot, keeping the array available throughout the boot process.

June 9, 2018 | 03:42 PM - Posted by asdff (not verified)

is it possible to update the charts with additional testing? such as random/sequential writes iops with single/raid config?

June 12, 2018 | 03:37 AM - Posted by Allyn Malventano

Not sure what you mean - single SSD results are present on the charts.

June 11, 2018 | 07:52 AM - Posted by psuedonymous

Sounds like it's just regular old software RAID that happens to be using drives on the CPU lanes rather than the PCH lanes.

June 9, 2018 | 06:23 PM - Posted by Jann5s

Allyn, some figures read “queue depth” on the bottom for the 1x and 2x columns, but I think it should be “drives” right?

June 12, 2018 | 03:41 AM - Posted by Allyn Malventano

Yup. Good catch. I'll try and fix this one when I've got some better internet here...

June 9, 2018 | 11:58 PM - Posted by Paul A. Mitchell (not verified)

Allyn,

On certain measurements of random IOPS, is it possible for a logical record to be smaller than a stripe size, requiring that entire logical record to be written to only one RAID-0 member?

If this is happening the way I am visualizing it in my mind's eye, we should not be surprised that a RAID-0 array performs no faster than a single NVMe SSD under those conditions.

Let me illustrate with perfect conditions: a logical record is 16KB, and a stripe size is 4KB. Given those factors, one-fourth of each logical record will be written to each member of a 4-member RAID-0 array.

Now, reverse those numbers: a logical record is 4KB, and a stripe size is 16KB. Given those factors, the entire logical record will be written to one and only one member of that RAID-0 array.

Do you see anything wrong with the analysis above?

Thanks!

June 12, 2018 | 03:40 AM - Posted by Allyn Malventano

If you do a 4KB write to a 128KB striped RAID-0, the controller / chipset will typically only do the 4KB write. You don't have to 'round up' to the full stripe unless you are using parity-based RAID (RAID-5/6), but even then, it's sometimes possible to still only do the 4KB write (though that 4KB will have to be written across the entire stripe and not to just one drive).
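
To make that mapping concrete, here is a minimal, hypothetical sketch of generic RAID-0 address mapping (not the actual RST or chipset implementation), showing how a sub-stripe write lands on a single member while a multi-stripe write spreads across all of them:

# Hypothetical RAID-0 address mapping, for illustration only.
def raid0_map(offset, length, stripe_size, members):
    """Yield (member_index, member_offset, chunk_length) for a host I/O."""
    out = []
    while length > 0:
        stripe_index = offset // stripe_size
        member = stripe_index % members
        within = offset % stripe_size
        chunk = min(length, stripe_size - within)
        member_offset = (stripe_index // members) * stripe_size + within
        out.append((member, member_offset, chunk))
        offset += chunk
        length -= chunk
    return out

# 4KB write to a 2-drive array with a 128KB stripe: one drive touched.
print(raid0_map(offset=256 * 1024, length=4 * 1024,
                stripe_size=128 * 1024, members=2))
# 16KB write with a 4KB stripe on 4 drives: spread across all four members.
print(raid0_map(offset=0, length=16 * 1024,
                stripe_size=4 * 1024, members=4))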

July 13, 2018 | 01:47 AM - Posted by Paul A. Mitchell (not verified)

FYI: 4 x Intel Optane 905P NVMe SSDs installed and running in an ASRock Ultra Quad M.2 AIC:

https://www.youtube.com/watch?v=mCRXPVSmPOA

"INSANE Storage Speeds from ASRock!"

w/ ASRock X299 XE motherboard

I believe this is the Newegg product page for that motherboard:

https://www.newegg.com/Product/Product.aspx?Item=N82E16813157798&Tpk=N82...

The measurements are what we would expect when each NVMe SSD uses x4 PCIe 3.0 lanes -- as compared to prior Optane M.2 SSDs, which only use x2 PCIe 3.0 lanes.
