Intel SSD 910 Series 800GB PCIe SSD First Look

Subject: Storage
Manufacturer: Intel SSD 900 Family
Tagged: ssd, pcie, Intel, 910, 800gb

Testbed and Preliminary Benchmark Results

When testing high-IOPS devices like the SSD 910 Series, it's wise not to clutter up your testbed with other devices that might interfere with the system buses, particularly the PCIe buswork. For our testbed we go the 'Keep It Simple, Stupid' route:

That's right, the only PCIe slot in use is the one holding the card under test. We're using SandyBridge integrated graphics, which saves all possible bandwidth for the SSD 910 Series. Here's the rest of our testbed setup for this review:

PC Perspective would like to thank ASUS, Corsair, and Kingston for supplying some of the components of our test rig.

Hard Drive Test System Setup

CPU: Intel Core i5-2500K
Motherboard: Asus P8Z68-V Pro
Memory: Kingston HyperX 4GB DDR3-2133 CL9
Hard Drive: G.Skill 32GB SLC SSD
Sound Card: N/A
Video Card: Intel® HD Graphics 3000
Video Drivers: Intel
Power Supply: Corsair CMPSU-650TX
DirectX Version: DX9.0c
Operating System: Windows 7 X64


Preliminary Benchmark Results:

Here are some quick, rapid-fire tests. The first is ATTO, but with a rather large caveat. Combining the four SCSI LUNs within Windows means Dynamic Disk (software) RAID. There is no way to configure stripe size, so we are stuck with the hard-coded Windows 64k stripes here. That's not the best way to go for 4k workloads, and it tends to hold the 910 back a bit at the lower transfer sizes:

ATTO run of the full 800GB 910 SSD in "Performance Mode"

While sequential rates (near the bottom) are just above the 1.5GB/s write rating and just below the 2GB/s read rating, the 4k to 64k region (middle) appears lower than normal for a PCIe SSD. This is the result of attempting to tie the four LUNs together with Windows RAID. To demonstrate what I meant by penalties of the larger stripe size, here's an ATTO run of just one of the 200GB LUNs:

ATTO run of a single 200GB SCSI LUN (also in "Performance Mode")

Note how the performance quickly 'ramps up' at the lower transfer sizes when Windows RAID is not busy getting in the way. Also, from both ATTO runs we can tell the Hitachi-based controllers absolutely hate writes that are less than 4k in size, and by 'hate' I mean 'slower than a Hard Disk Drive'.
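To make the stripe-size penalty a little more concrete, here is a minimal sketch of how a transfer maps onto the four LUNs. It assumes a plain round-robin RAID-0 layout with the 64k stripe mentioned above; the function name `luns_touched` and the constants are illustrative, not anything Windows exposes.

```python
STRIPE_SIZE = 64 * 1024  # fixed Windows stripe size, per the text above
NUM_LUNS = 4             # the 910 presents four 200GB LUNs


def luns_touched(offset, length, stripe_size=STRIPE_SIZE, num_luns=NUM_LUNS):
    """Return the set of LUN indices one transfer touches.

    Assumption: simple round-robin striping, where stripe N lives on
    LUN (N mod num_luns). Real Dynamic Disk metadata is ignored.
    """
    first_stripe = offset // stripe_size
    last_stripe = (offset + length - 1) // stripe_size
    return {stripe % num_luns for stripe in range(first_stripe, last_stripe + 1)}


# How many LUNs does a single stripe-aligned transfer of each size keep busy?
for size_kib in (4, 16, 64, 128, 256, 1024):
    touched = luns_touched(0, size_kib * 1024)
    print(f"{size_kib:>5} KiB transfer -> {len(touched)} of {NUM_LUNS} LUNs busy")
```

Under this model, any aligned transfer of 64 KiB or less lands on a single LUN, and a transfer needs to reach 256 KiB before all four LUNs work on it at once. Sixteen consecutive 4 KiB requests also share one stripe, so a sequential small-block test tends to queue up behind one controller at a time, which fits the sluggish 4k to 64k region in the striped ATTO run.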

Now for some quick IOMeter results. I'll stick with 4k random since ATTO pretty much confirmed the sequential transfer speeds for us already. Note that the below table represents pre-conditioned but still fresh out of the box values (i.e. the drive was only sequentially written to capacity so as to allocate all available LBAs):

# of LUNs        | 4k 100% Read | 4k 67/33 R/W | 4k 100% Write
1 (200GB) QD=32  | 49519        | 34847        | 64279
2 (400GB) QD=64  | 99139        | 69575        | 127556
3 (600GB) QD=96  | 148574       | 105322       | 191832
4 (800GB) QD=128 | 198160       | 139707       | 254789

IOMeter 'fresh' IOPS values for varying configurations ("Performance Mode").
QD=32 per LUN

First off, the above figures demonstrate that the LSI HBA can absolutely handle SSD IOPS and scale properly with the addition of multiple SAS devices. Here we see a nice linear increase as we tack on additional LUNs for the workloads. This clean scaling was only possible by having IOMeter directly access each of the four devices simultaneously, bypassing any sort of Windows RAID.
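For anyone who wants to check the "linear" claim against the numbers, here is a small sketch that compares each multi-LUN result with N times the single-LUN result. The IOPS values are copied from the table above; the helper name `scaling_efficiency` is my own for illustration.

```python
# IOPS from the IOMeter table: LUN count -> (100% read, 67/33 mixed, 100% write)
RESULTS = {
    1: (49519, 34847, 64279),
    2: (99139, 69575, 127556),
    3: (148574, 105322, 191832),
    4: (198160, 139707, 254789),
}
WORKLOADS = ("4k read", "4k 67/33", "4k write")


def scaling_efficiency(results):
    """Measured IOPS divided by (LUN count x single-LUN IOPS); 1.0 is perfectly linear."""
    base = results[1]
    return {
        n: tuple(value / (n * single) for value, single in zip(values, base))
        for n, values in results.items()
    }


for n, effs in sorted(scaling_efficiency(RESULTS).items()):
    cells = ", ".join(f"{name}: {eff:.1%}" for name, eff in zip(WORKLOADS, effs))
    print(f"{n} LUN(s) -> {cells}")
```

Every configuration comes out within about one percent of ideal scaling, for reads, writes, and the mixed workload alike.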

Mixed workloads (where we are reading and writing simultaneously) saw less performance than reading or writing individually, but a slight dip here is somewhat expected. We will look into that aspect further with our next piece covering the 910 Series.

Finally, realize that the above write figures were for an unfragmented state. This is what you'll see for the first few hours of heavy use. The SSD Review did a similar test to the bottom right (*edit - lower right - see below for more details*) corner of my chart above, and reported 228k where I saw 254k, but realize those figures (theirs and mine above) don't represent a steady-state condition. I steamed ahead a bit further to get to a long-term figure for 4k random writes. Here it is:

Just over 83k random 4k write IOPS, and it took quite a while to get there, too. This figure is comfortably above Intel's stated spec of 75k, and it's impressive when you consider it is the result of a *continuous* 4k random write. Combined with Intel rating their HET-MLC flash for 10x its capacity each day for five full years, this is definitely a serious SSD!
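To put those figures in perspective, here is a back-of-the-envelope sketch of what the endurance rating and the steady-state result work out to. It assumes decimal gigabytes, 365-day years, 4096-byte I/Os, and a rounded 83,000 IOPS for the "just over 83k" result; the function names are illustrative.

```python
CAPACITY_BYTES = 800 * 10**9  # 800GB card, decimal units (assumption)
DRIVE_WRITES_PER_DAY = 10     # "10x its capacity each day", per the text
YEARS = 5
IO_BYTES = 4096               # 4k transfer size


def lifetime_writes_bytes(capacity, per_day, years, days_per_year=365):
    """Total bytes written if the rated daily workload is sustained for the full term."""
    return capacity * per_day * days_per_year * years


def iops_to_mb_per_s(iops, io_bytes=IO_BYTES):
    """Throughput in decimal MB/s for a given IOPS figure."""
    return iops * io_bytes / 10**6


total = lifetime_writes_bytes(CAPACITY_BYTES, DRIVE_WRITES_PER_DAY, YEARS)
print(f"Rated lifetime writes: {total / 10**15:.1f} PB")

steady = iops_to_mb_per_s(83_000)  # rounded steady-state result
spec = iops_to_mb_per_s(75_000)    # Intel's stated spec
print(f"Steady state: {steady:.0f} MB/s (spec: {spec:.0f} MB/s)")

# Time needed to push one day's rated writes at the steady-state rate
hours = CAPACITY_BYTES * DRIVE_WRITES_PER_DAY / (steady * 10**6) / 3600
print(f"Hours of continuous 4k random writes for 10 drive writes: {hours:.1f}")
```

Under these assumptions the rating comes to about 14.6 PB written over five years, and at the measured steady-state rate it would take roughly six and a half hours of nonstop 4k random writes to reach ten full drive writes in a day.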

*edit continued*:

As pointed out by a few in the comments (thanks!), I was reading The SSD Review's piece incorrectly. Sincere apologies for my 3AM fumble there. The SSD Review's pic showed 4k random reads at QD=64 per device, which was higher than the 32 value I chose for my testing. It did bring up a good question though - was it the higher QD and/or their choice of an 8GB LBA sweep that resulted in their figure being higher than ours, or was it the fact that we are testing on a SandyBridge testbed vs. their setup? It's true that SandyBridge-E has plenty more PCIe lanes available, but in theory this should not matter since even though SandyBridge has only 16 PCIe lanes available (vs. the E variant's 40 lanes), the 910 SSD is only using 8 of them. To clear this up, I fired up the 910 again, this time dialing in to match the QD=64 figure used in their testing. Here's the result:

That looks really close. Let's compare them directly:

Spec            | PCPer (SandyBridge) | SSD Review (SandyBridge-E) | Delta
IOPS            | 228249.80           | 228750.67                  | 0.2%
MB/s            | 891.60              | 936.96 (decimal)           | 0.2%
Avg IO Response | 1.1214              | 1.1178                     | 0.3%
Max IO Response | 1.7414              | 4.0560                     | 57.0%
% CPU           | 28.69%              | 11.60%                     | 59.6%

Let's break these down. First off, with less than half of one percent difference in IOPS and IO Response Time, it's fair to say the testbeds are identical in ultimate IOPS, and that the number of extra PCIe lanes available is irrelevant. With that put to rest, let's move on to CPU. With more cores enabled (as well as HyperThreading), the LGA2011 CPU uses less of a percentage to accomplish the same task. This is expected; however, there is a twist. I disable HyperThreading to prevent any possible added latencies from context switching, which might have contributed to us not seeing that rather long Maximum IO seen in the SandyBridge-E figures. It might have also been caused by the SandyBridge-E system running a pair of SLI GPUs in the same system testing the SSD. There's no way to say for certain - all we know for sure is that we didn't see the same unusually large Maximum IO Response Time.
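The numbers in the comparison can be cross-checked with a few lines of arithmetic. The sketch below rests on several assumptions of mine: the Delta column is taken as the difference relative to the larger value (which reproduces the table to within rounding), 4k means 4096 bytes, the PCPer MB/s figure is inferred to be in binary units since only the other one is labeled decimal, and response times are taken to be in milliseconds. The last helper applies Little's Law (outstanding I/Os = throughput x latency). All function names are illustrative.

```python
# (PCPer, SSD Review) figures copied from the comparison table
ROWS = {
    "IOPS": (228249.80, 228750.67),
    "Avg IO Response": (1.1214, 1.1178),
    "Max IO Response": (1.7414, 4.0560),
    "% CPU": (28.69, 11.60),
}


def relative_delta(a, b):
    """Difference as a fraction of the larger value (assumed definition of Delta)."""
    return abs(a - b) / max(a, b)


def iops_to_mbps(iops, io_bytes=4096, binary=False):
    """Throughput in MB/s: binary divides by 2**20, decimal by 10**6."""
    return iops * io_bytes / (2**20 if binary else 10**6)


def outstanding_ios(iops, avg_response_ms):
    """Little's Law: average I/Os in flight = arrival rate x time in system."""
    return iops * avg_response_ms / 1000.0


for name, (pcper, ssdreview) in ROWS.items():
    print(f"{name:<16} delta = {relative_delta(pcper, ssdreview):.1%}")

pcper_iops, ssdr_iops = ROWS["IOPS"]
print(f"PCPer:      {iops_to_mbps(pcper_iops, binary=True):.2f} MB/s binary, "
      f"{iops_to_mbps(pcper_iops):.2f} MB/s decimal")
print(f"SSD Review: {iops_to_mbps(ssdr_iops):.2f} MB/s decimal")

for label, (iops, resp) in zip(("PCPer", "SSD Review"),
                               zip(ROWS["IOPS"], ROWS["Avg IO Response"])):
    print(f"{label}: ~{outstanding_ios(iops, resp):.0f} I/Os in flight")
```

Two things fall out of this. The MB/s figures only look far apart because of units: converted to the same base they differ by the same 0.2% as the IOPS. And both result sets imply roughly 256 I/Os in flight on average, which is what QD=64 on each of four LUNs adds up to.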


So there you have it. I wanted the first look to cover and verify most of Intel's stated specs. We verified, and it has absolutely impressed us thus far. There's more to follow on the SSD 910 as we dive into further testing for a more detailed review. Oh, and regarding that endurance spec we left out - I'll get back to you in five years :).

April 27, 2012 | 03:52 PM - Posted by Compton (not verified)

It's true that on a 1155 mainboard/cpu combo you should keep all available PCIe bandwidth for the 910. But over at The SSD Review they were testing it on a X79 with 40 PCIe lanes and not 16.

April 27, 2012 | 11:11 PM - Posted by Allyn Malventano

Re-tested using the same QD=64 and saw same result. Updated the piece with that analysis. Thanks!

April 27, 2012 | 04:11 PM - Posted by Eastside (not verified)

You are comparing your 4k random write speeds to their posted 4k random read speeds. Their read results are actually higher than your posted results, and they did not post their 4k write results.

April 27, 2012 | 11:11 PM - Posted by Allyn Malventano

You're absolutely correct! I've fixed this just now.

April 28, 2012 | 01:02 AM - Posted by Eastside (not verified)

When you are comparing the LUN performance, you are using these as individual volumes?
Or are they in a raid configuration? So with 4 LUN, is that 4RO, or 4 separate volumes being accessed simultaneously?

April 28, 2012 | 11:35 AM - Posted by Allyn Malventano

The ATTO run was using standard Windows RAID-0 for the 4 LUNs combined. The IOMeter run accessed the tested LUNs simultaneously in RAW form. The latter was done to properly evaluate IOPS scaling of the LSI HBA without adding variables caused by the Windows RAID layer.

April 29, 2012 | 04:12 AM - Posted by Paul Alcorn (not verified)

Therein lies the issue with the max latency. The results that we posted on were from a RAID 0 of the four volumes.
Under similar tests to yours, with the same parameters, the results come in at 29310.72 IOPS, 895.74 MB/s (binary), Average response time of 1.116263 and a maximum response time of 2.170227. CPU utilization is 11.39%.
The higher maximum latency reported from RAID 0 is indicative of typical RAID overhead with windows. These were cursory benchmarks, ran before the SSD went into a automated test regimen.
Of note; The maximum latency is the single I/O that requires the longest time to complete. If there is a correlation between a very high maximum latency and an overall higher average latency, that can be indicative of a device/host issue. Even with the RAID result kicking out an appreciably higher maximum latency, that result would have to be in conjunction with higher overall latency to indicate a serious problem.
The SLI GPUs are rarely used during bench sessions, unless we are doing 3d benchmarks. They are on an entirely separate loop, allowing them to be used, or removed easily. During all testing thus far, we have used a 9800gt as the video card.
No worries, the X79 Patsburg (C600)chipset is designed for servers and high end workstations. Plenty of bandwidth there.

April 30, 2012 | 11:11 PM - Posted by Allyn Malventano

Actually, if you were testing a single RAID volume with 4 workers and QD=64, you were actually testing with QD=256, which might have upped the latency. From the Iometer User's Guide:



9.4 # of Outstanding I/Os
The # of Outstanding I/Os control specifies the maximum number of outstanding asynchronous I/O operations per disk the selected worker(s) will attempt to have active at one time. (The actual queue depth seen by the disks may be less if the operations complete very quickly.) The default value is 1.
Note that the value of this control applies to each selected worker and each selected disk.


May 2, 2012 | 02:48 AM - Posted by Paul Alcorn (not verified)

It was with 16 QD for each worker. It is elementary that it adds up to 64. The reason that we stated that the QD was 64 is because 16 X 4 = 64. When listing results typically the results are listed as the overall QD, with the number of workers noted.
If there were an overall issue with the latency, it would show in the average latency measurement, which is actually slightly lower than your results. Did you receive the email with the Iometer results that I sent you?

May 8, 2012 | 02:09 AM - Posted by Paul Alcorn (not verified)

our full review is up, you should give it a glance allyn at