IBM Developing 120 Petabyte Water Cooled Storage Array

Subject: Storage | August 26, 2011 - 01:04 PM |
Tagged: storage, Hard Drive, IBM, array

IBM knows how to go big or go home, and its Almaden, California research lab’s current storage project exemplifies that quite nicely. With a data repository that dwarfs anything available today, IBM is designing a 120 Petabyte storage array. Composed of 200,000 hard drives, the new storage system is expected to house approximately 1 trillion files, or 24 billion 5 MB MP3 files. To put that in perspective, Apple had sold 10 billion songs as of February 24, 2010; you could store every song sold since the iTunes Store’s inception twice over and still have room to spare!
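The capacity figures above are easy to sanity-check. A quick back-of-the-envelope calculation (assuming decimal units, i.e. 1 PB = 10^15 bytes and 1 MB = 10^6 bytes):

```python
# Back-of-the-envelope check of the quoted capacity figures
# (decimal units assumed: 1 PB = 10**15 bytes, 1 MB = 10**6 bytes).
capacity_bytes = 120 * 10**15          # 120 Petabytes
mp3_size_bytes = 5 * 10**6             # one 5 MB MP3

songs = capacity_bytes // mp3_size_bytes
print(songs)                           # 24000000000, i.e. 24 billion MP3s

# Average file size if the array really holds ~1 trillion files:
avg_file_bytes = capacity_bytes / 10**12
print(avg_file_bytes)                  # 120000.0, i.e. about 120 KB per file
```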


More specifically, the Almaden engineers have designed new hardware and software techniques to combine all 200,000 hard drives into horizontal drawers, which are then stacked into racks. To pack as many disks as possible into each vertical rack, IBM had to make the drawers “significantly wider than usual,” and the disks are cooled with circulating water. On the software side, IBM has refined its disk parity and mirroring algorithms so that the system can continue working at near-full speed when a drive fails. If a single disk fails, the system pulls data from the other drives that hold copies of it and writes it to a replacement disk, allowing the supercomputer to keep processing. The algorithms throttle the speed of the rebuild, and can adapt in the event multiple drives begin failing.
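IBM hasn't published the rebuild algorithms themselves, but the core parity trick behind this kind of recovery can be sketched in a few lines. This is a toy RAID-style XOR illustration with hypothetical helper names, not IBM's actual code: the parity block is the XOR of all data blocks, so any single lost block can be recomputed from the survivors plus parity.

```python
from functools import reduce

def parity_block(blocks):
    """XOR a set of same-length data blocks into a single parity block."""
    return bytes(reduce(lambda a, b: a ^ b, chunk) for chunk in zip(*blocks))

def rebuild(surviving_blocks, parity):
    """Recover a lost block: XOR the parity with every surviving block."""
    return parity_block(surviving_blocks + [parity])

data = [b"AAAA", b"BBBB", b"CCCC"]    # blocks on three data drives
p = parity_block(data)                 # block on the parity drive

# Drive holding data[1] fails; its block is recomputed from the rest.
recovered = rebuild([data[0], data[2]], p)
assert recovered == b"BBBB"
```

In a real array the rebuild rate would also be throttled so foreground I/O keeps running, which is the behavior the article describes.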

In addition to physically spreading data across the drives, IBM is also using a new file system to keep track of all the files across the array. Known as the General Parallel File System (GPFS), it stripes files across multiple disks so that many parts of a file can be written to and read from simultaneously, resulting in massive speed increases on reads. In addition, the file system uses a new method of indexing that enables it to keep track of billions of files without needing to scan through every one. GPFS has already blown past the previous indexing record of one billion files in three hours, indexing 10 billion files in just 43 minutes.
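GPFS's real striping is far more sophisticated, but the basic round-robin idea is simple to sketch. In this minimal illustration (hypothetical helper names, fixed chunk size, in-memory "disks"), a file is cut into chunks dealt out across the disks in turn, so sequential reads can hit all disks at once:

```python
def stripe(data: bytes, n_disks: int, chunk: int = 4):
    """Deal a file's chunks round-robin across n_disks, GPFS-style."""
    disks = [bytearray() for _ in range(n_disks)]
    for i in range(0, len(data), chunk):
        disks[(i // chunk) % n_disks] += data[i:i + chunk]
    return disks

def unstripe(disks, total_len: int, chunk: int = 4) -> bytes:
    """Reassemble the file by reading one chunk from each disk in turn."""
    out = bytearray()
    offsets = [0] * len(disks)
    d = 0
    while len(out) < total_len:
        out += disks[d][offsets[d]:offsets[d] + chunk]
        offsets[d] += chunk
        d = (d + 1) % len(disks)
    return bytes(out)

data = b"The quick brown fox jumps over the lazy dog"
disks = stripe(data, 4)
assert unstripe(disks, len(data)) == data
```

In a real parallel file system the per-disk reads would be issued concurrently, which is where the read-speed multiplier comes from.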

Bruce Hillsberg, IBM’s director of storage research, told Technology Review that their algorithms enable a storage system that should not lose any data for a million years without compromising performance. Hillsberg further indicated that while this 120 Petabyte storage array sits on the “lunatic fringe” today, storage is becoming more and more important for cloud computing, and just keeping track of the file names, types, and attributes will use approximately 2 Terabytes of storage.

The array is currently being built for a yet-to-be-announced client, and will likely be used for High Performance Computing (HPC) projects to store massive amounts of modeling and simulation data. Projects that could benefit from the increased storage include global weather modeling, seismic imaging, the Large Hadron Collider (LHC), and molecular simulations.

Storage research advances at an amazing pace, seemingly shrugging off pesky details like heat, fault tolerance, areal density walls, and storage media limits. While this 120 Petabyte array of 200,000 hard drives is out of reach for just about everyone without federal funding or a Fortune 500 company's expense account, the technology itself is definitely interesting, and its advancements will trickle down to consumer drives.

Image Copyright comedy_nose via Flickr Creative Commons


August 26, 2011 | 02:03 PM - Posted by pdjblum

Way cool. I guess they are using hard drives rather than solid state drives. Is it that cost is more important than the potential performance increase, or some other factor?

August 26, 2011 | 02:33 PM - Posted by Mindscrew

Well, SSDs are about 20x more expensive per GB than HDDs, so I think cost is the deciding factor. Try pitching a 2000% price increase for a 500% performance increase to a board of people that aren't hardware enthusiasts.

SSD tech is still evolving rapidly, so there are still quirks and compatibility to worry about -- something to keep in mind for an array that will probably be running for at least 5-6 years. Also, you'll look stupid when you can get the same SSD drive next year for half the price.

August 26, 2011 | 02:59 PM - Posted by pdjblum

Cheap bastards. Might save some on energy, both direct and from reduced heat generated.

August 28, 2011 | 04:10 PM - Posted by Mark (not verified)

..."Lard Hadron Collider (LHC),"

Typo: should be "Large"

August 29, 2011 | 03:11 PM - Posted by JR (not verified)

I wonder if we will ever find out who the client is.

August 29, 2011 | 04:24 PM - Posted by GregD (not verified)

the photo is of an HP disk array, which is pretty funny for an IBM story...

August 31, 2011 | 06:22 PM - Posted by berserker29 (not verified)

It sounds like IBM may have solved the write penalty issues RAID 5 and 6 have - kind of ironic that a solution would be a by-product of an unrelated project.
