Mainframe Storage

DS8000 Smart Rebuild

By Brian Kraemer posted Fri June 11, 2021 12:57 PM


The IBM DS8000 is an industry-leading external storage server with outstanding Reliability, Availability, and Serviceability (RAS) characteristics. While the IBM DS8000 has utilized RAID technology since its initial delivery almost two decades ago, an additional feature known as Smart Rebuild takes the reliability provided by traditional RAID technologies to another level.

RAID Background
RAID (Redundant Array of Independent Disks) is a technology that has been around for many decades. While there are different variations of RAID, the IBM DS8000 has utilized three of these: RAID-5, RAID-6, and RAID-10.

With the initial delivery of IBM DS8000, the default RAID technology was RAID-5. RAID-5 utilizes striping of data across array member drives with distributed parity. This means that parity for stripes is not always on the same member drive. RAID-5 tolerates a single drive failure while maintaining data integrity.
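As a rough illustration of striping with distributed parity, the sketch below uses plain XOR parity and a simple rotation rule. The function names and the rotation formula are assumptions made for this example; they are not taken from the DS8000 implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_drive(stripe_index, drive_count):
    """Hypothetical rotation rule: parity moves to a different drive each stripe."""
    return (drive_count - 1 - stripe_index) % drive_count


def write_stripe(stripe_index, data_blocks):
    """Lay out one stripe: the data blocks plus one parity block at the rotated slot."""
    drive_count = len(data_blocks) + 1
    stripe = list(data_blocks)
    stripe.insert(parity_drive(stripe_index, drive_count), xor_blocks(data_blocks))
    return stripe


def rebuild_block(stripe, failed_drive):
    """Recover the block of a single failed drive by XOR-ing all surviving blocks."""
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_drive]
    return xor_blocks(survivors)
```

Because the XOR of every block in a stripe is zero, any one missing block (data or parity) equals the XOR of the others, which is why a single drive failure can be tolerated.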

In 4Q2016, IBM DS8000 moved away from RAID-5 to RAID-6 as the default RAID technology. RAID-6 is very similar to RAID-5, but it uses two parity blocks per stripe instead of one. By using two parity blocks, RAID-6 tolerates two drive failures. This change in default RAID technology helped to further protect customer data.

RAID-10 is also supported by IBM DS8000. RAID-10 combines RAID-1 and RAID-0 to stripe data across mirrored pairs. The RAID-1 part of this equation creates the mirrored drive pairs, and the RAID-0 part performs the striping of data. There is no parity created. RAID-10 can theoretically tolerate a number of failures equal to the number of mirrored pairs, provided that no two failures occur within the same pair.
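The failure tolerance of RAID-10 depends on where the failures land, which a few lines of code can make concrete. This is a minimal sketch; the function name and the drive numbering are assumptions for illustration only.

```python
def raid10_survives(mirror_pairs, failed_drives):
    """Return True if every mirrored pair still has at least one working drive."""
    failed = set(failed_drives)
    return all(not (a in failed and b in failed) for a, b in mirror_pairs)


# Six drives arranged as three mirrored pairs (hypothetical numbering).
pairs = [(0, 1), (2, 3), (4, 5)]
one_per_pair = raid10_survives(pairs, [0, 2, 4])   # three failures, one in each pair
same_pair = raid10_survives(pairs, [0, 1])         # two failures, both in one pair
```

Three failures spread across three pairs leave the data intact, while just two failures in the same pair do not.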

RAID Recovery
Utilizing hot spare pools, the IBM DS8000 exchanges RAID array members when errors are detected or predicted. The contents of the affected member are written to a new drive from the spare pool, reconstructed from the data and parity on the good array members in the case of RAID-6 or RAID-5, or copied from the mirrored partner in the case of RAID-10. By using RAID technology along with hot spare pools, IBM DS8000 is extremely effective in preserving data.

Smart Rebuild
Taking reliability a step further, the IBM DS8000 implemented Smart Rebuild. If a RAID-6 (or RAID-5) array member is predicted to fail given specific error signatures, but the identified RAID member drive is still accessible, the IBM DS8000 can invoke Smart Rebuild. Rather than reconstructing the data from parity as a RAID rebuild does, Smart Rebuild essentially performs a disk copy between the drive predicted to fail and a hot spare drive. And, like RAID rebuild, Smart Rebuild is completely concurrent with I/O activity. No pause in I/O is required.
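The difference between the two recovery paths can be sketched in a few lines. This is not the DS8000 implementation: the function names are hypothetical, the policy is reduced to the two conditions described above, and single XOR parity stands in for the real RAID-5 or RAID-6 parity scheme. Each drive is modeled as a list of blocks, one per stripe.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def choose_rebuild(predicted_to_fail, still_accessible):
    """Hypothetical policy: copy directly when the suspect drive can still be read."""
    if predicted_to_fail and still_accessible:
        return "smart_rebuild"
    return "parity_rebuild"


def smart_rebuild(drives, suspect):
    """Copy the suspect drive block for block to a spare; only one drive is read."""
    spare = list(drives[suspect])
    return spare, {suspect}


def parity_rebuild(drives, failed):
    """Recompute every block of the failed drive from all surviving members."""
    survivors = [i for i in range(len(drives)) if i != failed]
    stripe_count = len(drives[survivors[0]])
    spare = [xor_blocks([drives[i][s] for i in survivors]) for s in range(stripe_count)]
    return spare, set(survivors)
```

Both functions return the rebuilt spare along with the set of drives that had to be read. In this sketch the copy path touches only the suspect drive, while the parity path depends on every surviving member being readable.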

An advantage that Smart Rebuild has over a RAID rebuild is that it eliminates any exposure to tertiary failures during a parity rebuild.