Primary Storage

  • 1.  Rebuild Times with SSD Failure

    Posted Tue May 04, 2021 12:41 PM
    Can anyone help with the following?

    "What is the estimated DRAID6 array rebuild time following a 15.36TB RI SSD drive failure?"
     
    I assume there are two pieces of information here:
    How long does it take to rebuild into the distributed spare (rebuild) areas, recovering a 24-drive array to 23 drives?
    And then, after the failed drive is replaced, how long does it take to redistribute (copy back) to 24 drives?

    ------------------------------
    Nicholas Slater
    ------------------------------


  • 2.  RE: Rebuild Times with SSD Failure

    Posted Thu May 06, 2021 10:50 AM
    Rebuild time depends mostly on system load and the number of drives.
    I have twelve v5 systems, each able to sustain around 85k IOPS.
    Each has one 12x 3.49TiB RI SAS12 drive array, and two 137x 5.46TiB 7200rpm SAS12 drive arrays.

    On a system that averages 12k IOPS, rebuild of either array took about 13 hours, and copyback about 7 h 50 min.
    On a system that averages 42k IOPS, rebuild of either array took about 15 hours, and copyback about 8 h 8 min.

    Rebuild and copyback for an array of 24 RI drives on an FS7200 should be roughly twice as fast.
    Both times scale with drive capacity, so for 15.36TiB drives they should be roughly 4.4x as long.

    So your rebuild time should be about 31 hours, and your copyback time about 18 hours.
    If your system is constantly under very heavy I/O, I would expect those times to be longer.
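    The estimate above is simple proportional scaling. The Python sketch below reproduces it under the same assumptions the reply makes: duration grows linearly with drive capacity, and 24 RI drives on an FS7200 are about 2x faster than the measured v5 array. `scale_hours`, `SPEEDUP` and the other names are illustrative only, not part of any IBM tool. It also shows one unit subtlety worth checking: 3.49TiB is a binary figure (about a 3.84TB drive), while the question quotes a 15.36TB decimal drive, which is about 13.97TiB. Normalizing both to bytes gives a capacity ratio of about 4.0 rather than 4.4, which would trim the estimates by roughly 10%.

    ```python
    # Minimal sketch of the proportional scaling used in the reply above.
    # Assumption: duration scales linearly with drive capacity and inversely with
    # a platform speed-up factor; this is not an IBM formula.

    TB = 10**12   # decimal terabyte, as printed on drive labels
    TIB = 2**40   # binary tebibyte

    def scale_hours(measured_hours: float, measured_bytes: float,
                    target_bytes: float, speedup: float) -> float:
        """Scale a measured rebuild/copyback duration to another drive size and platform."""
        return measured_hours * (target_bytes / measured_bytes) / speedup

    # Measured ranges from the reply (12 x 3.49 TiB RI SSD array), in hours.
    MEASURED = {"rebuild": (13.0, 15.0), "copyback": (7 + 50 / 60, 8 + 8 / 60)}
    REFERENCE_BYTES = 3.49 * TIB
    SPEEDUP = 2.0  # "roughly twice as fast" for 24 RI drives on an FS7200

    for label, target in [("15.36 TiB", 15.36 * TIB), ("15.36 TB", 15.36 * TB)]:
        for phase, (low, high) in MEASURED.items():
            lo_h = scale_hours(low, REFERENCE_BYTES, target, SPEEDUP)
            hi_h = scale_hours(high, REFERENCE_BYTES, target, SPEEDUP)
            print(f"{label}: {phase} about {lo_h:.1f} to {hi_h:.1f} h")
    ```

    The midpoints of the TiB rows land near the reply's ~31 h rebuild and ~18 h copyback; treating the drive as 15.36TB (decimal) gives roughly 28 h and 16 h instead. Either way, heavy host I/O would push these up.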

    ------------------------------
    JoshDaniel Davis
    ------------------------------



  • 3.  RE: Rebuild Times with SSD Failure

    Posted Fri May 07, 2021 10:50 AM
    Hi Josh,
    Those figures seem about right based on a rebuild-time predictor tool I've seen, and they reflect what the system will do for the first failed drive in an array.

    Our recommended DRAID level has historically been DRAID6 (at least before DRAID1 was added for some small-configuration edge cases), and DRAID6 of course preserves data integrity in the event of a double drive failure.

    For a first drive failure, DRAID balances rebuild speed against the impact on host I/O latency. With two concurrent rebuilds on DRAID6, however, the rebuild rate accelerates significantly until the first failed drive has been rebuilt, because a further drive failure would now risk data loss. Once the first failed drive is rebuilt, the rebuild reverts to standard speed.
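    The priority behaviour described above can be modelled as a simple rule keyed on how much redundancy is left. The Python sketch below is a hypothetical illustration of that rule for DRAID6 only, not IBM's actual algorithm; `RebuildRate` and `draid6_rebuild_rate` are made-up names, and the real controller's thresholds and rates are not given in this thread.

    ```python
    from enum import Enum

    class RebuildRate(Enum):
        STANDARD = "standard"        # throttled to limit the impact on host I/O latency
        ACCELERATED = "accelerated"  # prioritized because no redundancy is left

    def draid6_rebuild_rate(failed_drives_pending: int) -> RebuildRate:
        """Hypothetical model of the DRAID6 rebuild priority described above.

        DRAID6 tolerates two drive failures. With one failed drive still pending
        rebuild, one level of redundancy remains, so the rebuild runs at the
        standard rate. With two pending, another failure could lose data, so the
        rebuild is accelerated until the first drive has been rebuilt.
        """
        if failed_drives_pending < 1:
            raise ValueError("no rebuild in progress")
        redundancy_left = 2 - failed_drives_pending
        return RebuildRate.ACCELERATED if redundancy_left <= 0 else RebuildRate.STANDARD

    # Walk through the sequence: first failure, second failure,
    # then the first drive's rebuild completing.
    for pending in (1, 2, 1):
        print(pending, draid6_rebuild_rate(pending).value)
    ```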

    To illustrate with something close to your example: with 128 x 6 TB NL-SAS drives (the maximum number of drives in a single DRAID6 array), a single-drive rebuild on a lightly loaded system should take around the 13 hours you suggested.

    On the same system, if a second concurrent rebuild starts, the first drive's rebuild could take less than 2 hours 15 minutes. The second drive's rebuild would then return to about 13 hours, because the risk to the customer's data has been reduced.
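    The benefit is easiest to see as the length of the window during which the array has no redundancy left. The Python sketch below does that arithmetic with the figures above, under a hypothetical simplifying assumption: both drives fail at the same moment, so without acceleration the first rebuild would take the full standard 13 hours. Since the post says "less than" 2 hours 15 minutes, the accelerated figure is treated as an upper bound.

    ```python
    from datetime import timedelta

    # Figures from the post (128 x 6 TB NL-SAS DRAID6, lightly loaded system).
    standard_rebuild = timedelta(hours=13)
    accelerated_first_rebuild = timedelta(hours=2, minutes=15)  # upper bound ("less than")

    # Hypothetical assumption: both drives fail simultaneously, so without
    # acceleration the array would have zero redundancy for a full standard rebuild.
    exposure_without = standard_rebuild
    exposure_with = accelerated_first_rebuild

    # Dividing two timedeltas gives a float ratio.
    reduction = exposure_without / exposure_with
    print(f"Zero-redundancy window: {exposure_with} instead of {exposure_without} "
          f"(about {reduction:.1f}x shorter)")
    ```

    Under these assumptions the window in which a third failure would lose data shrinks by roughly a factor of six.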

    That's the value of DRAID, and it's why we recommend DRAID6.

    Best Regards,
    Lee McEvoy

    ------------------------------
    Lee McEvoy
    ------------------------------



  • 4.  RE: Rebuild Times with SSD Failure

    Posted Mon May 10, 2021 03:28 PM
    Lee, that's great info, and good to know about rebuild priority.

    And yes, after 25 years in data management, there are very few places where I would be okay with anything other than RAID6 (exceptions being 2-drive RAID1, 3-drive RAID5, or disposable temporary data).

    Now if only I could get distributed sparing from mdadm. :)

    ------------------------------
    JoshDaniel Davis
    ------------------------------