Global Storage Forum

  • 1.  DRAID6 redundancy question

    Posted 3 days ago

    Hi, we have an FS5020 that had 4x 6TB drive failures within a few hours of each other. Surprisingly, the mdisk didn't go offline, and I don't understand why. In my mind, that mdisk should be a total loss: a 6TB drive would take at least 15 hours to rebuild, so the stripe would have been incomplete, and DRAID6 can only tolerate 2 failures before a 3rd would result in parity failure.

    The mdisk has 60 members, 2 rebuild areas, a redundancy of 2, and a stripe width of 12.

    A colleague thought that maybe each 12-wide stripe was tolerant of 2 failures in its own right, but I don't think that's the case.

    The drives have been replaced and copybacks are in progress.

    If a drive is starting to pre-fail and the firmware picks this up before it fails, does it rebuild to the spare first and then fail the drive?

    Any comments gratefully received.



    ------------------------------
    stuart wade
    ------------------------------


  • 2.  RE: DRAID6 redundancy question

    Posted 3 days ago

    Hi Stuart, 

    First of all, you must upgrade your drive firmware to the latest code level; you can find the version levels for your drives in this list.

    If your mdisk is set up with DRAID6 and has 2 rebuild areas (spares), then the first two drive failures are covered by those spares automatically.

    The interesting part is how the stripes are distributed. Each stripe can only handle up to 2 failed drives, but in DRAID not every stripe uses the same drives. So if the 4 failed drives are spread out and no single stripe loses more than 2 of its members, the system can keep running without data loss. It's definitely risky because it depends on where the failures land; it's a "luck of the draw" situation, as the sketch below illustrates.
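    To make that concrete, here is a small Python sketch of the idea. The rotating layout below is a deliberately simplified toy model, not IBM's actual DRAID distribution algorithm, and the numbers are just taken from your mdisk. It only shows why the same four failures can be harmless or fatal depending on which stripes they share:

    DRIVES = 60          # member drives, as in your mdisk
    STRIPE_WIDTH = 12    # strips per stripe (10 data + 2 parity for DRAID6)
    REDUNDANCY = 2       # members any single stripe can lose

    def toy_layout(rows=DRIVES):
        """Yield stripes as sets of drive IDs, rotating one drive per row.
        (Toy placement only; the real DRAID mapping is more sophisticated.)"""
        for row in range(rows):
            yield {(row + i) % DRIVES for i in range(STRIPE_WIDTH)}

    def survives(failed):
        """True if no stripe loses more members than its redundancy."""
        return all(len(stripe & failed) <= REDUNDANCY
                   for stripe in toy_layout())

    print(survives({0, 15, 30, 45}))  # True: spread-out failures, no stripe loses >2
    print(survives({0, 1, 2, 3}))     # False: 3+ failures land in one stripe -> data loss

    In this toy layout, four failures spaced 15 drives apart never even share a stripe, while four adjacent failures all land in the same stripe. Your four drives evidently fell on the lucky side of that distribution.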



    ------------------------------
    Nezih Boyacioglu
    ------------------------------



  • 3.  RE: DRAID6 redundancy question

    Posted 3 days ago

    Hi Nezih, that's the perfect answer and explains why we had 4 drives fail while the mdisk stayed online. I didn't know that about the stripes.

    The drives that failed were in different enclosures, so they are on different stripes.

    One drive has finished the copyback and it's now doing an array mdisk rebuild, so we're in a much better state now.

    Thank you, Stuart. P.S. I'm off to buy a lottery ticket!



    ------------------------------
    stuart wade
    ------------------------------