Global Storage Forum

  • 1.  DRAID6 redundancy question

    Posted 3 days ago

    Hi, we have an FS5020 that had 4x 6TB drive failures within a few hours of each other. Surprisingly, the mdisk didn't go offline, and I don't understand why. In my mind, that mdisk should be a total loss: a 6TB drive would take at least 15 hours to rebuild, so the stripe would have been incomplete, and DRAID6 can only tolerate 2 failures before a 3rd would result in parity failure.

    The mdisk has 60 members, 2 rebuild areas, a redundancy of 2, and a stripe width of 12.

    A colleague thought that maybe each 12-wide stripe was tolerant of 2 failures in its own right, but I don't think that's the case.

    The drives have been replaced and copybacks are in progress.

    If a drive is starting to pre-fail and the firmware picks this up before it fails, does it rebuild to the spare first and then fail the drive?

    Any comments gratefully received.



    ------------------------------
    stuart wade
    ------------------------------


  • 2.  RE: DRAID6 redundancy question

    Posted 3 days ago

    Hi Stuart, 

    First of all, you must upgrade your drive firmware to the latest code level; you can find the version levels for your drives in this list.

    If your mdisk is set up with DRAID6 and has 2 rebuild areas (spares), then the first two drive failures are covered by those spares automatically.

    The interesting part is how the stripes are distributed. Each stripe can only handle up to 2 failed drives, but in DRAID not every stripe uses the same drives. So if the 4 failed drives are spread out and no single stripe loses more than 2 of its members, the system can keep running without data loss. It's definitely risky because it depends on where the failures land; it's a "luck of the draw" situation, as the sketch below illustrates.
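    To make that concrete, here is a small Python sketch of the idea. The rotating layout below is a deliberately simplified toy model, not IBM's actual DRAID distribution algorithm, and the numbers are just taken from your mdisk. It only shows why the same four failures can be harmless or fatal depending on which stripes they share:

    DRIVES = 60          # member drives, as in your mdisk
    STRIPE_WIDTH = 12    # strips per stripe (10 data + 2 parity for DRAID6)
    REDUNDANCY = 2       # members any single stripe can lose

    def toy_layout(rows=DRIVES):
        """Yield stripes as sets of drive IDs, rotating one drive per row.
        (Toy placement only; the real DRAID mapping is more sophisticated.)"""
        for row in range(rows):
            yield {(row + i) % DRIVES for i in range(STRIPE_WIDTH)}

    def survives(failed):
        """True if no stripe loses more members than its redundancy."""
        return all(len(stripe & failed) <= REDUNDANCY
                   for stripe in toy_layout())

    print(survives({0, 15, 30, 45}))  # True: spread-out failures, no stripe loses >2
    print(survives({0, 1, 2, 3}))     # False: 3+ failures land in one stripe -> data loss

    In this toy layout, four failures spaced 15 drives apart never even share a stripe, while four adjacent failures all land in the same stripe. Your four drives evidently fell on the lucky side of that distribution.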



    ------------------------------
    Nezih Boyacioglu
    ------------------------------



  • 3.  RE: DRAID6 redundancy question

    Posted 3 days ago

    Hi Nezih, that's the perfect answer and explains why we had 4 drives fail while the mdisk stayed online. I didn't know that about the stripes.

    The drives that failed were in different enclosures, so they are on different stripes.

    One drive has finished the copyback and it's now doing an array mdisk rebuild, so we're in a much better state now.

    Thank you, Stuart. P.S. I'm off to buy a lottery ticket!



    ------------------------------
    stuart wade
    ------------------------------