  • 1.  IBMi and NVMe failure

    Posted Mon February 03, 2025 10:52 AM

    A question about what happens when an NVMe device fails in a mirrored configuration on IBM i. I know the mirrored NVMe disk will take over; however, does anyone have real-life experience with, or documentation on, recovering the failed drive? Do namespaces need to be reconstructed and mirroring restarted?



    ------------------------------
    Michael Miller
    Enterprise Architect
    Mainline Information Systems
    ------------------------------


  • 2.  RE: IBMi and NVMe failure

    Posted Mon February 03, 2025 05:27 PM

    Hello,

    Please refer to the IBM i 7.5 knowledge base link below; it might help answer your question. I haven't experienced this recently, but from the documentation it looks like the namespaces need to be recreated on the newly installed NVMe devices, the new namespaces then paired with the suspended NVMe disk units, and mirror protection resumed.

    Create NVMe namespaces to pair with Active mirror protected NVMe units - IBM Documentation



    ------------------------------
    Rohit Chauhan
    Senior Technical Specialist
    Norway
    ------------------------------



  • 3.  RE: IBMi and NVMe failure

    Posted Tue February 04, 2025 06:25 AM
    Edited by Bartlomiej Grabowski Tue February 04, 2025 06:26 AM

    Michael,

    We experienced such an issue. It was not really an NVMe adapter failure; rather, all PCI slots managed by the same PCI Gen bridge failed. We have 4 NVMe PCI adapters serving 2 separate LPARs in a mirrored configuration. Due to an internal hardware error, the PCI bridge had an outage of a second or two: it failed and quickly recovered automatically, but all PCI adapters managed by this chip failed for a short time. Unfortunately, one pair of NVMe adapters was in slots managed by the same PCI chip, while the second pair was split across two chips.

    The first LPAR crashed and an IPL was needed, but we did not lose any data, and neither a mirror rebuild nor namespace recreation was necessary. The second LPAR survived because only one of its NVMe adapters was affected.
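
    The placement lesson above can be illustrated with a small Python sketch. It is only a model, not an IBM tool: the adapter names, bridge names, and both helper functions are hypothetical, and the slot-to-bridge mapping would have to come from the actual system's hardware documentation. It flags mirror pairs whose two adapters sit behind the same PCI bridge and shows which LPARs keep a mirror half when one bridge drops out.

```python
def pairs_sharing_bridge(mirror_pairs, adapter_to_bridge):
    """Return LPARs whose two mirrored adapters sit behind the same PCI bridge."""
    return [
        lpar
        for lpar, (first, second) in mirror_pairs.items()
        if adapter_to_bridge[first] == adapter_to_bridge[second]
    ]


def surviving_lpars(mirror_pairs, adapter_to_bridge, failed_bridge):
    """Return LPARs that keep at least one mirror half when failed_bridge fails."""
    return [
        lpar
        for lpar, adapters in mirror_pairs.items()
        if any(adapter_to_bridge[a] != failed_bridge for a in adapters)
    ]


# Hypothetical layout mirroring the incident: LPAR1's pair shares one bridge,
# LPAR2's pair is split across two bridges.
adapter_to_bridge = {
    "nvme1": "bridgeA",
    "nvme2": "bridgeA",
    "nvme3": "bridgeA",
    "nvme4": "bridgeB",
}
mirror_pairs = {
    "LPAR1": ("nvme1", "nvme2"),
    "LPAR2": ("nvme3", "nvme4"),
}

exposed = pairs_sharing_bridge(mirror_pairs, adapter_to_bridge)
survivors = surviving_lpars(mirror_pairs, adapter_to_bridge, "bridgeA")
print("Pairs behind a single bridge:", exposed)
print("LPARs surviving a bridgeA outage:", survivors)
```

    With this layout, LPAR1 is reported as exposed and only LPAR2 survives the outage, which matches what we saw. Placing each half of a mirror pair behind a different bridge removes that single point of failure.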

     



    ------------------------------
    Bartlomiej Grabowski
    IBM Champion - Platinum Redbook Author and Principal System Specialist
    ------------------------------