Global Storage

  • 1.  IBM Storwize V3700 - Node Status Candidate

    Posted 19 days ago
    Edited by MHAND LA 14 days ago

    Hi!

    The IBM Storwize V3700 storage bay is currently non-operational. The issue began after the storage bay was powered down to replace a UPS (uninterruptible power supply). Since then, the disk array attached to the Windows server via a SAS cable is no longer accessible, and the storage bay's management GUI is unavailable. The only remaining access method is the Service Assistant.

    On the Service Assistant home page, both nodes are displayed in a "candidate" state and are no longer active. As a result, our production environment is completely halted, as all our databases are hosted on this storage system. Immediate assistance is required to restore functionality and resume operations.

    Current Status:

        Storage bay is non-operational.

        SAS-connected disks are not recognized by the Windows server.

        Management GUI is inaccessible.

        Both nodes are in a "candidate" state and inactive via the Service Assistant.

        Production is halted due to the unavailability of critical databases.

    Urgency: High – Immediate resolution is required to minimize downtime and restore production services.

    Do you have any suggestions? Thank you all!



  • 2.  RE: IBM Storwize V3700 - Node Status Candidate

    Posted 19 days ago

    Hi!

    See whether these procedures fix the issue:

    • Power Cycle the Nodes:

      • Ensure both nodes are powered off completely.
      • Power on the nodes one at a time, allowing each node to fully initialize before powering on the next.
    • Check Connections:

      • Verify that all SAS cables and power connections are securely connected.
      • Ensure there are no visible damages to the cables or ports.
    • Access Service Assistant:

      • Since the management GUI is inaccessible, use the Service Assistant to check the status of the nodes.
      • If both nodes are in a "candidate" state, it indicates that they are not part of an active cluster.
    • Recreate the Cluster:

      • Before proceeding with any recovery operations, ensure that you have recent backups of your critical data. The recovery process may involve steps that could potentially impact data availability.
      • Follow the system recovery procedure documented on the IBM website.
      • If the nodes are not automatically joining the cluster, you may need to manually recreate the cluster.
      • Use the Service Assistant to access the command line interface (CLI) and run the following commands:

          sainfo lsservicenodes
          satask startservice -force

      • The first command lists the nodes and their status. The second places a node into service state; -force overrides the check on cluster membership, but it does not by itself make the nodes join a cluster.


    ------------------------------
    Adalberto Barbosa
    Senior Systems Engineer
    Blue Chip Portugal
    Sintra
    +351214 220 370
    ------------------------------



  • 3.  RE: IBM Storwize V3700 - Node Status Candidate

    Posted 19 days ago
    Edited by MHAND LA 14 days ago

    Hi Adalberto,


    Thank you very much for your response.

    How can I directly access the data stored on the SAN to back it up, given that I no longer have access to the volume via the SAS cable? Is there a method using shell commands directly on the SAN to save the data without going through the SAS cable, before proceeding with any recovery process?
    I greatly appreciate any assistance that can help us, as we are in a critical situation: our entire production is currently at a standstill.

    Thank you.

    Here is some information extracted from the SAN via the service assistant interface:


    ------------

    sainfo lsservicenodes
    panel_name cluster_id cluster_name node_id node_name relation node_status error_data
    78B5456-1                                            local    Candidate
    78B5456-2                                            partner  Candidate
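
    For checking node state from a script rather than by eye, here is a minimal sketch that filters sainfo lsservicenodes output for nodes reported as Candidate. It assumes the output was captured with a colon delimiter (I believe sainfo lsservicenodes -delim : does this, but confirm it on your code level), because the default column layout above leaves empty fields that cannot be split reliably on whitespace. The function name candidate_nodes and the SA_HOST variable are hypothetical.

```shell
#!/bin/sh
# Print the panel_name of every node whose node_status is "Candidate".
# Input (stdin): output of "sainfo lsservicenodes" captured with a ':' delimiter
# (assumed: sainfo lsservicenodes -delim :). Columns are looked up by header name,
# so empty fields such as cluster_id or node_name do not shift the positions.
candidate_nodes() {
    awk -F':' '
        NR == 1 {
            # Map each header name to its column number.
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        $(col["node_status"]) == "Candidate" { print $(col["panel_name"]) }
    '
}

# Usage (SA_HOST is a hypothetical placeholder for the service IP login):
#   ssh "$SA_HOST" sainfo lsservicenodes -delim : | candidate_nodes
```

    With the output posted above, both 78B5456-1 and 78B5456-2 would be printed.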




  • 4.  RE: IBM Storwize V3700 - Node Status Candidate

    Posted 18 days ago
    Edited by Adalberto Barbosa 17 days ago

    Hi MHAND,

    I don't know about pen repairs.
    The 2 nodes are in candidate mode, which means there is no configuration on them.

    They are waiting to be added to a new cluster (for example, with satask mkcluster -clusterip 192.168.1.2 -gw 192.168.1.1 -mask 255.255.255.0).

    The only way I think you might be able to recover the configuration is to run "Recover System" from the Service Assistant, which tries to rebuild the old configuration from the data saved on the disks.


    Best Regards

    Adalberto Barbosa



    ------------------------------
    Adalberto Barbosa
    Senior Systems Engineer
    Blue Chip Portugal
    ------------------------------



  • 5.  RE: IBM Storwize V3700 - Node Status Candidate

    Posted 18 days ago

    Hi Adalberto,

    Thank you again for your response.

    If I reconfigure the cluster, will the volumes created before the SAN crash be preserved? Would I just need to reconfigure or reattach them, without any data loss?

    I completely agree with you that the recovery procedure is the only option, but I don't want to take risks with the critical data stored on the disks. I'm looking for a way to first back up the application data and system configuration, and then proceed with the recovery process without the fear of data loss.

    If it's not possible to back up the data through the Service Assistant via SSH, I was thinking of cloning or creating an image of each disk before starting the recovery procedure. What do you think?

    Thank you!



    ------------------------------
    MHAND LA
    ------------------------------



  • 6.  RE: IBM Storwize V3700 - Node Status Candidate

    Posted 17 days ago

    Hi Mhand,

    I don't think what you are proposing is possible.

    I'm not even going to ask how both nodes became candidates, because with a single node failure you could have made a smooth recovery.

    In this case, as I told you, these nodes do not know what the previously existing configuration is.

    I only see two solutions: either open a ticket with IBM, which will be very expensive, or attempt the recovery on the assumption that the configuration information is still on the disks.

    Please see this document below:

    https://www.ibm.com/docs/en/v3700/7.8.1?topic=procedure-running-system-recovery-using-service-assistant
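
    The linked document runs the recovery from the Service Assistant GUI. As far as I know, the same steps can also be run from the service CLI with satask t3recovery -prepare, sainfo lscmdstatus and satask t3recovery -execute; treat those command names as an assumption and check them against the IBM documentation for your code level before use. The sketch below only wraps them in a dry-run helper (DRY_RUN, SA_HOST and t3_step are hypothetical names) so that each step is printed, reviewed and then run one at a time, never chained automatically.

```shell
#!/bin/sh
# Dry-run wrapper around the (assumed) service CLI recovery commands.
# DRY_RUN=1 (the default) only prints the command that would be sent.
DRY_RUN=${DRY_RUN:-1}
SA_HOST=${SA_HOST:-superuser@service-ip}   # hypothetical placeholder for the service IP login

run_sa() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "ssh $SA_HOST $*"
    else
        ssh "$SA_HOST" "$@"
    fi
}

# Run exactly one recovery step; the operator decides when to move to the next.
t3_step() {
    case $1 in
        nodes)   run_sa sainfo lsservicenodes ;;        # confirm the state of both nodes
        prepare) run_sa satask t3recovery -prepare ;;   # look for saved configuration/quorum data
        status)  run_sa sainfo lscmdstatus ;;           # repeat until the current step completes
        execute) run_sa satask t3recovery -execute ;;   # only after reviewing the prepare result
        *) echo "unknown step: $1" >&2; return 2 ;;
    esac
}

# Usage: t3_step nodes; t3_step prepare; t3_step status; ...
```

    Keeping execute as a separate, manual step is deliberate: the prepare result should be checked before anything is written back to the system.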

    Best Regards and good luck with data recovery

    Adalberto Barbosa


  • 7.  RE: IBM Storwize V3700 - Node Status Candidate

    Posted 17 days ago

    Thank you very much, Adalberto.



    ------------------------------
    MHAND LA
    ------------------------------



  • 8.  RE: IBM Storwize V3700 - Node Status Candidate

    Posted 6 days ago

    Just out of curiosity, did the recovery procedure work?



    ------------------------------
    Pino Mariotto
    ------------------------------



  • 9.  RE: IBM Storwize V3700 - Node Status Candidate

    Posted 6 hours ago

    Hi Pino!

    The recovery procedure is not working, and support is not responding.
    They also canceled the support ticket because the product has reached end of life and is no longer under warranty.

    I am no longer handling this case.

    Thank you.



    ------------------------------
    MHAND LA
    ------------------------------



  • 10.  RE: IBM Storwize V3700 - Node Status Candidate

    Posted 18 days ago


    Hi Adalberto,

    Thank you very much for your response.

    How can I directly access the data stored on the SAN to back it up, given that I no longer have access to the volume via the SAS cable? Is there a method using shell commands directly on the SAN to save the data without going through the SAS cable, before proceeding with any recovery process?

    Can I clone or take images of the disks attached to the SAN or the existing volumes using Linux commands? For example, over the network:


    dd if=/dev/sdX | ssh user@remote_server 'dd of=image_disk.img'


    Or on a USB drive mounted on the SAN:


    dd if=/dev/sdX of=/mnt/usb/image_disk.img
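
    A note on the imaging idea: as far as I know, the Storwize service CLI is a restricted shell, so dd is not available on the nodes themselves, and commands like the ones above would have to run on a separate Linux host that sees the drives as block devices. On such a host, a minimal sketch that copies a device to an image and checks the copy with SHA-256 could look like this. It assumes GNU coreutils; /dev/sdX and the image path are the placeholders from the question, and image_and_verify is a hypothetical name.

```shell
#!/bin/sh
# Copy a source device (or file) to an image file, then verify the copy.
# Assumes GNU coreutils (dd with status=, sha256sum) on a general-purpose Linux host.
image_and_verify() {
    src=$1   # e.g. /dev/sdX (placeholder from the question)
    dst=$2   # e.g. /mnt/usb/image_disk.img

    # Copy in 4 MiB blocks; conv=fsync flushes the image to storage before dd exits.
    dd if="$src" of="$dst" bs=4M conv=fsync status=none || return 1

    # Compare SHA-256 of source and image; any mismatch means the copy is not trustworthy.
    src_sum=$(sha256sum < "$src" | cut -d' ' -f1)
    dst_sum=$(sha256sum < "$dst" | cut -d' ' -f1)
    if [ "$src_sum" != "$dst_sum" ]; then
        echo "checksum mismatch for $dst" >&2
        return 1
    fi

    # Store the checksum next to the image so it can be re-checked later with sha256sum -c.
    echo "$dst_sum  $(basename "$dst")" > "$dst.sha256"
    echo "verified $dst"
}

# Usage:
#   image_and_verify /dev/sdX /mnt/usb/image_disk.img
```

    Reading the source twice (once to copy, once to checksum) doubles the read time but keeps the sketch simple.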

    I greatly appreciate any assistance that can help us, as we are in a critical situation: our entire production is currently at a standstill.

    Thank you.



    ------------------------------
    MHAND LA
    ------------------------------