Primary Storage

  • 1.  V3700 cluster failure (both nodes of the system are in service state with errors)

    Posted 2 days ago

    Hello,

    This is an error on our V3700: there is now no access to the data (data access is locked). I can only access the service interface. The errors are 578 and 734, and the 2nd node is not accessible. The error shown on the node is in this picture. Please help with steps to recover or to identify the cause of this issue. Thank you very much.





    ------------------------------
    Bilal Mansoor
    ------------------------------


  • 2.  RE: V3700 cluster failure (both nodes of the system are in service state with errors)

    Posted 2 days ago

    Hi,

    Please see the following details:

    User Response

    1. If possible, this noncritical node error should be serviced using the management GUI and running the recommended actions for the service error code.
    2. Follow the procedure for getting node canister and clustered-system information and determine the state of the partner node canister in the enclosure. Fix any errors reported on the partner node canister.
    3. Use the remove and replace procedures to replace the enclosure.

    Possible cause: FRUs or other cause:

    • Node canister
    • Enclosure midplane


    ------------------------------
    Best Regards,

    Apidesh Dulma
    Managing Director/Service Manager
    A Vision IT Systems(Thailand) Co.Ltd.
    M:(+66)81-822-2904, (+66)61-390-2710
    E:apidesh@avisionitsystem.com
    www.avisionitsystem.com
    ------------------------------



  • 3.  RE: V3700 cluster failure (both nodes of the system are in service state with errors)

    Posted 2 days ago

    Hi,

    With only this picture, it's not possible to give you an action plan.
    The current status is that node2 is completely offline and node1 is in "service" state with error 578 (which means the node went down for some reason).

    First you should try to revive node2. 
    - remove node 2 from the system
    - remove the node battery for 1 minute.
    - reinstall the battery and put the node back in the system.
    - check with sainfo lsservicenodes or the GUI whether the node comes back in any state.
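    A minimal Python sketch for reading the result of that last check, assuming you captured the raw sainfo lsservicenodes text with its columns still aligned under the header (the copy pasted later in this thread lost that alignment). The helper names and the sample row are hypothetical, not part of any IBM tool; the only rule encoded is the one above: a row whose relation is something other than "local" means the partner node answered in some state.

```python
def parse_fixed_width(text):
    """Split a space-padded CLI table into one dict per data row.

    Assumes the first non-blank line is the header and that each value
    starts in the same column as its header name (true of a raw terminal
    capture, not of a re-flowed copy).
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0]
    names = header.split()
    # A column starts wherever a header name begins.
    starts = [i for i, ch in enumerate(header)
              if ch != " " and (i == 0 or header[i - 1] == " ")]
    rows = []
    for line in lines[1:]:
        row = {}
        for n, name in enumerate(names):
            end = starts[n + 1] if n + 1 < len(starts) else None
            row[name] = line[starts[n]:end].strip()
        rows.append(row)
    return rows


def partner_visible(rows):
    """True if any listed node is not the local one (relation other than 'local')."""
    return any(row.get("relation", "") not in ("", "local") for row in rows)


header = ("panel_name cluster_id cluster_name node_id node_name "
          "relation node_status error_data")
# Hypothetical aligned capture of the single row seen in this thread.
row = "7804757-1".ljust(53) + "local".ljust(9) + "Service".ljust(12) + "578"
nodes = parse_fixed_width(header + "\n" + row)
print(nodes[0]["node_status"], nodes[0]["error_data"])  # Service 578
print(partner_visible(nodes))  # False: only the local node is listed
```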

    The next steps depend on the outcome of these actions, on how long each node was down, and on the order in which they went down.
    I recommend opening a support case for the next steps. (I know this can be a problem because the system has been EOS for 1.5 years.)

    Greetings
    Tino 



    ------------------------------
    Tino Schumann
    ------------------------------



  • 4.  RE: V3700 cluster failure (both nodes of the system are in service state with errors)

    Posted 2 days ago

    Dear Fellows,

    Many thanks for the replies,

    I checked the 2nd node again after the battery reset: only the power and warning lights are on, and there is no response on the network or on USB.

    Below is "satask_result.html" from the working node. Please review and advise. I am very grateful for your time.

    ======================================

    Wed Jul  3 07:09:15 PKT 2024
    satask.txt file not found.

    System Status

    sainfo lsservicenodes

    panel_name cluster_id cluster_name node_id node_name relation node_status error_data
    7804757-1                                            local    Service     578

    sainfo lsservicestatus

    panel_name 7804757-1
    cluster_id 
    cluster_name 
    cluster_status Inactive
    cluster_ip_count 2
    cluster_port 1
    cluster_ip 
    cluster_gw 
    cluster_mask 
    cluster_ip_6 
    cluster_gw_6 
    cluster_prefix_6 
    cluster_port 2
    cluster_ip 
    cluster_gw 
    cluster_mask 
    cluster_ip_6 
    cluster_gw_6 
    cluster_prefix_6 
    node_id 
    node_name 
    node_status Service
    config_node No
    hardware TB4
    service_IP_address 192.168.1.122
    service_gateway 192.168.1.1
    service_subnet_mask 255.255.255.0
    service_IP_address_6 
    service_gateway_6 
    service_prefix_6 
    node_code_version 6.4.1.0
    node_code_build 74.2.1210240000
    cluster_code_build 
    node_error_count 2
    error_code 578
    error_data 
    error_code 734
    error_data 1 1 0
    fc_ports 4
    port_id 1
    port_status Active
    port_speed 8Gb
    port_WWPN 50050768030425b0
    SFP_type Short-wave
    port_id 2
    port_status Inactive
    port_speed N/A
    port_WWPN 50050768030825b0
    SFP_type Short-wave
    port_id 3
    port_status Inactive
    port_speed N/A
    port_WWPN 50050768030c25b0
    SFP_type N/A
    port_id 4
    port_status Inactive
    port_speed N/A
    port_WWPN 50050768031025b0
    SFP_type N/A
    ethernet_ports 2
    ethernet_port_id 1
    port_status Link Online
    port_speed 100Mb/s - Full
    MAC 5c:f3:fc:f5:3d:3e
    ethernet_port_id 2
    port_status Not Configured
    port_speed 
    MAC 5c:f3:fc:f5:3d:3f
    product_mtm 2072-24C
    product_serial 7804757
    time_to_charge 0
    battery_charging 100
    dump_name 7804757-1
    node_WWNN 
    disk_WWNN_suffix 
    panel_WWNN_suffix 
    UPS_serial_number 
    UPS_status 
    enclosure_WWNN_1 50050768030025b0
    enclosure_WWNN_2 50050768030025b1
    node_part_identity 11S00AR000YM10BG335020
    node_FRU_part 00AR004
    enclosure_identity 11S00Y2441YM12BG32T00J
    PSU_count 0
    PSU_id 1
    PSU_status 
    PSU_id 2
    PSU_status 
    Battery_count 1
    Battery_id 1
    Battery_status active
    Battery_id 2
    Battery_status 
    node_location_copy 1
    node_product_mtm_copy 2072-24C
    node_product_serial_copy 7804757
    node_WWNN_1_copy 50050768030025b0
    node_WWNN_2_copy 50050768030025b1
    latest_cluster_id c0202025b2
    next_cluster_id c0204025b2
    console_IP 
    has_nas_key no
    fc_io_ports 4
    fc_io_port_id 1
    fc_io_port_WWPN 50050768030425b0
    fc_io_port_switch_WWPN 0000000000000000
    fc_io_port_state Active
    fc_io_port_FCF_MAC N/A
    fc_io_port_vlanid N/A
    fc_io_port_type FC
    fc_io_port_type_port_id 1
    fc_io_port_id 2
    fc_io_port_WWPN 50050768030825b0
    fc_io_port_switch_WWPN 0000000000000000
    fc_io_port_state Inactive
    fc_io_port_FCF_MAC N/A
    fc_io_port_vlanid N/A
    fc_io_port_type FC
    fc_io_port_type_port_id 2
    fc_io_port_id 3
    fc_io_port_WWPN 50050768030c25b0
    fc_io_port_switch_WWPN 0000000000000000
    fc_io_port_state Inactive
    fc_io_port_FCF_MAC N/A
    fc_io_port_vlanid N/A
    fc_io_port_type FC
    fc_io_port_type_port_id 3
    fc_io_port_id 4
    fc_io_port_WWPN 50050768031025b0
    fc_io_port_switch_WWPN 0000000000000000
    fc_io_port_state Inactive
    fc_io_port_FCF_MAC N/A
    fc_io_port_vlanid N/A
    fc_io_port_type FC
    fc_io_port_type_port_id 4
    service_IP_mode static
    service_IP_mode_6 
    machine_part_number 2072S2C
    node_machine_part_number_copy 2072S2C

    sainfo lsservicerecommendation

    service_action
    Follow troubleshooting procedures to recover cluster.

    sainfo lshardware

    panel_name 7804757-1
    node_id 
    node_name 
    node_status Service
    hardware TB4
    actual_different no
    actual_valid yes
    memory_configured 4
    memory_actual 4
    memory_valid yes
    cpu_count 1
    cpu_socket 1
    cpu_configured 2 core Intel(R) Celeron(R) CPU G530T @ 2.00GHz
    cpu_actual 2 core Intel(R) Celeron(R) CPU G530T @ 2.00GHz
    cpu_valid yes
    cpu_socket 
    cpu_configured 
    cpu_actual 
    cpu_valid 
    adapter_count 5
    adapter_location 0
    adapter_configured High Speed SAS adapter
    adapter_actual High Speed SAS adapter
    adapter_valid yes
    adapter_location 0
    adapter_configured Midplane bus adapter
    adapter_actual Midplane bus adapter
    adapter_valid yes
    adapter_location 0
    adapter_configured 1Gb/s Ethernet adapter
    adapter_actual 1Gb/s Ethernet adapter
    adapter_valid yes
    adapter_location 0
    adapter_configured 1Gb/s Ethernet adapter
    adapter_actual 1Gb/s Ethernet adapter
    adapter_valid yes
    adapter_location 1
    adapter_configured Four port 8Gb/s FC adapter
    adapter_actual Four port 8Gb/s FC adapter
    adapter_valid yes
    adapter_location 
    adapter_configured 
    adapter_actual 
    adapter_valid 
    ports_different no
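    The key/value layout above can also be checked programmatically. A minimal Python sketch, assuming you feed it only the lines printed under "sainfo lsservicestatus"; parse_key_values and summarize are hypothetical helper names, not part of any IBM tool. Repeated keys are kept as lists, which matters here because error_code appears twice.

```python
from collections import defaultdict


def parse_key_values(text):
    """Parse 'key value' lines into key -> list of values.

    Repeated keys such as error_code or port_id keep every occurrence in
    order; a key printed with no value maps to an empty string.
    """
    fields = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key].append(value.strip())
    return dict(fields)


def summarize(fields):
    """Pull out the fields most relevant to a node stuck in service state."""
    codes = fields.get("error_code", [])
    data = fields.get("error_data", [])
    return {
        "node_status": fields.get("node_status", [""])[0],
        "cluster_status": fields.get("cluster_status", [""])[0],
        # Pairs each error_code with the error_data line printed after it.
        "errors": list(zip(codes, data)),
    }


# Excerpt of the output posted above.
excerpt = """\
cluster_status Inactive
node_status Service
node_error_count 2
error_code 578
error_data
error_code 734
error_data 1 1 0
"""
print(summarize(parse_key_values(excerpt)))
```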



    ------------------------------
    Bilal Mansoor
    ------------------------------



  • 5.  RE: V3700 cluster failure (both nodes of the system are in service state with errors)

    Posted 2 days ago

    Hi,

    So node2 is still dead, even after the battery and node reseat?
    Then the only option left is a T3 with one node. But first of all, you should look at the latest XML config file (stored on the node file system).
    There you can see the system's last working status and time, and which node(s) were online at that time.

    If the last online node was node2, then you have a problem: I think the automatic T3 will fail, and you will need a manual T3 (for which you need support).
    Do not swap the nodes into other slots!



    ------------------------------
    Tino Schumann
    ------------------------------



  • 6.  RE: V3700 cluster failure (both nodes of the system are in service state with errors)

    User Group Leader
    Posted 23 hours ago

    With a dead canister, I recommend going to IBM support regardless. You should not attempt a single-node T3 in this state, manual or not.



    ------------------------------
    Evelyn Perez
    IBM Senior Technical Staff Member
    IBM Storage Virtualize Software Architect for SVC and FlashSystem
    ------------------------------



  • 7.  RE: V3700 cluster failure (both nodes of the system are in service state with errors)

    User Group Leader
    Posted 2 days ago

    You might be better off recovering from tape and migrating to a system that is still in support. However, here's some help:

    Node error 578: this error occurs when the node has lost its cluster state and the rest of the cluster cannot recover it. This can occur if:

    • The node kernel panics and there is no partner node
    • The node loses power and the battery is unable to hold up the system (it would be useful to know if you had any hardware errors flagged or warnings about near end of life ahead of whatever happened)
    • Any failure where the software on the node is unable to destage its critical cluster state to the boot device and recover it (such as a dead drive or similar).

    Node error 734 indicates that the inter-canister link is unable to come up. This is the PCIe connection that runs through the midplane and connects both nodes together. If node 2 is dead, it is likely not answering the requests, resulting in this error.
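    A small Python lookup that restates the two codes as explained in this post, which can help when triaging a satask_result capture. The wording is paraphrased from this thread rather than taken from IBM's message catalogue, and NODE_ERRORS and explain are hypothetical names; any code not listed falls through to a pointer to the documentation.

```python
# Descriptions paraphrased from this thread, not IBM's official message text.
NODE_ERRORS = {
    578: "node lost its cluster state and the rest of the cluster cannot recover it",
    734: ("inter-canister link (PCIe through the midplane) cannot come up; "
          "a dead partner node that is not answering is a likely cause"),
}


def explain(error_codes):
    """Return one human-readable line per node error code."""
    fallback = "not covered here; check the product documentation"
    return [f"{code}: {NODE_ERRORS.get(code, fallback)}" for code in error_codes]


for line in explain([578, 734]):
    print(line)
```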

    From your output, you also have only 1 Fibre Channel port active on that node.
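    The active-port count can be read off the same output. A minimal Python sketch, assuming the excerpt keeps the fc_io_port_id / fc_io_port_state order shown in the posted satask_result; active_fc_ports is a hypothetical helper. It uses the fc_io_port_* fields because port_status is also printed for the Ethernet ports.

```python
def active_fc_ports(text):
    """Return the fc_io_port_id values whose fc_io_port_state is Active.

    Assumes each fc_io_port_state line follows its own fc_io_port_id
    line, as in the output posted in this thread.
    """
    active, current = [], None
    for line in text.splitlines():
        key, _, value = line.strip().partition(" ")
        if key == "fc_io_port_id":
            current = value.strip()
        elif key == "fc_io_port_state" and value.strip() == "Active":
            active.append(current)
    return active


# Port/state lines from the output posted in this thread.
excerpt = """\
fc_io_port_id 1
fc_io_port_state Active
fc_io_port_id 2
fc_io_port_state Inactive
fc_io_port_id 3
fc_io_port_state Inactive
fc_io_port_id 4
fc_io_port_state Inactive
"""
print(active_fc_ports(excerpt))  # ['1']
```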

    To recover from the 578, you will need to revive the partner node (which we cannot help with, as it is non-responsive and likely needs hardware replacement, or at a minimum a reseat) and hope it has the cluster state (unlikely). At that point you will need to run a T3 or T4 recovery procedure in order to recover most of your data. Please note that anything in the write cache at the time of failure is lost, because that is part of the cluster state that was lost. See the online documentation for T3 instructions, although it is generally advised to recover the offline node before attempting this, as a single-node recovery will require manual changes to the recovery, which would require IBM support assistance.



    ------------------------------
    Evelyn Perez
    IBM Senior Technical Staff Member
    IBM Storage Virtualize Software Architect for SVC and FlashSystem
    ------------------------------