With a dead canister, I recommend going to IBM support regardless. You should not attempt a single-node T3 in this state, manual or not.
Original Message:
Sent: Wed July 03, 2024 04:08 AM
From: Tino Schumann
Subject: V3700 cluster failure(both nodes of system is service state with error)
Hi,
So node2 is still dead? Even after the battery and node reseat?
Then you can only try a T3 with a single node. But first of all you should take a look at the latest XML config file (stored on the node filesystem).
There you can see the last working status and time of the system, and which node(s) were online at that time.
If the last node online was node2, you have a problem: I think the automatic T3 will then fail, and you will need a manual T3 (which requires support).
Do not swap the nodes into other slots!
------------------------------
Tino Schumann
Original Message:
Sent: Wed July 03, 2024 03:53 AM
From: Bilal Mansoor
Subject: V3700 cluster failure(both nodes of system is service state with error)
Dear Fellows,
Many thanks for the reply.
I checked the 2nd node again after the battery reset: only the power and warning lights are on, and there is no response on the network or over USB.
Below is "satask_result.html" from the working node. Please review and guide; I am very thankful for your time.
======================================
Wed Jul 3 07:09:15 PKT 2024
satask.txt file not found.
System Status
sainfo lsservicenodes
panel_name  cluster_id  cluster_name  node_id  node_name  relation  node_status  error_data
7804757-1                                                 local     Service      578
sainfo lsservicestatus
panel_name 7804757-1
cluster_id
cluster_name
cluster_status Inactive
cluster_ip_count 2
cluster_port 1
cluster_ip
cluster_gw
cluster_mask
cluster_ip_6
cluster_gw_6
cluster_prefix_6
cluster_port 2
cluster_ip
cluster_gw
cluster_mask
cluster_ip_6
cluster_gw_6
cluster_prefix_6
node_id
node_name
node_status Service
config_node No
hardware TB4
service_IP_address 192.168.1.122
service_gateway 192.168.1.1
service_subnet_mask 255.255.255.0
service_IP_address_6
service_gateway_6
service_prefix_6
node_code_version 6.4.1.0
node_code_build 74.2.1210240000
cluster_code_build
node_error_count 2
error_code 578
error_data
error_code 734
error_data 1 1 0
fc_ports 4
port_id 1
port_status Active
port_speed 8Gb
port_WWPN 50050768030425b0
SFP_type Short-wave
port_id 2
port_status Inactive
port_speed N/A
port_WWPN 50050768030825b0
SFP_type Short-wave
port_id 3
port_status Inactive
port_speed N/A
port_WWPN 50050768030c25b0
SFP_type N/A
port_id 4
port_status Inactive
port_speed N/A
port_WWPN 50050768031025b0
SFP_type N/A
ethernet_ports 2
ethernet_port_id 1
port_status Link Online
port_speed 100Mb/s - Full
MAC 5c:f3:fc:f5:3d:3e
ethernet_port_id 2
port_status Not Configured
port_speed
MAC 5c:f3:fc:f5:3d:3f
product_mtm 2072-24C
product_serial 7804757
time_to_charge 0
battery_charging 100
dump_name 7804757-1
node_WWNN
disk_WWNN_suffix
panel_WWNN_suffix
UPS_serial_number
UPS_status
enclosure_WWNN_1 50050768030025b0
enclosure_WWNN_2 50050768030025b1
node_part_identity 11S00AR000YM10BG335020
node_FRU_part 00AR004
enclosure_identity 11S00Y2441YM12BG32T00J
PSU_count 0
PSU_id 1
PSU_status
PSU_id 2
PSU_status
Battery_count 1
Battery_id 1
Battery_status active
Battery_id 2
Battery_status
node_location_copy 1
node_product_mtm_copy 2072-24C
node_product_serial_copy 7804757
node_WWNN_1_copy 50050768030025b0
node_WWNN_2_copy 50050768030025b1
latest_cluster_id c0202025b2
next_cluster_id c0204025b2
console_IP
has_nas_key no
fc_io_ports 4
fc_io_port_id 1
fc_io_port_WWPN 50050768030425b0
fc_io_port_switch_WWPN 0000000000000000
fc_io_port_state Active
fc_io_port_FCF_MAC N/A
fc_io_port_vlanid N/A
fc_io_port_type FC
fc_io_port_type_port_id 1
fc_io_port_id 2
fc_io_port_WWPN 50050768030825b0
fc_io_port_switch_WWPN 0000000000000000
fc_io_port_state Inactive
fc_io_port_FCF_MAC N/A
fc_io_port_vlanid N/A
fc_io_port_type FC
fc_io_port_type_port_id 2
fc_io_port_id 3
fc_io_port_WWPN 50050768030c25b0
fc_io_port_switch_WWPN 0000000000000000
fc_io_port_state Inactive
fc_io_port_FCF_MAC N/A
fc_io_port_vlanid N/A
fc_io_port_type FC
fc_io_port_type_port_id 3
fc_io_port_id 4
fc_io_port_WWPN 50050768031025b0
fc_io_port_switch_WWPN 0000000000000000
fc_io_port_state Inactive
fc_io_port_FCF_MAC N/A
fc_io_port_vlanid N/A
fc_io_port_type FC
fc_io_port_type_port_id 4
service_IP_mode static
service_IP_mode_6
machine_part_number 2072S2C
node_machine_part_number_copy 2072S2C
sainfo lsservicerecommendation
service_action
Follow troubleshooting procedures to recover cluster.
sainfo lshardware
panel_name 7804757-1
node_id
node_name
node_status Service
hardware TB4
actual_different no
actual_valid yes
memory_configured 4
memory_actual 4
memory_valid yes
cpu_count 1
cpu_socket 1
cpu_configured 2 core Intel(R) Celeron(R) CPU G530T @ 2.00GHz
cpu_actual 2 core Intel(R) Celeron(R) CPU G530T @ 2.00GHz
cpu_valid yes
cpu_socket
cpu_configured
cpu_actual
cpu_valid
adapter_count 5
adapter_location 0
adapter_configured High Speed SAS adapter
adapter_actual High Speed SAS adapter
adapter_valid yes
adapter_location 0
adapter_configured Midplane bus adapter
adapter_actual Midplane bus adapter
adapter_valid yes
adapter_location 0
adapter_configured 1Gb/s Ethernet adapter
adapter_actual 1Gb/s Ethernet adapter
adapter_valid yes
adapter_location 0
adapter_configured 1Gb/s Ethernet adapter
adapter_actual 1Gb/s Ethernet adapter
adapter_valid yes
adapter_location 1
adapter_configured Four port 8Gb/s FC adapter
adapter_actual Four port 8Gb/s FC adapter
adapter_valid yes
adapter_location
adapter_configured
adapter_actual
adapter_valid
ports_different no
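
Note for anyone who wants to pull the key fields out of a status dump like this one: below is a minimal Python sketch. It assumes the sainfo output has been saved as plain text with one "key value" pair per line (an assumption; the pasted HTML copy may not preserve line breaks). The helper names parse_sainfo_kv, first_value and node_errors are hypothetical and not part of any IBM tool, and the sample text contains only a few fields copied from the dump.

```python
def parse_sainfo_kv(text):
    """Split 'key value' lines into a list of (key, value) pairs.

    A list is used instead of a dict because keys such as error_code
    and port_id repeat within one listing.
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Everything after the first space is the value (may be empty).
        key, _, value = line.partition(" ")
        pairs.append((key, value.strip()))
    return pairs


def first_value(pairs, key):
    """Return the first value recorded for key, or None if it is absent."""
    for k, v in pairs:
        if k == key:
            return v
    return None


def node_errors(pairs):
    """Pair each error_code with the error_data line that follows it."""
    errors = []
    for key, value in pairs:
        if key == "error_code":
            errors.append([value, ""])
        elif key == "error_data" and errors:
            errors[-1][1] = value
    return [tuple(e) for e in errors]


# A few fields copied from the lsservicestatus dump in this post.
SAMPLE = """\
panel_name 7804757-1
cluster_status Inactive
node_status Service
node_error_count 2
error_code 578
error_data
error_code 734
error_data 1 1 0
"""

if __name__ == "__main__":
    pairs = parse_sainfo_kv(SAMPLE)
    print("node_status:", first_value(pairs, "node_status"))
    for code, data in node_errors(pairs):
        print("error", code, "data:", data or "(none)")
```

This only summarises what the node already reports; it does not change anything on the system.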
------------------------------
Bilal Mansoor
Original Message:
Sent: Wed July 03, 2024 02:37 AM
From: Tino Schumann
Subject: V3700 cluster failure(both nodes of system is service state with error)
Hi,
With only this picture, it's not possible to give you an action plan.
The current status is that node2 is completely offline and node1 is in "service" state with error 578 (which means the node went down for some reason).
First you should try to revive node2.
- remove node 2 from the system
- remove the node battery for 1 minute.
- reinstall the battery and put the node back in the system.
- check with sainfo lsservicenodes or the GUI whether the node is back in any status.
The next steps depend on the outcome of these actions, on how long each node was down, and on the order in which they went down.
I recommend opening a support case for the next steps. (I know this can be a problem because the system has been EOS for 1.5 years.)
Greetings
Tino
------------------------------
Tino Schumann
Original Message:
Sent: Tue July 02, 2024 07:41 AM
From: Bilal Mansoor
Subject: V3700 cluster failure(both nodes of system is service state with error)
Hello,
This is about an error on a V3700. We can no longer access the system (access to the data is locked). I can only access the service interface, which shows errors 578 and 734. The 2nd node is not accessible. The error shown on the node is in the attached picture. Please help with the steps to recover from or identify this issue. Thank you very much.
------------------------------
Bilal Mansoor
------------------------------