The problem came from the V7000 alias. We had mistakenly added the physical WWPNs to it as well. After we removed them, so that only the virtual WWPNs were left, every node went back to Online and Active.
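For anyone who wants to check their own aliases for the same mistake, here is a minimal Python sketch. It assumes you already have the alias member list and the list of physical storage port WWPNs as plain strings. The WWPN values and the function names are made up for illustration and are not taken from our configuration.

```python
# Hypothetical helper: flag physical storage WWPNs that ended up in a zoning alias.
# All WWPN values below are invented placeholders, not real port addresses.

def normalize(wwpn: str) -> str:
    """Lowercase a WWPN and strip colons so both notations compare equal."""
    return wwpn.replace(":", "").strip().lower()


def physical_members(alias_members, physical_wwpns):
    """Return the alias members that match a known physical port WWPN."""
    physical = {normalize(w) for w in physical_wwpns}
    return [m for m in alias_members if normalize(m) in physical]


alias_members = [
    "50:05:07:68:0b:21:aa:01",  # placeholder for a physical port WWPN
    "50:05:07:68:0b:25:aa:01",  # placeholder for a virtual port WWPN
]
physical_wwpns = ["500507680B21AA01"]  # same placeholder, other notation

offenders = physical_members(alias_members, physical_wwpns)
print(offenders)
```

An empty list would mean the alias contains only virtual WWPNs, which is the state we ended up with after the fix.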
Original Message:
Sent: Mon February 12, 2024 10:08 PM
From: GLEN ROUTLEY
Subject: V7000 - Host Degraded, Port Active
Hello Krisztián.
Firstly, what is the model of your V7000 and what code level is it running?
With so much of the information redacted, there really is not enough to go on.
The lshost output looks as though everything is good:
WWPN 1000001*********
node_logged_in_count 2
state active
WWPN 1000001*********
node_logged_in_count 2
state active
Here each HBA port of host 'NODE2' is logged in to each node of the V7000.
However, the actual host WWPNs, as reported by the host itself, are needed to confirm this.
The zoning you provided shows only alias names, so there might be something incorrect in the alias definitions.
In older code levels there was a bug where a host could be displayed as degraded incorrectly.
But without all the pieces I cannot complete your puzzle.
------------------------------
GLEN ROUTLEY
Original Message:
Sent: Thu February 08, 2024 04:58 AM
From: Krisztián Révész
Subject: V7000 - Host Degraded, Port Active
Hi Everybody!
We have encountered an error with our Storwize V7000.
Our setup has 6 nodes and two Brocade SAN switches (DS_6505B).
(Previously we had 4 nodes connected directly to the V7000 via 8 FC cables; the 2 additional nodes and the SAN switches have only just been installed.)
5 nodes are working properly, but NODE-2 is shown as Degraded on the Hosts page, while on the Properties >> Port Definitions page everything is Active.
Here is the CLI output showing the problem:
IBM_Storwize:STORAGE-5:superuser>lshost
id name      port_count iogrp_count status   site_id site_name host_cluster_id host_cluster_name protocol owner_id owner_name
0  NODE-1    2          4           online                     0               CLUSTER1          scsi
1  NODE-2    2          4           degraded                   0               CLUSTER1          scsi
2  SERVER-9  2          4           online                     1               CLUSTER2          scsi
3  SERVER-10 2          4           online                     1               CLUSTER2          scsi
4  NODE-3    2          4           online                     2               CLUSTER3          scsi
5  NODE-4    2          4           online                     2               CLUSTER3          scsi
IBM_Storwize:STORAGE-5:superuser>lshost 1
id 1
name NODE-2
port_count 2
type generic
mask 1111111111111111111111111111111111111111111111111111111111111111
iogrp_count 4
status degraded
site_id
site_name
host_cluster_id 0
host_cluster_name CLUSTER1
protocol scsi
status_policy redundant
status_site all
WWPN 1000001*********
node_logged_in_count 2
state active
WWPN 1000001*********
node_logged_in_count 2
state active
owner_id
owner_name
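To make the mismatch easier to spot, here is a small Python sketch that splits the detailed lshost view into host attributes and per-port records and then compares the two. It assumes the output has one "key value" pair per line, as in the listing above. The function name parse_lshost_detail is my own and is not part of any IBM tool, and the sample is an abbreviated copy of the output.

```python
# Hypothetical parser for the detailed `lshost <id>` view (one "key value" per line).

def parse_lshost_detail(text):
    """Return (host_attributes, ports) parsed from the detailed host view."""
    host, ports = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        if key == "WWPN":
            # Each WWPN line starts a new port record.
            ports.append({"WWPN": value})
        elif ports and key in ("node_logged_in_count", "state"):
            # These two fields belong to the most recent WWPN.
            ports[-1][key] = value
        else:
            host[key] = value
    return host, ports


sample = """\
id 1
name NODE-2
status degraded
WWPN 1000001*********
node_logged_in_count 2
state active
WWPN 1000001*********
node_logged_in_count 2
state active
owner_id
"""

host, ports = parse_lshost_detail(sample)
all_active = all(p["state"] == "active" for p in ports)
print(host["status"], all_active)
```

With the sample above this reports a degraded host even though every port is active, which is exactly the contradiction we are seeing.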
The SANs are configured with the same commands.
SAN-1 has ST5-CAN1/1, ST5-CAN1/3, ST5-CAN2/1, ST5-CAN2/3 connected.
SAN-2 has ST5-CAN1/2, ST5-CAN1/4, ST5-CAN2/2, ST5-CAN2/4 connected.
TVT_STORAGE-5_PORT3, for example, includes the CAN1/3 and CAN2/3 WWPNs (virtualized and physical), and the zones for this node are the following:
zone: Z-TVT-NODE-2_HBA1_PORT0-TVT_STORAGE-5_PORT3 TVT_NODE-2_HBA1_PORT0; TVT_STORAGE-5_PORT3
zone: Z-TVT-NODE-2_HBA2_PORT0-TVT_STORAGE-5_PORT4 TVT_NODE-2_HBA2_PORT0; TVT_STORAGE-5_PORT4
I think fcping is working properly (TVT_NODE-2_HBA1_PORT0 is connected to SAN-1 and TVT_NODE-2_HBA2_PORT0 is connected to SAN-2):
SAN-1:admin> fcping 10:00:00:1*:**:**:**:** (TVT_NODE-2_HBA1_PORT0)
Destination: 10:00:00:1*:**:**:**:**
Pinging 10:00:00:1*:**:**:**:** [0x010300] with 12 bytes of data:
received reply from 10:00:00:1*:**:**:**:**: 12 bytes time:722 usec
received reply from 10:00:00:1*:**:**:**:**: 12 bytes time:695 usec
received reply from 10:00:00:1*:**:**:**:**: 12 bytes time:628 usec
received reply from 10:00:00:1*:**:**:**:**: 12 bytes time:720 usec
received reply from 10:00:00:1*:**:**:**:**: 12 bytes time:660 usec
5 frames sent, 5 frames received, 0 frames rejected, 0 frames timeout
Round-trip min/avg/max = 628/685/722 usec
SAN-1:admin> fcping 10:00:00:1*:**:**:**:** (TVT_NODE-2_HBA2_PORT0)
fcping: Error destination wwn invalid
SAN-2:admin> fcping 10:00:00:1*:**:**:**:** (TVT_NODE-2_HBA1_PORT0)
fcping: Error destination wwn invalid
SAN-2:admin> fcping 10:00:00:1*:**:**:**:** (TVT_NODE-2_HBA2_PORT0)
Destination: 10:00:00:1*:**:**:**:**
Pinging 10:00:00:1*:**:**:**:** [0x010300] with 12 bytes of data:
received reply from 10:00:00:1*:**:**:**:**: 12 bytes time:660 usec
received reply from 10:00:00:1*:**:**:**:**: 12 bytes time:697 usec
received reply from 10:00:00:1*:**:**:**:**: 12 bytes time:625 usec
received reply from 10:00:00:1*:**:**:**:**: 12 bytes time:717 usec
received reply from 10:00:00:1*:**:**:**:**: 12 bytes time:627 usec
5 frames sent, 5 frames received, 0 frames rejected, 0 frames timeout
Round-trip min/avg/max = 625/665/717 usec
NODE-2 has been restarted and updated. What should we check next? Could it be a storage, SAN, or node misconfiguration?
Thanks for the advice!
------------------------------
Krisztián Révész
------------------------------