Automation with Power

Power Business Continuity and Automation

One of PowerHA nodes failed to start after storage migration.

1. One of PowerHA nodes failed to start after storage migration.

SHINGO NAGAI
Posted Wed January 31, 2024 11:19 AM

We are having trouble starting one of our PowerHA cluster nodes after migrating to new storage. During the migration, all disks were copied to the new storage except for the repository disk. The old repository disk was replaced with a new one from the smit menu.

The cluster has two nodes. node2 can run alone, but node1 cannot start. I see the error message below, but I am sure rhosts is correct because it is identical to the one on node2.

Error Message: "cl_rsh: node2 cannot be resolved to a valid CAA node name. Check the contents of /etc/cluster/rhosts."

The likely problem is that the "cthags" service on node1 stays in the "inoperative" state, and the following command to start CAA failed on that node.

# clmgr online node node1 START_CAA=yes

Does anyone have an idea how to identify the cause of this problem? Does it look like something is wrong with CAA? If so, do we need to rebuild CAA and the PowerHA cluster?

Regards,

------------------------------
SHINGO NAGAI
------------------------------

2. RE: One of PowerHA nodes failed to start after storage migration.

Mostafa Mahmoud
Posted Wed February 07, 2024 05:36 PM
Edited by Mostafa Mahmoud Wed February 07, 2024 05:36 PM

Hi Shingo,

That indeed looks to be some issue with the CAA layer. It depends on what hostname the CAA cluster used initially in conjunction with the node's AIX hostname, and the content of the /etc/hosts file.

The best thing to do for this issue to be resolved is to open a case with IBM support. Such issues need full assessment to get to the culprit.
------------------------------
Regards,
Mostafa Mahmoud
AIX / PowerHA / CAA / VMRM / RSCT Development Support Engineer
------------------------------

3. RE: One of PowerHA nodes failed to start after storage migration.

SHINGO NAGAI
Posted Thu February 08, 2024 10:50 AM

Hi Mostafa,

Thank you for your advice. Actually, this PowerHA version is 7.1.3 SP1, which is out of support. For various reasons we cannot upgrade it, so we need to solve this issue on this version.

If there is something wrong with CAA, I am thinking that recreating CAA would be a solution. Do you have any thoughts on this?

Specifically, we would perform step 3 (CAA repository disk scrub) from the link below, then synchronize the cluster configuration from node2 to recreate the CAA cluster.

https://www.ibm.com/support/pages/remove-powerha-systemmirror-cluster-configuration-and-rebuild-it-again

------------------------------
SHINGO NAGAI
------------------------------

4. RE: One of PowerHA nodes failed to start after storage migration.

Mostafa Mahmoud
Posted Sat February 10, 2024 11:27 AM

Hi Shingo,

Yes, those steps should be helpful to rebuild the CAA cluster from scratch. Give it a try and post the outcome.

You also may consider upgrading PowerHA to a supported level.

------------------------------
Regards,
Mostafa Mahmoud
AIX / PowerHA / CAA / VMRM / RSCT Development Support Engineer
------------------------------

5. RE: One of PowerHA nodes failed to start after storage migration.

SHINGO NAGAI
Posted Sun February 11, 2024 10:48 PM

Hi Mostafa,
Thanks. We will probably try the steps in a few weeks, after a review and scheduling for the production system. Once completed, I'll post the outcome.

------------------------------
SHINGO NAGAI
------------------------------

6. RE: One of PowerHA nodes failed to start after storage migration.

SHINGO NAGAI
Posted Wed March 13, 2024 11:15 PM

We reproduced the same problem on a non-production system and tried step 3 from the link below, so I'd like to report the result.
https://www.ibm.com/support/pages/remove-powerha-systemmirror-cluster-configuration-and-rebuild-it-again

It worked once I added a few steps, as follows (this assumes node2 is the node with the problem):
- Before step 3, stop PowerHA and CAA (clmgr offline node <nodename1> STOP_CAA=yes).
- Do step 3.
- After step 3, delete the ODM data on node2 (clmgr delete cluster NODES=<nodename2>).
- Synchronize from node1.
- Start the PowerHA services (both nodes started successfully).

We haven't tried this on the production system yet, but I expect it will work there as well.
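
For anyone wanting to review the order of operations before touching a system, here is a minimal sketch that only prints the sequence described above and executes nothing. The function name print_recovery_plan is hypothetical. The two clmgr commands with arguments are the ones quoted in this post; "clmgr sync cluster" and "clmgr online cluster" are assumptions about how the sync and start steps might be run, since the post does not give those commands. The contents of step 3 come from the linked IBM page and are deliberately not reproduced here.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the recovery sequence, runs nothing.
#   $1 = node that still works (node1 in the non-production test)
#   $2 = node that fails to start (node2 in the non-production test)
print_recovery_plan() {
    good_node=$1
    bad_node=$2
    echo "1. clmgr offline node $good_node STOP_CAA=yes"
    echo "2. (step 3 of the linked IBM procedure: CAA repository disk scrub)"
    echo "3. clmgr delete cluster NODES=$bad_node"
    # The next two commands are assumptions, not taken from the post.
    echo "4. clmgr sync cluster    (assumed command, run on $good_node)"
    echo "5. clmgr online cluster  (assumed command)"
}

# Example: print_recovery_plan node1 node2
```

Printing the plan first makes it easier to have the sequence reviewed before a maintenance window.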

------------------------------
SHINGO NAGAI
------------------------------

7. RE: One of PowerHA nodes failed to start after storage migration.

Michal Wiktorek

IBM Champion
Posted Mon March 11, 2024 12:16 PM

If you've migrated the storage, I think it's worth checking a few things.

- check the SCSI reservation

# lsattr -El hdiskX -a reserve_policy
The CAA disk must have the no_reserve policy. If the setting is different (e.g., single_path), change it with:
# chdev -l hdiskX -a reserve_policy=no_reserve
(If the disk is active, you have to add the -P parameter and restart the system.)
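
As a rough illustration of the reservation check above, the sketch below parses the lsattr output and reports whether the policy is no_reserve. It assumes the usual lsattr -El column layout (attribute name, value, description, user-settable flag). The function name check_reserve_policy is hypothetical, and the helper only reads text from standard input, so it changes nothing on the system.

```shell
#!/bin/sh
# Hypothetical helper: reads `lsattr -El hdiskX -a reserve_policy` output on
# stdin and reports whether the reserve policy is no_reserve.
check_reserve_policy() {
    # Assumed line format: "reserve_policy <value> <description...> <True|False>"
    policy=$(awk '$1 == "reserve_policy" { print $2; exit }')
    if [ "$policy" = "no_reserve" ]; then
        echo "OK: reserve_policy=no_reserve"
        return 0
    fi
    echo "MISMATCH: reserve_policy=${policy:-unknown}"
    return 1
}

# Example on AIX:
#   lsattr -El hdiskX -a reserve_policy | check_reserve_policy
```

The non-zero return code on a mismatch makes it easy to run the same check on both nodes from a loop.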

- check the CAA disk identifiers. In newer versions of PowerHA this matters less, but in old versions like 7.1 the following are significant in the cluster configuration: PVID, UUID, and the hdisk name.

# clmgr view report repository (this command works only on an active node)

Check the PVID of the disk in the ODM on both nodes and compare it with the lspv output:

# odmget HACMPsircol

Check the current hdisk name, PVID, and UUID of the disk on both nodes (the UUID should be visible as the last column):
# lspv -u

The UUID of the disk might have changed after the storage migration, so compare it with the value shown by:

# lsattr -El cluster0 -a clvdisk

If the value is different from the one shown in the lspv -u output, you can change it with the following command (on both nodes):

# chdev -l cluster0 -a clvdisk=NEW_UUID

After changing this, the cluster should be restarted and synchronized.
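
The UUID comparison above could be scripted along these lines. This is only a sketch under two assumptions: that the UUID is the last column of the lspv -u output, as noted above, and that lsattr prints the clvdisk value in its second column. All three function names are hypothetical, and the helpers only parse text, so nothing is changed on the cluster.

```shell
#!/bin/sh
# Hypothetical helpers for comparing the repository disk UUID. Text parsing only.

# Print the last column (assumed to be the UUID) for disk $1 from
# `lspv -u` output supplied on stdin.
uuid_from_lspv() {
    awk -v d="$1" '$1 == d { print $NF; exit }'
}

# Print the clvdisk value from `lsattr -El cluster0 -a clvdisk` output on stdin.
# Assumed line format: "clvdisk <uuid> <description...> <True|False>"
clvdisk_from_lsattr() {
    awk '$1 == "clvdisk" { print $2; exit }'
}

# Print "match" if both values are non-empty and equal, otherwise "differ"
# with a non-zero return code.
compare_repo_uuid() {
    if [ -n "$1" ] && [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "differ"
        return 1
    fi
}

# Example on AIX (hdiskX is the repository disk):
#   disk_uuid=$(lspv -u | uuid_from_lspv hdiskX)
#   caa_uuid=$(lsattr -El cluster0 -a clvdisk | clvdisk_from_lsattr)
#   compare_repo_uuid "$disk_uuid" "$caa_uuid"
```

Running the same three lines on both nodes shows quickly whether either node still points at the old identifier.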

If you see a log entry like the one below, find the previous CAA disk, which will be in the Defined state, delete it, and then resynchronize the cluster. The old disk name might still be in the cluster configuration; as long as the device is still visible in the Defined state, the cluster keeps trying to use a disk that no longer exists. Removing the old device lets the cluster find the new disk by its different identifier.

:get_local_nodename[63] : No match - nodename must have changed or must be a new cluster.

# lsdev | grep hdisk | grep Defined
# rmdev -dl hdiskY

After this, try to synchronize the cluster.
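
If several stale devices are present, a small filter can make the lsdev check above a little stricter than two chained greps by matching the device name and the state column separately. This is a sketch that assumes the device name is the first column of lsdev output and the state is the second; the function name list_defined_hdisks is hypothetical. It only lists candidates and removes nothing.

```shell
#!/bin/sh
# Hypothetical helper: reads `lsdev` output on stdin and prints the names of
# hdisk devices whose state column is "Defined". It does not run rmdev.
list_defined_hdisks() {
    # Assumed layout: "<name> <state> <location> <description...>"
    awk '$1 ~ /^hdisk[0-9]+$/ && $2 == "Defined" { print $1 }'
}

# Example on AIX (review the list before running rmdev -dl on any entry):
#   lsdev | list_defined_hdisks
```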

I hope you manage to upgrade this cluster to a supported version soon and enjoy the benefits of having IBM support.

Best regards,
Michal Wiktorek

------------------------
https://www.linkedin.com/in/michal-wiktorek-83b2b47b/
------------------------

------------------------------
Michal Wiktorek
------------------------------

8. RE: One of PowerHA nodes failed to start after storage migration.

SHINGO NAGAI
Posted Wed March 13, 2024 10:55 PM

Michal,

Thank you for your thoughtful advice.

I checked what you pointed out on the servers.
- SCSI reservation: Confirmed the reserve_policy setting is "no_reserve".
- PVID: Confirmed all the PVIDs are identical. (lspv, odmget, clmgr)
- UUID: Confirmed the UUIDs on both nodes are identical. (lspv, lsattr)
- log: Confirmed there is no entry like what you showed.

I agree with the opinion that it should be upgraded, but unfortunately, we cannot do that at this moment.
We managed to reproduce this problem on the non-production system and confirmed that recreating CAA can recover the cluster.
We are therefore considering applying this solution to the production system.

Regards,

------------------------------
SHINGO NAGAI
------------------------------
