PowerHA for AIX

  • 1.  PowerHA 7.1.2 two node cluster problem

    Posted Wed December 01, 2021 09:01 AM
    Hi,

    We have a PowerHA 7.1.2 two-node cluster with disk heartbeat (dh), NodeA and NodeB. We had some HW problems and lost the secondary node, NodeB. The cluster configuration has two resource groups: one for the application (Informix) and a second for disk heartbeat. In an attempt to recover NodeB, we took a sysback from NodeA and restored it after we overcame our HW issue. We changed the hostnames, IP addresses, and rhosts files, and mounted the existing disks and volume groups, but we can't sync and verify the cluster. We usually get a comm error during sync and verify.

    migcheck[490]: read() error, nodename=NodeA, rc=0
    migcheck[490]: read() error, nodename=NodeB, rc=0

    WARNING: A communication error was encountered trying to get the VRMF from remote nodes. Please make sure clcomd is running

    Verifying Cluster Configuration Prior to Starting Cluster Services.

    There are no active cluster nodes to verify against.
    Verifying node(s): IDSa requested to start

    ERROR: Comm error found on node: NodeA.
    ERROR: Comm error found on node: NodeB.
    ERROR: No more alive nodes left

    Cluster services and the RG are ONLINE on NodeA. During a verbose sync and verify we noticed that NodeB can't resolve NodeA.

    Verifying that all IP labels resolve to correct IP address on all nodes.
    Node: NodeA
    IDS () PASS

    NodeA () FAIL

    NodeAHB () FAIL

    NodeB () PASS

    NodeBHB () PASS
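
    To make the failing labels easier to spot in a long verbose report, the output can be filtered with a small script. This is only a minimal sketch: it assumes each result line has the shape `label () PASS|FAIL` under a `Node:` header, exactly as pasted above, and the function name `failed_labels` is made up for illustration.

```python
import re

# Sample text copied from the verification output pasted in this post.
SAMPLE = """\
Node: NodeA
IDS () PASS

NodeA () FAIL

NodeAHB () FAIL

NodeB () PASS

NodeBHB () PASS
"""

# Matches lines such as "NodeA () FAIL": a label, parentheses, then the result.
RESULT_LINE = re.compile(r"^\s*(\S+)\s*\(.*?\)\s*(PASS|FAIL)\s*$")


def failed_labels(report):
    """Return {checking_node: [labels that FAILed]} for a verify report."""
    failures = {}
    node = None
    for line in report.splitlines():
        stripped = line.strip()
        if stripped.startswith("Node:"):
            # A "Node:" header starts the section for one checking node.
            node = stripped.split(":", 1)[1].strip()
            failures.setdefault(node, [])
            continue
        match = RESULT_LINE.match(line)
        if match and match.group(2) == "FAIL":
            failures.setdefault(node, []).append(match.group(1))
    return failures


if __name__ == "__main__":
    print(failed_labels(SAMPLE))
```

    On the output above it reports NodeA and NodeAHB as the failing labels in the `Node: NodeA` section.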

    I suspect something is left over from the NodeA sysback, maybe in the ODM database, but I can't find out for sure.

    Every comment, help, and solution is welcome. I really need to get this cluster up and running.

    BR,
    Ivan Efremovski

    ------------------------------
    Ivan Efremovski
    ------------------------------


  • 2.  RE: PowerHA 7.1.2 two node cluster problem

    Posted Wed December 01, 2021 09:12 AM
    On Wed, Dec 01, 2021 at 02:01:19PM +0000, Ivan Efremovski via IBM Community wrote:
    > So in attempt to recover the NodeB, we took a sysback from NodeA
    > and restore it, after we overcome our HW issue. We change the
    > hostnames, IP addresses, rhost files, mount the existing disks and
    > volume groups, but we cant sync and verify the cluster
    > communication. Usually we get comm error during sync and verify.

    There are strong warnings about not cloning PowerHA nodes in the
    documentation because UUIDs can be copied across nodes. When I build
    clusters, I always clone the OS before installing PowerHA.

    Consider uninstalling PowerHA on NodeB, reinstalling it, and then
    adding NodeB to the cluster.

    Do not remove the cluster on NodeB as it is today, as that might impact
    NodeA as well. I'd consider taking NodeB off the network and SAN until
    you've corrected PowerHA.

    ------------------------------------------------------------------
    Russell Adams Russell.Adams@AdamsSystems.nl
    Principal Consultant Adams Systems Consultancy
    http://adamssystems.nl/




  • 3.  RE: PowerHA 7.1.2 two node cluster problem

    Posted Wed December 01, 2021 02:18 PM

    Thank you for your answer.

    First, to explain the architecture: we have two separate Power p740 machines with PowerVM installed and configured. The NodeA and NodeB LPARs were located on separate physical machines. rootvg is installed on the local VIO and presented to the LPARs via vscsi. With the HW problem on the second physical machine, we lost all rootvg and VIO data and had to rebuild it from scratch. The resource and hb disks are presented from the SAN (EMC Unity 400). Additionally, we didn't have backups of the lost LPARs, NodeB for example.

    There are no duplicate UUIDs, but the HACMPcluster ODM entry on the restored NodeB shows NodeA listed as the nodename. The cluster is stable on NodeA and all resource groups are online.
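
    For reference, the leftover nodename can be checked by dumping the ODM class with `odmget HACMPcluster` on each node and comparing the `nodename` field with the node's own name. A minimal sketch of that comparison follows; the stanza text is a hypothetical, shortened sample (real output has more fields), and the helper names `odm_field` and `nodename_mismatch` are made up for illustration.

```python
import re

# Hypothetical, shortened sample of `odmget HACMPcluster` output.
# Only the nodename value reflects what was seen on the restored NodeB.
SAMPLE_STANZA = '''
HACMPcluster:
        id = 1
        name = "example_cluster"
        nodename = "NodeA"
'''


def odm_field(stanza_text, field):
    """Return the value of `field = value` in ODM stanza text, or None."""
    pattern = re.compile(
        r'^\s*' + re.escape(field) + r'\s*=\s*"?([^"\n]*)"?\s*$',
        re.MULTILINE,
    )
    match = pattern.search(stanza_text)
    return match.group(1).strip() if match else None


def nodename_mismatch(stanza_text, expected_node):
    """True when the stanza names a different node than the one it was read on."""
    found = odm_field(stanza_text, "nodename")
    return found is not None and found != expected_node


if __name__ == "__main__":
    # The stanza was read on NodeB, so NodeB is the expected value.
    print(nodename_mismatch(SAMPLE_STANZA, "NodeB"))
```

    A mismatch only confirms that the restored image still carries NodeA's identity; it does not say which other identifiers were copied along with it.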

    I have concerns about removing NodeB from the cluster, because I think it will remove something from the active NodeA and destroy my cluster. Maybe I should isolate NodeB from the network and start from there.

    All HA and OS software versions are identical.

    Please advise.

    BR,
    Ivan






  • 4.  RE: PowerHA 7.1.2 two node cluster problem

    Posted Wed December 01, 2021 03:17 PM
    On Wed, Dec 01, 2021 at 07:18:15PM +0000, Ivan Efremovski via IBM Community wrote:
    > First to explain the architecture. We have two separate Power p740
    > machines, PowerVM installed and configured. NodeA and NodeB LPAR's
    > were located on separate physical machines. rootvg is installed on
    > the local vio, presented to the LPAR's via vscsi. With the HW
    > problem on the second physical machine, we lost all rootvg and vio
    > data and we had to rebuild it from scratch. The resource and hb
    > disks, are presented from the SAN (EMC Unity 400). Additionally, we
    > didn't have backups from the lost LPARs, for example, NodeB.

    That's unfortunate.

    > There are no duplicate UUIDs, but the HACMPcluster ODM entry on the
    > restored NodeB, shows NodeA listed as nodename. Cluster is stable on
    > the NodeA and all resource groups are online.

    If NodeB is listed as NodeA, then the CAA UUIDs were duplicated. You
    need to fully reset NodeB before adding it to the cluster.

    I'm using UUID as a general term; there are many: CAA, RSCT, OS, etc.

    > I have concerns about removing the NodeB from the cluster because I
    > think it will remove something from the active NodeA and destroy my
    > cluster. Maybe I should isolate NodeB from the network and start
    > from there.

    If the cluster is already confused due to trying to integrate NodeB,
    you may need to open a support ticket.

    > All HA and OS software versions are identical.

    Uninstalling HA used to reset all the PowerHA identifiers. That's why
    uninstall then reinstall was useful for fixing a cloned node.

    If it were me, I'd get an outage and bring down the cluster. I'd do a
    cluster snapshot, take good backups, and then uninstall PowerHA. Then
    I'd reinstall it, and then create a new cluster between the
    nodes. Then I would consider whether it would be faster to restore the
    snapshot or just recreate the cluster definitions by hand.


    ------------------------------------------------------------------
    Russell Adams Russell.Adams@AdamsSystems.nl
    Principal Consultant Adams Systems Consultancy
    http://adamssystems.nl/




  • 5.  RE: PowerHA 7.1.2 two node cluster problem

    Posted Thu December 02, 2021 09:52 AM

    So I will try to create an action plan (AP), based on my research and your insights and recommendations.

    1. Schedule maintenance window.
    2. Shut down NodeB (the sysback image from NodeA).
    3. Bring the resources offline and take a new sysback image of NodeA.
    4. Take a cluster snapshot as a configuration backup.
    5. Clone or take snapshots of the SAN disks.
    6. Remove the NodeB node from the cluster using smit hacmp on NodeA. (Not sure this step will work, or whether I need to force the removal.)
    7. Power on NodeB and uninstall all HA software from it.
    8. Install the HA software and add NodeB back to the cluster, with the same configuration as before.
    9. Hope for the best :)
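
    As a sanity check on the ordering, the plan above can be written down as data and tested so that every backup step comes before the first step that changes the cluster. This is only a sketch: the `Step` type, the `kind` labels, and the classification of each step as "prep", "backup", or "change" are my own assumptions, not anything PowerHA provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    number: int
    action: str
    kind: str  # assumed labels: "prep", "backup", or "change"


# Steps 1-8 of the plan above; the kind of each step is an assumption.
PLAN = [
    Step(1, "Schedule maintenance window", "prep"),
    Step(2, "Shut down NodeB", "prep"),
    Step(3, "Bring resources offline, take sysback of NodeA", "backup"),
    Step(4, "Take cluster snapshot", "backup"),
    Step(5, "Clone or snapshot the SAN disks", "backup"),
    Step(6, "Remove NodeB from the cluster via NodeA", "change"),
    Step(7, "Power on NodeB, uninstall HA software", "change"),
    Step(8, "Reinstall HA software, add NodeB back", "change"),
]


def backups_precede_changes(plan):
    """True when every backup step is numbered before the first change step."""
    first_change = min(
        (s.number for s in plan if s.kind == "change"), default=None
    )
    if first_change is None:
        return True  # nothing changes the cluster, so nothing to protect
    return all(s.number < first_change for s in plan if s.kind == "backup")


if __name__ == "__main__":
    print(backups_precede_changes(PLAN))
```

    The check passes for the plan as written, since the sysback, cluster snapshot, and SAN copies (steps 3 to 5) all come before the node removal in step 6.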

    Every comment, help, and solution is welcome. I really need to get this cluster up and running.

    BR,
    Ivan






  • 6.  RE: PowerHA 7.1.2 two node cluster problem

    IBM Champion
    Posted Wed December 01, 2021 03:14 PM
    There are updated docs online about handling cloned AIX systems and PowerHA.  Google "cloning aix powerha".  Top 2 are 2021 docs.

    Not entirely a separate topic, but related - PowerHA 7.2.6 comes out this month. In fact, I consider 7.1.3 to be the minimally acceptable 7.1 version.  I'd be scared of running production on 7.1.2 (but seriously ... go to 7.2).

    Kevin