IBM i Global

Connect, learn, share, and engage with IBM Power.


IBMi - PowerHA approach for OS-interventions

1. IBMi - PowerHA approach for OS-interventions

Jozef Thijs
Posted Thu October 10, 2024 08:53 AM

Hello all,

I have to manage a new environment with PowerHA enabled, and I'm wondering whether my planned cluster operations are adequate for OS interventions such as full system backups, IPLs, and OS upgrades.

Cluster setup:

    3 nodes in the same Device Domain
        - PS partition
        - PT partition
        - Flash partition

    CRGs
        Active - Dev CRG -> pointing to IASP device (with recovery domain set to PS & PT partition)
        Inactive - Data CRG -> pointing to FlashCopy node (but inactive -> using the Lab Services toolkit to perform the FC operation at IASP level - QZRDHASM)
        Active - Peer CRG -> Application ID: QZRDPWRHA.DLTPRFCLU

Now, what's the best approach to cover OS operations, like full backup, IPL, or OS upgrades ...

Which actions are needed at the cluster/PowerHA level on IBM i to prepare a partition for restricted state?

Flash Partition:

Putting this Flash partition in restricted state to cover these OS-related interventions ...

    On Flash partition:
        ENDCRG of the Peer CRG
        ENDCLUNOD NODE(FlashCopy node)

    And after the OS intervention:
        STRCLUNOD NODE(FlashCopy node)
        STRCRG of the Peer CRG

PT Partition:

Putting this PT partition in restricted state to cover these OS-related interventions ...

    On PT partition:
        ENDCRG of the Peer CRG
        ENDCRG of the DEV CRG (IASP)
        ENDCLUNOD NODE(PT node)

    And after the OS intervention:
        STRCLUNOD NODE(PT node)
        STRCRG of the DEV CRG (IASP)
        STRCRG of the Peer CRG
        Plus verify the replication is active and in sync (DSPSVCSSN SSN(xxxx) DEVDMN(Dev-domain))
        (no DETACH / REATTACH operation will be done!)

PS Partition:

Putting this PS partition in restricted state to cover these OS-related interventions ... (after switching the IASP to the PT partition)

    On PS partition:
        Verify the replication is active and in sync (DSPSVCSSN SSN(xxxx) DEVDMN(Dev-domain))
            and that 'Switchover Reverse replication' is set to *YES
        CHGCRGPRI DEV-CRG (IASP)
        Production env gets opened on PT partition.

    Further on, on PS partition:
        ENDCRG of the Peer CRG
        ENDCRG of the DEV CRG (IASP)
        ENDCLUNOD NODE(PS node)

    And after the OS intervention:
        STRCLUNOD NODE(PS node)
        STRCRG of the DEV CRG (IASP)
        STRCRG of the Peer CRG
        Plus verify the replication is active and in sync (DSPSVCSSN SSN(xxxx) DEVDMN(Dev-domain))
        (no DETACH / REATTACH operation will be done!)
        CHGCRGPRI DEV-CRG (IASP) back to PS partition
        Production env gets opened on PS partition.
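
The per-partition sequences above share one shape: end the CRGs, end the cluster node, do the OS work, then restart everything in reverse order. As a rough illustration only, here is a minimal Python sketch that builds the ordered CL command strings for one node. It runs nothing on the system. The cluster, node, and CRG names are hypothetical placeholders, and the CLUSTER(), CRG(), and NODE() keywords are an assumption to check against the command prompts on your release.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NodePlan:
    cluster: str      # hypothetical cluster name
    node: str         # node being prepared for restricted state
    crgs: List[str]   # CRGs to end, in the order they should be ended


def pre_maintenance(plan: NodePlan) -> List[str]:
    """Commands to issue before the OS intervention: CRGs first, node last."""
    cmds = [f"ENDCRG CLUSTER({plan.cluster}) CRG({crg})" for crg in plan.crgs]
    cmds.append(f"ENDCLUNOD CLUSTER({plan.cluster}) NODE({plan.node})")
    return cmds


def post_maintenance(plan: NodePlan) -> List[str]:
    """Commands to issue afterwards: node first, CRGs in reverse order."""
    cmds = [f"STRCLUNOD CLUSTER({plan.cluster}) NODE({plan.node})"]
    cmds += [f"STRCRG CLUSTER({plan.cluster}) CRG({crg})"
             for crg in reversed(plan.crgs)]
    return cmds


if __name__ == "__main__":
    # Placeholder names; the PT sequence ends the Peer CRG, then the DEV CRG.
    pt = NodePlan(cluster="MYCLU", node="PTNODE", crgs=["PEERCRG", "DEVCRG"])
    for cmd in pre_maintenance(pt) + ["(OS intervention)"] + post_maintenance(pt):
        print(cmd)
```

Keeping the CRG list in "end" order and reversing it for the restart mirrors the PT and PS sequences, where the Peer CRG is ended first and started last. The replication check (DSPSVCSSN) and the CHGCRGPRI switchover are deliberately left out, since they depend on the state of the environment.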

Is this approach fine?

Should I include an end/start operation of the Admin Domain?

Can all these commands be executed on the partition concerned?

Is my view of the cluster operations correct, or are additional actions required?

Any recommendation is welcome.

Thanks for your feedback,

Jos

------------------------------
Jos (Jozef) Thijs
Kyndryl
------------------------------
2. RE: IBMi - PowerHA approach for OS-interventions

Brian Nordland

IBM Champion
Posted Fri October 11, 2024 01:30 PM

Hi,

These are some excellent questions. Your steps look fairly similar to what I would expect. Since every environment can be unique, I always remind everyone to keep in mind that some suggestions may vary from environment to environment.

PowerHA Automated Management of the Administrative Domain

The first thing that comes to mind is the Peer CRG QZRDPWRHA.DLTPRFCLU. Over the years, several aspects of the IBM services toolkit have been incorporated directly into the PowerHA product. Starting in IBM i 7.4, PowerHA includes policies for automating the management of the administrative domain. You can find out more about these policies here: https://helpsystemswiki.atlassian.net/wiki/spaces/IWT/pages/2148040707/Administrative+Domain+PowerHA+Policies

IBM does include some documentation for cleanup: https://www.ibm.com/support/pages/node/7105341 .

PowerHA Integrated FlashCopy Automation

Similarly, the Data CRG used for FlashCopy management is something you could look to transition to native PowerHA commands. The PowerHA commands for FlashCopy (STRSVCSSN of type *FLASHCOPY, or CHGSVCSSN with the *RESUME option) include the following parameters:

Source ASP action (to quiesce)

Verification of replication state if the FlashCopy is at the target of Replication

Target ASP action (to automatically vary on the FlashCopy target)

Target exit program to submit a customized exit program

Some additional information on this is available at the following page: https://helpsystemswiki.atlassian.net/wiki/spaces/IWT/pages/2269970433/7.5+HA+5.2.2+7.4+HA+4.8.2+and+7.2+HA+3.10+PTFs#Integrated-FlashCopy-Automation-Enhancements-7.5-HA-5.2.2-7.4-HA-4.8.2-7.2-HA-3.10

If you did switch to native PowerHA for both of these, that would likely reduce the environment to just the device CRG, which may simplify some of your steps above.

Disabling Automatic Failover

I see in your environment you are ending the device CRG before performing maintenance, which is definitely best practice if you want to avoid accidental unplanned failovers. One other piece of protection you can put in place is to disable automatic failovers. This functionality was provided at IBM i 7.4 and previously required the lab services toolkit: https://helpsystemswiki.atlassian.net/wiki/spaces/IWT/pages/327778312/QCST_CRG_CANCEL_FAILOVER+PowerHA+policy#Example-2---Disabling-all-listed-automatic-failovers-for-a-CRG

Regarding your final question of the administrative domain, typically, I recommend leaving the administrative domain active, even when doing maintenance on one of the nodes unless you're doing something where you know you're going to want to discard changes on a particular node.

Hopefully, this information is useful.

------------------------------
Thanks,
Brian Nordland
Director of Development at Fortra
------------------------------

3. RE: IBMi - PowerHA approach for OS-interventions

Ben Rabe
Posted Fri October 11, 2024 02:41 PM

Hi Jos,

I think what you have laid out is "fine" but there is also a level of "it depends" with your environment.

Depending on the IBM i OS level you're running, the QZRDPWRHA tooling you mention may no longer be in service, but that's a side topic.

As for full system backups, whether you want to incorporate IASP data into that backup will determine certain actions.

Ultimately if you're replicating between PS and PT and then taking a flashcopy, is the flashcopy occurring from PS or PT as the flashcopy source?
Regardless, as long as you're flashing the IASP data, the entire system save of the flash node is a good recovery point for the IASP data, valid as of the time of the FlashCopy. LIC and OS are subjective, depending on how you roll out PTFs.

When saving the PT, are you wanting IASP data saved also? If so, then you'd have to plan for a "Detach" of the replication between PS and PT before the backup occurs.

When saving the PS, is there really a need short of needing to recover SYSBAS? If you're flashing your critical IASP data to a flashcopy node, then saving that IASP data is probably the most critical aspect. The SYSBAS portion becomes less of a concern in a recovery effort since you could really pull SYSBAS from the flash node or the PT node at any point.

Every environment is different and customer needs are different. Everything depends on the ultimate end goal you want to achieve for each system/node in the environment.

To finish answering your questions, as long as there is 1 node active in the cluster/admin domain, there is no need to ENDCAD/STRCAD from the node performing the backup.

I'm more than happy to try and provide further feedback if you have additional specific questions.

Ben Rabe

------------------------------
Ben Rabe
------------------------------

4. RE: IBMi - PowerHA approach for OS-interventions

Jozef Thijs
Posted Wed October 30, 2024 07:06 AM

Brian and Ben, thanks for your valuable feedback.

Another question ... related to the switching of the iASP ...

In this environment, I see 10000 (and more) devices that are no longer reporting in SST. In fact, these are disk units (DMPxxxx) shown as unknown. I can manually clean up all these failed / non-reporting devices in Service Tools, but is there a program or command that I can run regularly to clean up SST?

Thanks,

Kind regards,

Jos

------------------------------
Jos (Jozef) Thijs
Kyndryl Belgium
------------------------------

5. RE: IBMi - PowerHA approach for OS-interventions

Ben Rabe
Posted Wed October 30, 2024 12:54 PM

Hello Jozef, there is no command in the base/native IBMi OS to perform this type of clean up easily at this time.
However, IBM Technology Expert Labs offers a tool library referred to as Smart Assist (library QZRDPWRHA).
In their tool set, they have a command called QZRDPWRHA/CLRHDWRSC with the following Text Description "Clear non-reporting resources."

Additionally, there is an Advanced Analysis macro -> iohridebug -removenonreportingdisks that can be used instead of having to do the manual removal from Hardware Service Manager.

Hope this helps a little bit.

------------------------------
Ben Rabe
------------------------------

6. RE: IBMi - PowerHA approach for OS-interventions

Marc Rauzier
Posted Wed October 30, 2024 02:33 PM

Hello Jos

You may want to try QMGTOOLS/RUNAA command (https://www.ibm.com/support/pages/qmgtools-run-aa-macros). There is an *IOHRIDEBUG value for REQUEST parameter. With -removenonreportingdisks for DATA parameter, it might perform what you are looking for.

------------------------------
Marc Rauzier
------------------------------

7. RE: IBMi - PowerHA approach for OS-interventions

Ben Rabe
Posted Wed October 30, 2024 02:55 PM

@Marc, that's a good idea.....but:
QMGTOOLS/RUNAA RQS(*IOHRIDEBUG) DATA('-removenonreportingdisks')
Requested advanced analysis command IOHRIDEBUG is not supported.

Seems we do not allow that to be performed at this time.

------------------------------
Ben Rabe
------------------------------

8. RE: IBMi - PowerHA approach for OS-interventions

Ben Rabe
Posted Wed October 30, 2024 02:59 PM

I should add that QMGTOOLS/RUNAA macros are meant to be used for data collection and not performing tasks to the system, such as removing failed/non-reporting resources.
We could consider in the future having some option that could do this, or the IDEA could be raised to request an OS command to run this interactively from a command line.
https://ibm-data-and-ai.ideas.ibm.com/

------------------------------
Ben Rabe
------------------------------

9. RE: IBMi - PowerHA approach for OS-interventions

Marc Rauzier
Posted Thu October 31, 2024 05:19 AM

That's a good idea to raise an IDEA! At the same time, it could be interesting to request an OS command to reset disk multipaths. I remember losing tons of time doing it manually in the past. I have been retired for a couple of years now but, in the Kyndryl (or IBM SO/ITD/Infrastructure Services) context, using an SST user profile involves connecting to a shared password vault, requesting the password with a valid reason, connecting to SST, changing the password in both the vault and SST, and then running the multipath resetter macro. And if your OS profile does not have *SERVICE special authority, you have to perform this procedure for another profile (such as QSECOFR) which has it. Hopefully, they have now found a way to change this password automatically from the vault.

PS: wasn't there a RUNAA2 command in the past with more capabilities than RUNAA? Or, is my memory getting in trouble?

(Jos, I hope that everything is running fine for you and the IBM i team within Kyndryl)

------------------------------
Marc Rauzier
------------------------------

10. RE: IBMi - PowerHA approach for OS-interventions

Ben Rabe
Posted Thu October 31, 2024 09:12 AM

Marc, your memory does serve you well and there is a RUNAA2. However, in order to run that you still have to qualify with an OS user id/pw/confirm pw as well as SST user id/pw/confirm pw.
But I did test that too and it did allow me to run the macro.

> QMGTOOLS/RUNAA2 OS400USR(BRABE) OS400PWD() OS400PWD2() SSTUSR(QSECOFR) SSTPWD() SSTPWD2() AAMACRO1(IOHRIDEBUG) AAOPT1('-removenonreportingdisks')
Request completed

Upon completion I went out and checked and all of my failed/non-reporting resources were cleaned up and the only things left behind were resources with an Unknown status.
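
For anyone wanting to keep that invocation in a runbook, here is a small Python sketch that only assembles the command string shown in the log above. It is not an official interface to QMGTOOLS. The function name is made up for illustration, the parameter keywords are copied from the logged command, and the password parameters are deliberately left empty, as they appear in the log, so that they are filled in interactively rather than stored in a script.

```python
def runaa2_command(os_user: str, sst_user: str, macro: str, option: str) -> str:
    """Build a QMGTOOLS/RUNAA2 command string (hypothetical helper).

    Password parameters are emitted empty on purpose; supply them at the
    prompt instead of embedding them here.
    """
    # CL string literals use single quotes; an embedded quote is doubled.
    quoted = "'" + option.replace("'", "''") + "'"
    return (
        f"QMGTOOLS/RUNAA2 OS400USR({os_user}) OS400PWD() OS400PWD2() "
        f"SSTUSR({sst_user}) SSTPWD() SSTPWD2() "
        f"AAMACRO1({macro}) AAOPT1({quoted})"
    )


if __name__ == "__main__":
    print(runaa2_command("BRABE", "QSECOFR", "IOHRIDEBUG",
                         "-removenonreportingdisks"))
```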

As for your other issue with running Multipathresetter, that's an entirely different macro and I will just say that it depends on why you had to run it as well as what technologies are being used, but that macro is already called by default in PowerHA as needed and by PowerHA Tools.
PowerHA tools also offers a command to run a reset from command line....and also, if there were a valid need to run it from a native OS command, an IDEA submission would be the next logical request.

------------------------------
Ben Rabe
------------------------------

11. RE: IBMi - PowerHA approach for OS-interventions

Marc Rauzier
Posted Thu October 31, 2024 10:27 AM

Thanks for confirming and testing RUNAA2. Maybe, Jos will have the opportunity to use it (in conjunction with CHGSSTUSR command to change the password, as it is required to do so within Kyndryl, every time a shared password is used).

Regarding the need to reset disks multipaths, I have again to call my memory!
I remember now that, thanks to the PowerHA Tools we set up a couple of years before my retirement for SVC-like storage devices, we could leverage the reset command they provide. But during our first tests and subsequent production installs, we were using a homemade tool to provide similar functions for FullSystem FlashCopy and FullSystem Replication with DS8k devices. This tool is as old as V5R4M5 and was still in use until V7R3. At each IPL following a FlashCopy, or a role swap after a PPRC failover, we were stuck on the C6004508/C600450A SRC codes for 10-15 minutes, hoping that everything was fine. Only a multipath reset as soon as possible after this first IPL removed this long IPL step. Even further IPLs without a multipath reset did not remove it. There was always the same number of paths for each disk on the target as on the source, but my understanding was that the IPL step was not happy with the resources' serial numbers or some other hardware difference. So it waited for the resources it had at the previous IPL (on the source system) to come back until it timed out, finally accepting those it had.

Sorry for this long post but, at least, my memory had some work today :-)

------------------------------
Marc Rauzier
------------------------------

12. RE: IBMi - PowerHA approach for OS-interventions

Glen Corneau
Posted Fri November 01, 2024 03:45 PM

Don't think it makes a difference, but I use the Power link for Power-related Ideas: https://ibm-power-systems.ideas.ibm.com/

------------------------------
Glen Corneau
------------------------------

13. RE: IBMi - PowerHA approach for OS-interventions

Marc Rauzier
Posted Thu October 31, 2024 05:07 AM

Hello Ben
Bad luck and thanks for checking.

------------------------------
Marc Rauzier
------------------------------

14. RE: IBMi - PowerHA approach for OS-interventions

Jozef Thijs
Posted Fri November 08, 2024 10:45 AM

Marc, Ben,

Thanks very much for this feedback.

In fact, I now have 3 options:

Within SST, use the Advanced Analysis macro IOHRIDEBUG -removenonreportingdisks

Use this IOHRIDEBUG macro through the RUNAA2 command in QMGTOOLS

Or use the command CLRHDWRSC in the Smart Assist library QZRDPWRHA

All 3 options work fine.

Thx very much

Jos

------------------------------
Jos (Jozef) Thijs
Kyndryl Belgium
------------------------------
