PowerHA and ROHA

By Michal Wiktorek posted Fri July 26, 2024 09:06 PM

  

Introduction

Providing a High Availability (HA) solution for running an application requires creating an appropriate environment so that in the event of a failure or planned downtime, it can be launched on other virtual/hardware resources (putting it very simply, of course). IBM PowerHA SystemMirror clusters (referring to PowerHA for AIX) primarily allow resources to be switched/shared between cluster nodes, but different resources require different approaches. Some of them are purely software-based, while others are closely tied to hardware. There are many types of applications, configurations, and possible scenarios for minimizing their downtime. However, for the purposes of this article, I would like to refer to the classic case where the application needs to operate with:

  • network communication (transferable IP address)
  • storage resources
  • operating system (each cluster node is a separate AIX system)
  • scripts to start/stop the application
  • RAM
  • CPU

Resources such as IP address and access to Volume Group/Filesystems are configurable at the operating system level, but the amount of CPU and RAM requires intervention in the LPAR configuration, usually via the HMC console.

In an ideal world, all cluster nodes have the same amount of CPU cores and RAM assigned, and software licenses are not a concern.

In the real world, it happens that cluster nodes, for various reasons, do not have the same resources, and software licenses depend on the number of CPUs assigned to each cluster node where the software binaries are located. License costs can be enormous, so in different companies and environments, standby nodes often have significantly limited CPU and RAM resources compared to the primary node. 

What are the consequences?

The customer has paid for a PowerHA SystemMirror product license to have high availability and automated application failover, so theoretically, in case of primary node failure, the application should automatically start on one of the standby nodes - after all, that's what you pay for with PowerHA. 

In practice, the cluster will switch resources to the backup node and start the application. In the best case, it will run on minimal resources with potential performance degradation; in the worst case, it may not start at all. In some situations, the application cannot start until an administrator assigns the required resources to the backup node (possibly also activating CoD codes), and depending on the application and configuration, a restart may be required before it can use the new amount of resources. As a result, the solution does not meet its main objective. This is not the fault of the product itself, but rather the result of cost savings, business decisions, or simple oversight.

The solution to such a situation is to automatically set the required resources for the backup node in case of cluster failover.

Dedicated scripts/playbooks could be written for this purpose, but PowerHA already provides ready-made functionality for it, namely ROHA.

What is ROHA?

ROHA (Resource Optimized High Availability) is a functionality of PowerHA clusters that allows dynamic allocation and deallocation of resources such as CPU and RAM, assigned to the Application Controller for a Resource Group. Documentation for ROHA is available on the IBM website: https://www.ibm.com/docs/en/powerha-aix/7.2?topic=cluster-resources-optimized-high-availability

When a PowerHA cluster moves or starts/stops a Resource Group, it connects to the HMC console managing the given LPARs (cluster nodes) via SSH or REST API and dynamically allocates the LPAR's resources.

In this case, the CPU/RAM resources of the application running within PowerHA are allocated within the settings of the Application Controller, which is assigned to the Resource Group.

Advantages of ROHA

  • Automates the management of resources assigned to LPARs that are PowerHA cluster nodes, thus reducing the need for administrator intervention during cluster failover.
  • Allows for the reduction of required CPU and RAM resources on the Power platform.
  • Can significantly reduce the number of required licenses for applications/databases.
  • Enables activation/deactivation of CPU/RAM resources within CoD and Power Enterprise Pool.

Disadvantages of ROHA

  • LPARs must be assigned to the appropriate HMC consoles in the ROHA configuration, which can be challenging in large environments.
  • Maintaining accounts and permissions on HMC consoles can be problematic, especially in the case of LPM migration.

Personally, I find this functionality extremely useful, but the available documentation was not sufficient for me to fully understand how ROHA works. Understanding some of its operational assumptions required me to conduct tests and analyze system logs. I did not find certain information in the documentation, so I decided to share it in this article.

Resource Calculation

I believe that many administrators might be slightly surprised during testing to find that after switching the Resource Group, the target LPAR did not receive exactly the resources set in the ROHA configuration. Why is that?

Although I did not find a definitive explanation on IBM's website, the calculation method can be inferred by tracking the cluster log hacmp.out, or looking directly into /usr/es/sbin/cluster/events/utils/clcomputeroha script, which is used by PowerHA.

It's worth noting that the calculation method may vary depending on the PowerHA cluster version. In this text, I refer to version 7.2.8.

In the configuration of the Application Controller, you may notice that there is an option "Use desired level from the LPAR profile".

If the option is set to No (the default), the MINIMUM values from the LPAR profile are taken into account in the calculation.

                                                      [Entry Fields]
Application Controller Name                         cl_app1_script

Use desired level from the LPAR profile             No                
Optimal amount of gigabytes of memory              [30.0]

Optimal number of dedicated processors             [0]                

Optimal number of processing units                 [0.20]
Optimal number of virtual processors               [2]         

Those familiar with AIX systems or the PowerVM virtualization platform likely know that an LPAR profile, besides the currently assigned CPU and RAM resources, also has properties such as MINIMUM:

  • Minimum Processing Units (Entitled Capacity, used in the case of Shared processors)
  • Minimum Processors (Dedicated or Shared)
  • Minimum Memory

These are the minimal resources required for an LPAR to be started. In practice, these values are rarely modified, and administrators don't usually pay much attention to them.

In my view, it makes sense that the MINIMUM parameters of an LPAR are included in the ROHA resource calculation. In my interpretation, MINIMUM is the value that should suffice for the operating system to run (including the load generated by monitoring agents, security scanning, automation, backup, etc.).

When the Resource Group and its Application Controller are started, the resources required by the application running within the cluster are added to the amount needed by the operating system.

For example, if the application "XYZ" requires 10 GB of RAM, the resources of the LPAR on which it will run will not be exactly 10 GB, but 10 GB + RAM for the AIX operating system (i.e., the MINIMUM value from the LPAR profile).

What if the currently assigned resources of the cluster node where the Resource Group is to be started are exactly equal to or more than those specified in the ROHA configuration?

I would like to answer this question by presenting the following examples.

Calculation Examples for Different Scenarios

  • Case when the sum of the values for the application and the minimum values of the LPAR profile (APP + MINIMUM_PROFILE) is greater than the resources currently assigned to the LPAR:
    Resources will be dynamically allocated.

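| Type              | MINIMUM values in the LPAR profile | ALLOCATED values (currently assigned) | Values in the ROHA configuration for the Application Controller (APP) | Result - Compute ROHA RAM/CPU |
|-------------------|------------------------------------|---------------------------------------|------------------------------------------------------------------------|-------------------------------|
| Entitled Capacity | 0.1                                | 1                                     | 1.6                                                                    | 1.7 EC                        |
| Virtual Processor | 1                                  | 2                                     | 4                                                                      | 5 vCPU                        |
| Memory            | 6 GB                               | 8 GB                                  | 30 GB                                                                  | 36 GB                         |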

  • Case when the currently assigned resources already cover the application values (APP) on their own, but the sum of the application values and the minimum values of the LPAR profile (APP + MINIMUM_PROFILE) is still greater than the resources currently assigned to the LPAR:

    Resources will be dynamically allocated.


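| Type              | MINIMUM values in the LPAR profile | ALLOCATED values (currently assigned) | Values in the ROHA configuration for the Application Controller (APP) | Result - Compute ROHA RAM/CPU |
|-------------------|------------------------------------|---------------------------------------|------------------------------------------------------------------------|-------------------------------|
| Entitled Capacity | 1.0                                | 1.0                                   | 0.1                                                                    | 1.1 EC                        |
| Virtual Processor | 4                                  | 4                                     | 1                                                                      | 5 vCPU                        |
| Memory            | 20 GB                              | 21 GB                                 | 3 GB                                                                   | 23 GB                         |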

  • Case when the resources currently assigned to the LPAR are greater than MINIMUM_PROFILE + APP:

    The currently assigned resources are greater, so no additional resources are allocated to the LPAR.


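| Type              | MINIMUM values in the LPAR profile | ALLOCATED values (currently assigned) | Values in the ROHA configuration for the Application Controller (APP) | Result - Compute ROHA RAM/CPU |
|-------------------|------------------------------------|---------------------------------------|------------------------------------------------------------------------|-------------------------------|
| Entitled Capacity | 0.1                                | 0.3                                   | 0.1                                                                    | 0.3 EC                        |
| Virtual Processor | 1                                  | 3                                     | 1                                                                      | 3 vCPU                        |
| Memory            | 6 GB                               | 45 GB                                 | 10 GB                                                                  | 45 GB                         |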
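The three scenarios above can be checked with a few lines of Python. This is only a sketch of the rule inferred in this article (result = the larger of ALLOCATED and MINIMUM_PROFILE + APP), not the logic of clcomputeroha itself. The function name roha_result and the use of Decimal (to avoid floating-point noise such as 0.1 + 1.6) are choices made for this illustration.

```python
from decimal import Decimal


def roha_result(minimum, allocated, app):
    """Return (resulting amount, amount to acquire) for one resource type.

    Assumed rule, inferred from the scenarios in this article:
    the requirement is MINIMUM_PROFILE + APP, and nothing is added
    when the LPAR already has at least that much assigned.
    """
    minimum, allocated, app = (Decimal(str(v)) for v in (minimum, allocated, app))
    required = minimum + app
    result = max(allocated, required)
    return result, result - allocated


# Rows: (label, MINIMUM, ALLOCATED, APP), taken from the first scenario.
scenario_1 = [
    ("Entitled Capacity", "0.1", "1", "1.6"),
    ("Virtual Processor", "1", "2", "4"),
    ("Memory (GB)", "6", "8", "30"),
]

for label, minimum, allocated, app in scenario_1:
    result, delta = roha_result(minimum, allocated, app)
    print(f"{label}: result {result}, to acquire {delta}")
```

Feeding in the rows of the third scenario returns the ALLOCATED value unchanged with nothing to acquire, because the LPAR already holds more than MINIMUM_PROFILE + APP.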

It's important to note that after the Resource Group is moved to another cluster node, the resources acquired for it on the node it left are returned (if technically possible).

The cluster can have multiple Resource Groups, so resources are calculated separately to meet the requirements of each application.

A crucial aspect of the ROHA configuration is whether the application should have the ability to start if the resources on the physical server are insufficient.

I believe this decision should be made individually, considering the specifics of the system in question.

From my perspective, it is often beneficial to let the application start automatically on the backup cluster node after a primary node failure, even if the resources could not be allocated (e.g., due to an HMC connection error). If the application can run correctly on minimal resources, this minimizes downtime for the business, and the administrator can assign the appropriate resources manually afterwards. Of course, this depends on many factors, so it may not always be the best option.
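The ROHA log shown later in this article prints the hint "fit into if exceeding and always_start_rg=1, otherwise fail" next to the maximum limits. The sketch below models that decision under the assumption that it behaves exactly as the hint reads. The function name fit_to_maximum and the exception class are hypothetical and are not part of PowerHA.

```python
class InsufficientResources(Exception):
    """Raised in this sketch when the request cannot be satisfied."""


def fit_to_maximum(total_required, profile_maximum, always_start_rg):
    """Return the amount the LPAR should end up with.

    Assumption based on the log hint: a request above the limit is
    trimmed to the limit when always_start_rg is enabled, and is
    rejected otherwise.
    """
    if total_required <= profile_maximum:
        return total_required
    if always_start_rg:
        return profile_maximum
    raise InsufficientResources(
        f"required {total_required} exceeds maximum {profile_maximum}"
    )


# Memory in GB: the requirement fits below the profile maximum.
print(fit_to_maximum(86, 120, always_start_rg=False))

# A larger, hypothetical requirement is trimmed when the tunable is on.
print(fit_to_maximum(130, 120, always_start_rg=True))
```

With the tunable off, the second call would raise InsufficientResources instead, which corresponds to the Resource Group not being started.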

General Assumptions

  • The MINIMUM values from the LPAR profile are considered as the operating system resources, so they are treated independently of the Application Controller.
  • When starting or switching the Resource Group, the LPAR will have resources assigned in the amount of the Application Controller resources added to the minimum values from the profile, unless the currently assigned resources of the LPAR are higher.
  • After switching the Resource Group to another node or stopping it, the resources will be released.
  • In the case of multiple Resource Groups, resources are treated independently, meaning that even if resources have already been added to the LPAR for the first Resource Group, additional resources will be added with subsequent Resource Groups, but without duplicating the MINIMUM profile resources.

These assumptions are based on my observations and tests, and therefore may not be described in the IBM documentation.
Of course, the IBM documentation should always be treated as the authoritative source.
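The assumption about multiple Resource Groups can be sketched as follows: the profile MINIMUM is counted once, and the values of every Application Controller (already running or about to start) are added on top. The function names are made up for this illustration, and the rule itself is the one inferred in this article rather than a documented formula.

```python
from decimal import Decimal


def required_total(profile_minimum, running_apps, apps_to_start):
    """MINIMUM counted once, plus every application value."""
    total = Decimal(profile_minimum)
    for value in list(running_apps) + list(apps_to_start):
        total += Decimal(value)
    return total


def delta_to_acquire(current, total):
    """Amount that still has to be added; never negative."""
    return max(Decimal(0), total - Decimal(current))


# Memory in GB: profile minimum 6, one Resource Group online with 70,
# a second one requiring 10 about to start, LPAR currently at 76.
total = required_total("6", ["70"], ["10"])
print(total, delta_to_acquire("76", total))
```

With these inputs the sketch gives a total of 86 GB and 10 GB to acquire, the same figures that appear as "Total raw" and "Delta acquisition raw" in the memory computation of the log in the last section.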

Reporting and Analysis of ROHA Operations

Below, I have included sample outputs of useful commands to facilitate understanding their usage and the logic of ROHA.

All names provided are fictional for the purposes of this article and do not refer to any real IT system.

Complete report for ROHA and the resources of LPARs and physical servers:

# clmgr view report ROHA
Cluster: 'CLUSTER_NAME' of Stretched Cluster type
        Cluster tunables
                Dynamic LPAR
                  Always Start Resource Groups: '1'
                  Adjust Shared Processor Pool size if required: '0'
                  Force synchronous release of DLPAR resources: '0'
                On/Off CoD
                  I agree to use On/Off CoD and be billed for extra costs: '0'
                  Number of activating days for On/Off CoD requests: '30'
                Enterprise Pool
                  Resource Allocation order: '0'
        Node: nodename1
                Site: DC1
                HMC(s): hmc_name_1
                Managed system: Power10_Machine_XXX
                LPAR: nodename1_lpar
                NovaLink(s):
                        Current profile: 'default_profile'
                        Memory (GB):        minimum '6'  desired '6'  current '76'  maximum '120'
                        Processing mode: Shared
                        Shared processor pool: 'SSP_TEST'
                        Processing units:   minimum '0.1'  desired '0.1'  current '0.5'  maximum '10.0'
                        Virtual processors: minimum '1'  desired '1'  current '5'  maximum '10'
                ROHA provisioning for 'ONLINE' resource groups
                        Resource group: 'rg_database'  Application controller: 'db_start_stop_script'
                                Memory='70.0' Processors='0' Processing units='0.40' Virtual Processors='4'
                        Total: Use desired number='0' Memory='70' Processors='0' Processing units='0.4' Virtual Processors='4'
                ROHA provisioning for 'OFFLINE' resource groups
                        Resource group: 'rg_app3'  Application controller: 'app3_test_app'
                                Memory='10.0' Processors='0' Processing units='1.00' Virtual Processors='7'
                        Resource group: 'rg_app1'  Application controller: 'cl_app1_script'
                                Memory='10.0' Processors='0' Processing units='0.10' Virtual Processors='1'
                        Total: Use desired number='0' Memory='20' Processors='0' Processing units='1.1' Virtual Processors='8'
        Node: nodename2
                Site: DC2
                HMC(s): hmc_name_2
                Managed system: Power10_Machine_XXX
                LPAR: node_name_2_lpar
                NovaLink(s):
                        Current profile: 'default_profile'
                        Memory (GB):        minimum '6'  desired '30'  current '30'  maximum '120'
                        Processing mode: Shared
                        Shared processor pool: 'SSP_TEST'
                        Processing units:   minimum '0.1'  desired '0.1'  current '0.1'  maximum '10.0'
                        Virtual processors: minimum '1'  desired '1'  current '1'  maximum '10'
                ROHA provisioning for 'ONLINE' resource groups
                        Resource group: 'rg_db2'  Application controller: 'db2_start_stop_script'
                                Memory='24.0' Processors='0' Processing units='0.00' Virtual Processors='0'
                        Total: Use desired number='0' Memory='24' Processors='0' Processing units='0' Virtual Processors='0'
                ROHA provisioning for 'OFFLINE' resource groups
                        Resource group: 'rg_app2'  Application controller: 'cl_app2_script'
                                Memory='10.0' Processors='0' Processing units='0.10' Virtual Processors='1'
                        Total: Use desired number='0' Memory='10' Processors='0' Processing units='0.1' Virtual Processors='1'

Managed System 'Power10_Machine_XXX'
        Hardware resources of managed system
                Installed:       memory '1280' GB         processing units '20.0000'
                Configurable:    memory '1280' GB         processing units '20.0000'
                Inactive:        memory '0' GB    processing units '0.0000'
                Deconfigured:    memory '0' GB   processing units '0.0000'
                Available:       memory '945.25' GB       processing units '10.5000'
                Free:            memory '945.25' GB       processing units '10.5000'
        On/Off CoD
                No
        Enterprise pool
                No
        Hardware Management Console
                hmc_name_1
        Shared processor pool 'SSP_TEST'
                Available: '4.1'
                Reserved: '0.0'
                Maximum: '5.0'
        Logical partition 'node_name_1_lpar'
                This 'node_name_1_lpar' partition hosts 'node_name_1' node of the Stretched Cluster 'CLUSTER_NAME'
        Logical partition 'node_name_2_lpar'
                This 'node_name_2_lpar' partition hosts 'node_name_2' node of the Stretched Cluster 'CLUSTER_NAME'
No enterprise pool defined.
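When several nodes are involved, it can be convenient to pull the minimum/desired/current/maximum values out of the report instead of reading them by eye. The following is a minimal sketch that assumes the report text has been saved to a string and that its lines look exactly like the sample above; the function name and the regular expression are illustrative only and may need adjusting for other PowerHA versions.

```python
import re

# Matches lines such as:
#   Memory (GB):        minimum '6'  desired '6'  current '76'  maximum '120'
PROFILE_LINE = re.compile(
    r"^\s*(?P<name>Memory \(GB\)|Processing units|Virtual processors):\s*"
    r"minimum '(?P<minimum>[\d.]+)'\s+desired '(?P<desired>[\d.]+)'\s+"
    r"current '(?P<current>[\d.]+)'\s+maximum '(?P<maximum>[\d.]+)'"
)


def parse_profile_lines(report_text):
    """Return {node: {resource: {minimum, desired, current, maximum}}}."""
    nodes, node = {}, None
    for line in report_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Node: "):
            node = stripped[len("Node: "):]
            nodes[node] = {}
            continue
        match = PROFILE_LINE.match(line)
        if match and node is not None:
            nodes[node][match["name"]] = {
                key: float(match[key])
                for key in ("minimum", "desired", "current", "maximum")
            }
    return nodes


sample = """\
        Node: nodename1
                        Memory (GB):        minimum '6'  desired '6'  current '76'  maximum '120'
                        Processing units:   minimum '0.1'  desired '0.1'  current '0.5'  maximum '10.0'
                        Virtual processors: minimum '1'  desired '1'  current '5'  maximum '10'
"""

for resource, values in parse_profile_lines(sample)["nodename1"].items():
    print(resource, values)
```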

Currently assigned resources on a node within ROHA

# clodmget HACMPdynresop
:key="nodename1_LPAR_NAME":value="nodename1_lpar":
:key="nodename1_MANAGED_SYSTEM":value="XXX":
:key="nodename1_LPAR_NAME":value="nodename1_lpar":
:key="nodename1_MANAGED_SYSTEM":value="XXX":
:key="TIMESTAMP":value="Thu Jun  1 14:09:42 CEST 2023":
:key="MANAGED_SYSTEM":value="":
:key="ENTERPRISE_POOL":value="":
:key="PREFERRED_HMC_LIST":value="":
:key="PREFERRED_NOVA_LIST":value="":
:key="DLPAR_MEM":value="70":
:key="DLPAR_PROCS":value="4":
:key="DLPAR_PROC_UNITS":value="0.4000":
:key="MAX_SPP_DIFF":value="0":
:key="ONOFF_MEM":value="":
:key="ONOFF_CPU":value="":
:key="CODPOOL_MEM":value="":
:key="CODPOOL_CPU":value="":
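The colon-delimited key/value lines above are easy to turn into a dictionary for further checks. This is a minimal sketch that assumes every entry has the :key="...":value="...": shape shown in the sample; the function name is made up for this example.

```python
import re

PAIR = re.compile(r'^:key="(?P<key>[^"]*)":value="(?P<value>[^"]*)":$')


def parse_dynresop(text):
    """Map each key to its value; later duplicates overwrite earlier ones."""
    entries = {}
    for line in text.splitlines():
        match = PAIR.match(line.strip())
        if match:
            entries[match["key"]] = match["value"]
    return entries


sample = '''\
:key="nodename1_LPAR_NAME":value="nodename1_lpar":
:key="DLPAR_MEM":value="70":
:key="DLPAR_PROCS":value="4":
:key="DLPAR_PROC_UNITS":value="0.4000":
:key="ONOFF_MEM":value="":
'''

entries = parse_dynresop(sample)
print(entries["DLPAR_MEM"], entries["DLPAR_PROCS"], entries["DLPAR_PROC_UNITS"])
```

In the sample output, DLPAR_MEM, DLPAR_PROCS and DLPAR_PROC_UNITS coincide with the totals reported for the ONLINE resource groups of nodename1 in the ROHA report, which suggests that these keys hold what ROHA has acquired on the node.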

Viewing configured HMC consoles

# clmgr query hmc -v
NAME="HMCNAME_1"
TIMEOUT="-1"
RETRY_COUNT="-1"
RETRY_DELAY="-1"
NODES="nodename_1"
SITES="DC1"
STATUS="UP"
VERSION="VXXRX.XXXX.X"
USER_NAME="username"
PASSWORD=""

NAME="HMCNAME_2"
TIMEOUT="-1"
RETRY_COUNT="-1"
RETRY_DELAY="-1"
NODES="nodename_1 nodename_2"
SITES="DC2"
STATUS="UP"
VERSION="VXXRX.XXXX.X"
USER_NAME="username"
PASSWORD=""
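Since keeping the node-to-HMC assignment correct is one of the harder parts of running ROHA, the output above can be checked mechanically. The sketch below assumes the blank-line separated KEY="value" stanzas shown in the sample and reports cluster nodes that are not listed under any HMC whose STATUS is UP. Both function names and the node nodename_3 are hypothetical.

```python
def parse_hmc_stanzas(text):
    """Split blank-line separated KEY="value" stanzas into dictionaries."""
    stanzas, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:
                stanzas.append(current)
                current = {}
            continue
        key, sep, value = line.partition("=")
        if sep:
            current[key] = value.strip('"')
    if current:
        stanzas.append(current)
    return stanzas


def nodes_without_hmc(stanzas, cluster_nodes):
    """Return the nodes not covered by any HMC reported as UP."""
    covered = set()
    for stanza in stanzas:
        if stanza.get("STATUS") == "UP":
            covered.update(stanza.get("NODES", "").split())
    return sorted(set(cluster_nodes) - covered)


sample = '''\
NAME="HMCNAME_1"
NODES="nodename_1"
STATUS="UP"

NAME="HMCNAME_2"
NODES="nodename_1 nodename_2"
STATUS="UP"
'''

stanzas = parse_hmc_stanzas(sample)
print(nodes_without_hmc(stanzas, ["nodename_1", "nodename_2", "nodename_3"]))
```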

Viewing current settings for an Application Controller

# smitty cm_cfg_get_hrp → commit CoD settings (No) → choose Application Controller

Application Controller Name                         cl_app1_script

Use desired level from the LPAR profile             No              
Optimal amount of gigabytes of memory              [10.0]

Optimal number of dedicated processors             [0]              

Optimal number of processing units                 [0.10]
Optimal number of virtual processors               [1]              

Listing minimum values of the LPAR profile

# lparstat -i | grep Min
Minimum Virtual CPUs                       : 1
Minimum Memory                             : 6144 MB
Minimum Capacity                           : 0.10
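lparstat reports the minimum memory in megabytes, while the ROHA configuration and logs work in gigabytes. A small sketch of the conversion, assuming the "name : value" layout of the output above (the function names are illustrative):

```python
def parse_lparstat_minimums(text):
    """Collect the 'Minimum ...' lines of lparstat -i into a dictionary."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().startswith("Minimum"):
            values[name.strip()] = value.strip()
    return values


def memory_mb_to_gb(value):
    """Convert a string such as '6144 MB' to gigabytes."""
    amount, unit = value.split()
    if unit != "MB":
        raise ValueError(f"unexpected unit: {unit}")
    return int(amount) / 1024


sample = """\
Minimum Virtual CPUs                       : 1
Minimum Memory                             : 6144 MB
Minimum Capacity                           : 0.10
"""

minimums = parse_lparstat_minimums(sample)
print(memory_mb_to_gb(minimums["Minimum Memory"]))
```

For the sample this gives 6.0, which corresponds to the memory minimum of '6' GB shown for the node in the ROHA report.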

Analysis of ROHA Logs

Reviewing the hacmp.out log is very useful, especially during cluster failover and when there are doubts about why resources were calculated in a particular way.

# grep ROHALOG /var/hacmp/log/hacmp.out
[ROHALOG:6357464:(0.073)] Open session 6357464 at Thu Jun  1 14:03:18 CEST 2023
[ROHALOG:6357464:(6.230)] ==== HACMProhaparam ODM ====
[ROHALOG:6357464:(6.241)] ALWAYS_START_RG    = 1
[ROHALOG:6357464:(6.252)] FORCE_SYNC_RELEASE = 0
[ROHALOG:6357464:(6.262)] ADJUST_SPP_SIZE    = 0
[ROHALOG:6357464:(6.272)] AGREE_TO_COD_COSTS = 0
[ROHALOG:6357464:(6.281)] ONOFF_DAYS         = 30
[ROHALOG:6357464:(6.292)] RESOURCE_ALLOCATION_ORDER = 0
[ROHALOG:6357464:(6.297)] ============================
[ROHALOG:6357464:(6.301)] ===== HACMPdynresop ODM ====
[ROHALOG:6357464:(6.312)] TIMESTAMP             = Thu Jun 1 14:03:24 CEST 2023
[ROHALOG:6357464:(6.322)] MANAGED_SYSTEM        = MACHINE_01_NAME
[ROHALOG:6357464:(6.331)] ENTERPRISE_POOL       = unknown
[ROHALOG:6357464:(6.342)] PREFERRED_HMC_LIST    = hmc_test_01
[ROHALOG:6357464:(6.352)] PREFERRED_NOVA_LIST   = 0
[ROHALOG:6357464:(6.383)] DLPAR_MEM (GB)        = 70
[ROHALOG:6357464:(6.403)] DLPAR_PROC_UNITS      = 0.4000
[ROHALOG:6357464:(6.414)] DLPAR_PROCS           = 4
[ROHALOG:6357464:(6.424)] CODPOOL_MEM           = 0
[ROHALOG:6357464:(6.434)] CODPOOL_CPU           = 0
[ROHALOG:6357464:(6.443)] ONOFF_MEM             = 0
[ROHALOG:6357464:(6.463)] ONOFF_CPU             = 0
[ROHALOG:6357464:(6.473)] EFFECTIVE_HMC_VERSION = 0
[ROHALOG:6357464:(6.483)] MAX_SPP_DIFF          = 0
[ROHALOG:6357464:(6.488)] ============================
[ROHALOG:6357464:(6.492)] +------------------+---------------------------------+
[ROHALOG:6357464:(6.496)] | Session info     |   Value                         |
[ROHALOG:6357464:(6.501)] +------------------+---------------------------------+
[ROHALOG:6357464:(6.505)] | Operation        |                       acquire   |
[ROHALOG:6357464:(6.510)] | Compute only     |                             0   |
[ROHALOG:6357464:(6.514)] | SystemMirror mode|                             1   |
[ROHALOG:6357464:(6.518)] | Synchronous      |                             1   |
[ROHALOG:6357464:(6.523)] | Handled Apps     | cl_app1_script
[ROHALOG:6357464:(6.527)] | Running Apps     | db_start_stop_script
[ROHALOG:6357464:(6.532)] +------------------+---------------------------------+
[ROHALOG:6357464:(8.167)] +------------------+----------------+
[ROHALOG:6357464:(8.171)] | HMC              |     Version    |
[ROHALOG:6357464:(8.177)] +------------------+----------------+
[ROHALOG:6357464:(8.182)] | HMC1_A |   VXXRX.XXXX.X |
[ROHALOG:6357464:(8.186)] +------------------+----------------+
[ROHALOG:6357464:(13.623)] +------------------+----------------+----------------+
[ROHALOG:6357464:(13.630)] | LPAR (shared)    |   Memory (GB)  |   PU(s)/VP(s)  |
[ROHALOG:6357464:(13.633)] +------------------+----------------+----------------+
[ROHALOG:6357464:(13.639)] | Name             |        lpar_node01   |
[ROHALOG:6357464:(13.644)] | State            |                       Running   |
[ROHALOG:6357464:(13.650)] | Id               |                            13   |
[ROHALOG:6357464:(13.653)] | Uuid             |   XXXXXXXXXXXXXXXXXXXXXXXXXXX   |
[ROHALOG:6357464:(13.659)] | VP/PU Ratio      |                       0.0500    |
[ROHALOG:6357464:(13.663)] | Minimum          |       6.0000   |    0.1000/  1   |
[ROHALOG:6357464:(13.669)] | Desired          |       6.0000   |    0.1000/  1   |
[ROHALOG:6357464:(13.673)] | Assigned         |      76.0000   |    0.5000/  5   |
[ROHALOG:6357464:(13.679)] | Maximum          |     120.0000   |    10.0000/ 10   |
[ROHALOG:6357464:(13.683)] +------------------+----------------+----------------+
[ROHALOG:6357464:(13.688)] +------------------+----------------+----------------+
[ROHALOG:6357464:(13.693)] | SHARED PROC POOL |                |      PU(s)     |
[ROHALOG:6357464:(13.697)] +------------------+----------------+----------------+
[ROHALOG:6357464:(13.702)] | Name             |                |        TEST |
[ROHALOG:6357464:(13.706)] | Maximum          |                |        5       |
[ROHALOG:6357464:(13.711)] | Free             |                |      4.1000     |
[ROHALOG:6357464:(13.716)] | Reserved         |                |      0.0000     |
[ROHALOG:6357464:(13.721)] +------------------+----------------+----------------+
[ROHALOG:6357464:(13.736)] +------------------+----------------+----------------+----------------+----------------+
[ROHALOG:6357464:(13.741)] | OPTIMAL APPS     |  Use Desired   |   Memory (GB)  |     CPU(s)     |   PU(s)/VP(s)  |
[ROHALOG:6357464:(13.746)] +------------------+----------------+----------------+----------------+----------------+
[ROHALOG:6357464:(13.751)] | cl_app1_script   |        0       |      10.0000   |        0       |    0.1000/  1   |
[ROHALOG:6357464:(13.756)] +------------------+----------------+----------------+----------------+----------------+
[ROHALOG:6357464:(13.761)] | Total            |        0       |      10.0000   |        0       |    0.1000/  1   |
[ROHALOG:6357464:(13.763)] +------------------+----------------+----------------+----------------+----------------+
[ROHALOG:6357464:(13.794)] +------------------+----------------+----------------+----------------+----------------+
[ROHALOG:6357464:(13.802)] | RUNNING APPS     |  Use Desired   |   Memory (GB)  |     CPU(s)     |   PU(s)/VP(s)  |
[ROHALOG:6357464:(13.807)] +------------------+----------------+----------------+----------------+----------------+
[ROHALOG:6357464:(13.812)] | db_start_stop_sc |        0       |      70.0000   |        0       |    0.4000/  4   |
[ROHALOG:6357464:(13.818)] +------------------+----------------+----------------+----------------+----------------+
[ROHALOG:6357464:(13.823)] | Total            |        0       |      70.0000   |        0       |    0.4000/  4   |
[ROHALOG:6357464:(13.828)] +------------------+----------------+----------------+----------------+----------------+
[ROHALOG:6357464:(13.834)] ============ Compute ROHA Memory ============
[ROHALOG:6357464:(13.838)] == Status Reminder ==
[ROHALOG:6357464:(13.843)] Current resources       :  76.0000 GB
[ROHALOG:6357464:(13.847)] == Raw computation from AC settings ==
[ROHALOG:6357464:(13.852)] LPAR profile minimum    :  6.0000 GB
[ROHALOG:6357464:(13.856)] APPs running            :  70.0000 GB
[ROHALOG:6357464:(13.861)] APPs to start (optimal) :  10.0000 GB
[ROHALOG:6357464:(13.865)] Total raw               :  86.0000 GB
[ROHALOG:6357464:(13.870)] Delta acquisition raw   :  10.0000 GB
[ROHALOG:6357464:(13.874)] == Maximum limits (fit into if exceeding and always_start_rg=1, otherwise fail).
[ROHALOG:6357464:(13.878)] LPAR profile maximum    :  120.0000 GB
[ROHALOG:6357464:(13.883)] == Minimum limits (adjust if under) ==
[ROHALOG:6357464:(13.887)] Delta minimum           :    0.00 GB
[ROHALOG:6357464:(13.892)] == Final computation, limits considered ==
[ROHALOG:6357464:(13.896)] Total                   :  86.0000 GB
[ROHALOG:6357464:(13.901)] Delta acquisition       :  10.0000 GB
[ROHALOG:6357464:(13.905)] =================== End =====================
[ROHALOG:6357464:(13.935)] ========== Compute ROHA PU(s)/VP(s) =========
[ROHALOG:6357464:(13.940)] == Status Reminder ==
[ROHALOG:6357464:(13.944)] Current resources       :  0.5000 /   5
[ROHALOG:6357464:(13.954)] == Raw computation from AC settings ==
[ROHALOG:6357464:(13.959)] LPAR profile minimum    :  0.1000 /   1
[ROHALOG:6357464:(13.963)] APPs running            :  0.4000 /   4
[ROHALOG:6357464:(13.968)] APPs to start (optimal) :  0.1000 /   1
[ROHALOG:6357464:(13.972)] Total raw               :  0.6000 /   6
[ROHALOG:6357464:(13.977)] Delta acquisition raw   :  0.1000 /   1
[ROHALOG:6357464:(13.981)] PU/VP ratio raw         :  0.1000
[ROHALOG:6357464:(13.986)] Minimal SPP size raw    :       1
[ROHALOG:6357464:(13.990)] == Maximum limits (fit into if exceeding and always_start_rg=1, otherwise fail).
[ROHALOG:6357464:(13.995)] LPAR profile maximum    :  10.0000 /  10
[ROHALOG:6357464:(13.999)] PU/VP ratio             :  0.0500 (Cannot have a lower ratio).
[ROHALOG:6357464:(14.009)] SPP free (for delta)    :       4
[ROHALOG:6357464:(14.014)] == Minimum limits (adjust if under) ==
[ROHALOG:6357464:(14.018)] Delta minimum           :    0.00 /   0
[ROHALOG:6357464:(14.023)] == Final computation, limits considered ==
[ROHALOG:6357464:(14.027)] Total                   :  0.6000 /   6
[ROHALOG:6357464:(14.032)] Delta acquisition       :  0.1000 /   1
[ROHALOG:6357464:(14.036)] PU/VP ratio             :  0.1000
[ROHALOG:6357464:(14.078)] =================== End =====================
[ROHALOG:6357464:(16.274)] Waiting on async release to complete
[ROHALOG:6357464:(16.275)]
[ROHALOG:6357464:(16.285)] Coming out of wait loop as there is no async release in progress
[ROHALOG:6357464:(16.285)]
[ROHALOG:6357464:(22.603)] +------------------+----------------+----------------+
[ROHALOG:6357464:(22.609)] | MANAGED SYSTEM   |   Memory (GB)  |  Proc Unit(s)  |
[ROHALOG:6357464:(22.613)] +------------------+----------------+----------------+
[ROHALOG:6357464:(22.619)] | Name             |      MACHINE_01_NAME   |
[ROHALOG:6357464:(22.623)] | State            |                     Operating   |
[ROHALOG:6357464:(22.629)] | Region Size      |       0.2500   |        /       |
[ROHALOG:6357464:(22.633)] | VP/PU Ratio      |        /       |     0.0500     |
[ROHALOG:6357464:(22.639)] | Installed        |    1280.0000   |     20.0000     |
[ROHALOG:6357464:(22.643)] | Configurable     |    1280.0000   |     20.0000     |
[ROHALOG:6357464:(22.648)] | Deconfigured     |       0.0000   |     0.0000     |
[ROHALOG:6357464:(22.652)] | Reserved         |      18.7500   |         /      |
[ROHALOG:6357464:(22.657)] | Available        |     945.2500   |     10.5000     |
[ROHALOG:6357464:(22.661)] | Free (computed)  |     945.2500   |     10.5000     |
[ROHALOG:6357464:(22.666)] +------------------+----------------+----------------+
[ROHALOG:6357464:(24.408)] +------------------+----------------+----------------+
[ROHALOG:6357464:(24.412)] | ENTERPRISE POOL  |   Memory (GB)  |     CPU(s)     |
[ROHALOG:6357464:(24.417)] +------------------+----------------+----------------+
[ROHALOG:6357464:(24.421)] | Name             |                       unknown   |
[ROHALOG:6357464:(24.425)] | State            |                       unknown   |
[ROHALOG:6357464:(24.430)] | Master HMC       |                       unknown   |
[ROHALOG:6357464:(24.433)] | Capacity (total) |       0.0000   |        0       |
[ROHALOG:6357464:(24.440)] | Available        |       0.0000   |        0       |
[ROHALOG:6357464:(24.443)] | Unreturned       |       0.0000   |        0       |
[ROHALOG:6357464:(24.449)] | Unreturned (MS)  |       0.0000   |        0       |
[ROHALOG:6357464:(24.453)] | Mobile (MS)      |       0.0000   |        0       |
[ROHALOG:6357464:(24.460)] +------------------+----------------+----------------+
[ROHALOG:6357464:(24.463)] +------------------+-----------------+----------------+----------------+
[ROHALOG:6357464:(24.470)] | ENTERPRISE POOL  | MANAGED_SYSTEM  |   Memory (GB)  |     CPU(s)     |
[ROHALOG:6357464:(24.473)] +------------------+-----------------+----------------+----------------+
[ROHALOG:6357464:(24.480)] | Name             |                                         unknown   |
[ROHALOG:6357464:(25.991)] +------------------+----------------+----------------+
[ROHALOG:6357464:(25.995)] | TRIAL COD        |   Memory (GB)  |     CPU(s)     |
[ROHALOG:6357464:(26.000)] +------------------+----------------+----------------+
[ROHALOG:6357464:(26.002)] | State            |    Not Running |    Not Running |
[ROHALOG:6357464:(26.010)] | Activated        |       0.0000   |        0       |
[ROHALOG:6357464:(26.012)] | Days left        |         0      |        0       |
[ROHALOG:6357464:(26.019)] | Hours left       |         0      |        0       |
[ROHALOG:6357464:(26.022)] +------------------+----------------+----------------+
[ROHALOG:6357464:(27.626)] +------------------+----------------+----------------+
[ROHALOG:6357464:(27.632)] | ONOFF COD        |   Memory (GB)  |     CPU(s)     |
[ROHALOG:6357464:(27.637)] +------------------+----------------+----------------+
[ROHALOG:6357464:(27.641)] | State            |        unknown |        unknown |
[ROHALOG:6357464:(27.646)] | Activated        |       0.0000   |        0       |
[ROHALOG:6357464:(27.650)] | Unreturned       |       0.0000   |        0       |
[ROHALOG:6357464:(27.652)] | Available        |       0.0000   |        0       |
[ROHALOG:6357464:(27.660)] | Days available   |         0      |        0       |
[ROHALOG:6357464:(27.662)] | Days left        |         0      |        0       |
[ROHALOG:6357464:(27.670)] | Hours left       |         0      |        0       |
[ROHALOG:6357464:(27.672)] +------------------+----------------+----------------+
[ROHALOG:6357464:(34.972)] +------------------+----------------+----------------+
[ROHALOG:6357464:(34.978)] | OTHER            |   Memory (GB)  |   PU(s)/VP(s)  |
[ROHALOG:6357464:(34.982)] +------------------+----------------+----------------+
[ROHALOG:6357464:(34.987)] | LPAR (shared)    |        lpar_node_02   |
[ROHALOG:6357464:(34.991)] | State            |                       Running   |
[ROHALOG:6357464:(34.996)] | Id               |                            12   |
[ROHALOG:6357464:(35.000)] | Uuid             |   XXXXXXXXXXXXXXXXXXXXXXXXXXX   |
[ROHALOG:6357464:(35.006)] | Minimum          |       6.0000   |    0.1000/  1   |
[ROHALOG:6357464:(35.010)] | Assigned         |      30.0000   |    0.1000/  1   |
[ROHALOG:6357464:(35.015)] +------------------+----------------+----------------+
[ROHALOG:6357464:(35.020)] | MANAGED SYSTEM   |      MACHINE_01_NAME   |
[ROHALOG:6357464:(35.025)] | State            |                     Operating   |
[ROHALOG:6357464:(35.030)] +------------------+----------------+----------------+
[ROHALOG:6357464:(35.032)] | ENTERPRISE POOL  |                       unknown   |
[ROHALOG:6357464:(35.040)] | Mobile (MS)      |       0.0000   |        0       |
[ROHALOG:6357464:(35.042)] +------------------+----------------+----------------+
[ROHALOG:6357464:(35.063)] =========== Identify ROHA Memory ===========
[ROHALOG:6357464:(35.068)] Remaining available memory for partition:            945.2500 GB
[ROHALOG:6357464:(35.072)] Total Enterprise Pool memory to yank normal from
[ROHALOG:6357464:(35.077)] XXX_PHYSICAL_MACHINE:    0.0000 GB
[ROHALOG:6357464:(35.082)] Total Enterprise Pool memory to acquire including all yanked amounts:        0.0000 GB
[ROHALOG:6357464:(35.086)] Total On/Off CoD memory to activate:                 0.0000 GB for 0 days
[ROHALOG:6357464:(35.091)] Total DLPAR memory to allocate:                      10.0000 GB
[ROHALOG:6357464:(35.095)] =================== End ====================
[ROHALOG:6357464:(35.112)] ========== Identify ROHA Processor ===========
[ROHALOG:6357464:(35.117)] Remaining available PU(s) for partition:             10.5000 Processing Unit(s)
[ROHALOG:6357464:(35.121)] Total Enterprise Pool CPU(s) to yank normal from
[ROHALOG:6357464:(35.126)] MACHINE_01_NAME:    0.0000 CPU(s)
[ROHALOG:6357464:(35.131)] Total Enterprise Pool CPU(s) to acquire including all yanked amounts:        0.0000 CPU(s)
[ROHALOG:6357464:(35.135)] Total On/Off CoD CPU(s) to activate:                 0.0000 CPU(s) for 0 days
[ROHALOG:6357464:(35.140)] Total DLPAR PU(s)/VP(s) to allocate:                 0.1000 Processing Unit(s) and 1.0000 Virtual Processor(s)
[ROHALOG:6357464:(35.144)] =================== End =====================
[ROHALOG:6357464:(44.963)] clhmccmd: 10.0000 GB of DLPAR resources have been allocated.
[ROHALOG:6357464:(44.963)] clhmccmd: 1 VP(s) or CPU(s) and 0.1000 PU(s) of DLPAR resources have been allocated.
[ROHALOG:6357464:(45.134)] The following resources were acquired for application controllers cl_app1_script.
[ROHALOG:6357464:(45.134)] DLPAR memory: 10.0000 GB             On/Off CoD memory: 0.0000 GB            Enterprise Pool memory: 0.0000 GB.
[ROHALOG:6357464:(45.134)] DLPAR processor: 0.1000 PU/1.0000 VP On/Off CoD processor: 0.0000 CPU(s)     Enterprise Pool processor: 0.0000 CPU(s)
[ROHALOG:6357464:(45.139)] INFO: received rc=0.
[ROHALOG:6357464:(45.144)] Success on 1 attempt(s).
[ROHALOG:6357464:(45.271)] ===== HACMPdynresop ODM ====
[ROHALOG:6357464:(45.281)] TIMESTAMP             = Thu Jun 1 14:04:03 CEST 2023
[ROHALOG:6357464:(45.298)] MANAGED_SYSTEM        = 0
[ROHALOG:6357464:(45.308)] ENTERPRISE_POOL       = 0
[ROHALOG:6357464:(45.321)] PREFERRED_HMC_LIST    = 0
[ROHALOG:6357464:(45.337)] PREFERRED_NOVA_LIST   = 0
[ROHALOG:6357464:(45.352)] DLPAR_MEM (GB)        = 80
[ROHALOG:6357464:(45.364)] DLPAR_PROC_UNITS      = 0.5
[ROHALOG:6357464:(45.375)] DLPAR_PROCS           = 5
[ROHALOG:6357464:(45.390)] CODPOOL_MEM           = 0
[ROHALOG:6357464:(45.401)] CODPOOL_CPU           = 0
[ROHALOG:6357464:(45.411)] ONOFF_MEM             = 0
[ROHALOG:6357464:(45.422)] ONOFF_CPU             = 0
[ROHALOG:6357464:(45.459)] EFFECTIVE_HMC_VERSION = 0
[ROHALOG:6357464:(45.470)] MAX_SPP_DIFF          = 0
[ROHALOG:6357464:(45.481)] ============================
[ROHALOG:6357464:(45.491)] Close session 6357464 at Thu Jun  1 14:04:04 CEST 2023
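
A log like the one above is long, and the lines that matter most are usually the planned DLPAR totals and the clhmccmd confirmations. The following is a minimal sketch of a filter for them, assuming the log excerpt has been saved to a file or is piped in; the function name roha_summary is hypothetical and not part of PowerHA.

```shell
#!/bin/sh
# Hypothetical helper (not part of PowerHA): summarize a saved ROHA log excerpt.
# Usage: roha_summary <file>    or    <command> | roha_summary
roha_summary() {
    # 1. Strip the "[ROHALOG:<session>:(<seconds>)] " prefix from each line.
    # 2. Keep only the planned DLPAR totals and the clhmccmd confirmations.
    sed 's/^\[ROHALOG:[0-9]*:([0-9.]*)] //' "$@" |
        grep -E '^(Total DLPAR|clhmccmd: .* have been allocated)'
}
```

Run against the session shown above, this keeps four lines: the two "Total DLPAR ... to allocate" lines and the two clhmccmd confirmations, which makes it easy to compare what ROHA planned with what the HMC actually allocated.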

Configuration

First and foremost, refer to the IBM documentation: https://www.ibm.com/docs/en/powerha-aix/7.2?topic=availability-configuring-resource-optimized-high

HMC consoles

Because ROHA connects to the HMC to query and change LPAR resources, it needs an HMC user account with appropriate permissions.

The connection can use either the REST API or SSH. To add an HMC to the cluster, run the following command or use SMIT:

# clmgr add hmc <HMC> USER_NAME=<hmcuser>

NOTE: Configuring via SMIT does not allow you to specify a user account; the hscroot account is used by default.

When using SSH, confirm the HMC host key fingerprint on every cluster node first, for example by connecting once from each node:

# ssh hmc1
The authenticity of host 'hmc1 (XX.XX.XX.XX)' can't be established.
ECDSA key fingerprint is SHA256:XXXXXXXXXXXXXXXXXXXXXXX.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes

# ssh hmc2
The authenticity of host 'hmc2 (XX.XX.XX.XX)' can't be established.
ECDSA key fingerprint is SHA256:XXXXXXXXXXXXXXXXXXXXXXXXXX.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
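
Once the fingerprints have been accepted, it can be useful to confirm from each node that SSH to every HMC works without any prompt. The sketch below is one way to do that, assuming key-based login for the HMC user is already set up; the names check_hmc_ssh and SSH_CMD are hypothetical, and lshmc -V is used only as a harmless HMC command to run remotely.

```shell
#!/bin/sh
# Hypothetical helper: verify that each HMC accepts non-interactive SSH logins.
# SSH_CMD can be overridden (for example, to add options or for a dry run).
SSH_CMD=${SSH_CMD:-ssh}

# Usage: check_hmc_ssh <hmcuser> <hmc> [<hmc> ...]
check_hmc_ssh() {
    user=$1
    shift
    rc=0
    for hmc in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password
        # or for host key confirmation, so any remaining prompt shows as FAIL.
        if $SSH_CMD -o BatchMode=yes -o ConnectTimeout=10 \
            "$user@$hmc" lshmc -V >/dev/null 2>&1; then
            echo "OK   $hmc"
        else
            echo "FAIL $hmc"
            rc=1
        fi
    done
    return $rc
}
```

For example, running check_hmc_ssh hscroot hmc1 hmc2 on every cluster node prints one OK or FAIL line per HMC and returns a non-zero status if any of them could not be reached non-interactively.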

In the ROHA settings, assign or verify the list of HMCs that manage a given node and site.

# smitty sysmirror - Cluster Applications and Resources → Resources → Configure User Applications (Scripts and Monitors) → Resource Optimized High Availability → HMC Configuration

  • Change/Show HMC List for a Node
  • Change/Show HMC List for a Site
  • Change/Show Default HMC List

Application Controller Settings

The Resource Group must have its Application Controller Name attribute set to the appropriate Application Controller. If no Application Controller exists yet, create one first. Then add hardware resource provisioning to that Application Controller:

# smitty sysmirror - Cluster Applications and Resources → Resources → Configure User Applications (Scripts and Monitors) → Resource Optimized High Availability → Hardware Resource Provisioning for Application Controller → Add Hardware Resource Provisioning to an Application Controller

                                                      [Entry Fields]
Application Controller Name                         cl_app1_script

Use desired level from the LPAR profile             No                
Optimal amount of gigabytes of memory              [10.0]

Optimal number of dedicated processors             [0]                

Optimal number of processing units                 [0.10]
Optimal number of virtual processors               [1]         


Note: If you do not want to assign a value to a field, enter "0" (zero) rather than leaving it empty. An empty field does not overwrite the setting, so the previous value remains active.

Summary

By understanding the principles and operational details of ROHA, administrators can better manage high availability environments, reduce downtime, and optimize resource usage, ultimately contributing to more efficient and reliable system performance.
 
I hope this article has provided valuable insights into ROHA and its implementation. If you have any feedback or notice any errors, please feel free to reach out to me via LinkedIn: https://www.linkedin.com/in/michal-wiktorek-83b2b47b/