AIOps

Cloud Pak for AIOps 4 tips: ensure OpenShift has sufficient resources

By Zane Bray posted 20 days ago


OpenShift requires a certain amount of the hardware resource on its nodes to run the OpenShift platform itself, apart from any workloads. The amount of hardware resource OpenShift needs varies depending on the size of the node. An analogy is the engine in a car: the bigger the car, the bigger the engine that is required to propel it.

Red Hat publishes a method for calculating how much hardware resource should be reserved for OpenShift on each node, based on node size. The calculation is summarised below:

Memory

  • Nodes with 1GB of memory or less: reserve 255 MB for OpenShift
  • Nodes with more than 1GB of memory:
    • 25% of the first 4 GB of memory
    • 20% of the next 4 GB of memory (between 4 GB and 8 GB)
    • 10% of the next 8 GB of memory (between 8 GB and 16 GB)
    • 6% of the next 112 GB of memory (up to 128 GB)
    • 2% of the remaining memory

CPU

  • 6% of the first core
  • 1% of the second core
  • 0.5% of the next 2 cores
  • 0.25% of any remaining core

Reference: https://access.redhat.com/solutions/5843241
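These tiered rules are easy to get wrong by hand, so they can be expressed as a small calculator. A minimal sketch (the function names are illustrative, not part of any OpenShift tooling):

```python
def reserved_memory_gb(total_gb: float) -> float:
    """Memory to reserve for OpenShift, per the tiered percentages above."""
    if total_gb <= 1:
        return 0.255  # flat 255 MB for nodes with 1 GB or less
    reserved = 0.25 * min(total_gb, 4)                # first 4 GB
    if total_gb > 4:
        reserved += 0.20 * (min(total_gb, 8) - 4)     # 4-8 GB
    if total_gb > 8:
        reserved += 0.10 * (min(total_gb, 16) - 8)    # 8-16 GB
    if total_gb > 16:
        reserved += 0.06 * (min(total_gb, 128) - 16)  # 16-128 GB
    if total_gb > 128:
        reserved += 0.02 * (total_gb - 128)           # remainder
    return reserved


def reserved_cpu_cores(total_cores: float) -> float:
    """CPU to reserve for OpenShift, per the per-core percentages above."""
    reserved = 0.06 * min(total_cores, 1)              # first core
    if total_cores > 1:
        reserved += 0.01 * (min(total_cores, 2) - 1)   # second core
    if total_cores > 2:
        reserved += 0.005 * (min(total_cores, 4) - 2)  # next 2 cores
    if total_cores > 4:
        reserved += 0.0025 * (total_cores - 4)         # remaining cores
    return reserved
```

For a 16-core/64 GB node this yields 0.11 cores and 5.48 GB, matching the worked example below.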

EXAMPLE

Barbara is planning an AIOps deployment and needs to calculate how much hardware resource to reserve for OpenShift. The AIOps worker node specification is: 16 cores, 64 GB RAM, and 250 GB disk. She therefore calculates that OpenShift will need:

CPU: (0.06 × 1) + (0.01 × 1) + (0.005 × 2) + (0.0025 × 12) = 0.11 cores (110 millicores)

Memory: (0.25 × 4) + (0.20 × 4) + (0.10 × 8) + (0.06 × 48) = 5.48 GB RAM

Barbara calculates that the hardware resource actually available on each node to run AIOps workloads is: 16 − 0.11 = 15.89 cores and 64 − 5.48 = 58.52 GB RAM. She therefore bases her hardware sizing on these amounts.
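Barbara's remaining-capacity figures can be cross-checked in a few lines (the node specification and reserved amounts are taken from the example above):

```python
total_cores, total_gb = 16, 64  # worker node specification

# Reserved for OpenShift, from the calculations above
reserved_cores = 0.06 * 1 + 0.01 * 1 + 0.005 * 2 + 0.0025 * 12  # 0.11 cores
reserved_gb = 0.25 * 4 + 0.20 * 4 + 0.10 * 8 + 0.06 * 48        # 5.48 GB

print(f"Available for AIOps: {total_cores - reserved_cores:.2f} cores, "
      f"{total_gb - reserved_gb:.2f} GB RAM")
# prints: Available for AIOps: 15.89 cores, 58.52 GB RAM
```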

--

When planning an AIOps deployment, it is essential to ensure OpenShift has sufficient resources to run comfortably. As such, it is important to know how much of your worker nodes' hardware resources needs to be reserved for OpenShift's use and hence can't be used to run your AIOps workloads. What is left over is what should be used in any calculations to determine how many worker nodes are needed for a deployment.

What is not widely known is that, by default, OpenShift reserves only 500 millicores (0.5 cores) and just 1 GB RAM on both master and worker nodes for its own use. From the example above, you can see that this would be insufficient for most AIOps deployments, where a 16-core/64 GB worker node specification is common. Clusters with under-resourced nodes typically present with instability and other performance-related issues that can be difficult to diagnose. Anecdotally, a lack of resource at the OpenShift level has often been the root cause of such issues.

The answer, then, is to configure your master and worker nodes to reserve sufficient hardware resource for the OpenShift platform processes, so that all of your cluster nodes run smoothly. A smoothly running OpenShift platform translates into a smoothly running AIOps system.

Instructions for how to set this up are here: https://docs.openshift.com/container-platform/4.12/nodes/nodes/nodes-nodes-resources-configuring.html

NOTES

  • It is strongly recommended to simply set your nodes to automatically reserve the correct amount of hardware resource for OpenShift, based on node size. See the section entitled "Automatically allocating resources for nodes" in the link above for the steps to configure this.
  • The example for configuring automatic mode in the link above describes how to configure just the worker nodes. It is recommended to configure both master and worker nodes in this way to avoid unnecessary problems. A sample custom resource (CR) configuration for doing both is included below.
  • Even when using automatic mode, it is still essential to understand how much of your nodes' hardware resources will be reserved, so that you know how much remains and can correctly size your AIOps deployment, taking into account additional contingency allowances.
  • Take care when switching to automatic mode on an already running system, as you may inadvertently find that you no longer have enough hardware resource to run AIOps. Do the calculations first to ensure AIOps will still have enough resource, and add additional worker nodes if necessary, before reconfiguring OpenShift to automatically reserve hardware resources.
  • Typically, OpenShift deployments for AIOps are sized to include additional worker nodes, so that the cluster can tolerate the failure of a node, or simply to allow for upgrades. Assuming you have additional worker nodes present in your cluster, you can safely run the configuration steps outlined above on a live system. After the configuration changes are made, OpenShift restarts each node one at a time, ensuring no loss of service.

Sample custom resource (CR) configuration files. Note that entries under matchLabels are ANDed together, and no machine config pool carries both the worker and the master label, so a separate KubeletConfig is needed for each pool:

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: dynamic-node-worker
spec:
  autoSizingReserved: true
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
---
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: dynamic-node-master
spec:
  autoSizingReserved: true
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/master: ""
#...
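For reference, the same OpenShift documentation page also describes reserving a fixed amount manually via systemReserved, rather than using autoSizingReserved. A sketch (the name set-allocatable and the 1000m/3Gi values are purely illustrative; derive real figures from the calculation described above):

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: set-allocatable          # illustrative name
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  kubeletConfig:
    systemReserved:
      cpu: 1000m                 # illustrative; size per the calculation above
      memory: 3Gi
```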

Also see this page in the IBM SWAT Practitioner Basics for further notes and examples on this topic.
