Enabling and Implementing Turbo-on-Turbo Actions in Red Hat® OpenShift®

By Tim Sweetz posted Thu February 22, 2024 01:11 PM

  

In the world of Application Resource Management, IBM Turbonomic is the premier software solution that automatically determines the right resource allocation actions to help ensure your Kubernetes environments and mission-critical applications get exactly what they need, when they need it, to meet your SLOs.  Did you know that you can also use Turbonomic to optimize itself?  We call this Turbo-on-Turbo.  To do this in Red Hat OpenShift Container Platform deployments, we need to take several steps to enable it and set some guardrails so that your Turbonomic deployment remains highly available.

In this post, we will detail how to allow a user to execute scaling actions on a self-managed Turbonomic Server in Red Hat OpenShift to mitigate performance issues due to resource constraints while also identifying and acting on efficiency opportunities.  The user can also leverage Pod Moves to better utilize cluster resources under fluctuating demand and to mitigate performance issues due to node pressure, so the deployment runs more efficiently.

PLEASE NOTE - this procedure only applies to Turbonomic deployments in Red Hat OpenShift.  For Turbonomic SaaS, this is already enabled.  For Turbonomic OVA deployments, refer to the Turbonomic Documentation.

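In this post we will describe how to:

  • Enable Turbonomic workload resize actions
  • Enable Turbonomic Pod Moves
  • Validate that actions complete successfully as designed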

Before we get into the specifics, please ensure the following prerequisites are satisfied.

  • Kubeturbo deployed to your k8s cluster (v1.21 and higher) where the Turbonomic Server is deployed
  • Kubeturbo is deployed separately and not via the Turbonomic Server XL Operator (see Additional Notes below)
  • Turbonomic Server has a full set of managed targets, and we have collected a reasonable amount of history
  • Turbonomic User has Site Admin or Admin role privileges to create groups and policies (see Additional Notes below)

Additional Notes:

  • Deploy Kubeturbo on its own, and not through the Turbonomic Server XL custom resource - see the following page for more details 
  • For more details on working with Groups and Policies, refer to the following IBM Turbonomic Documentation pages

  • The following steps are to AUTOMATE Turbonomic actions on Turbonomic components.  It is recommended to start with MANUAL to verify successful execution of actions, then switch policies from MANUAL to AUTOMATIC.
  • This was written for Turbonomic versions 8.11.1 and prior.  Beginning in 8.11.2, automation policies changed to allow more granular control of container resize actions.  Refer to Granular Resize Actions for Workload Controllers in the ‘What’s New’ section for more details.

Before we go any further and start getting into the details, I wanted to give kudos to @Eva Tuczai, @Justus Gries and @Marc Beckert for their contributions and assistance in putting this all together!

Enable Turbonomic Workload Resize Actions

We will now leverage Turbonomic’s analysis of resource usage to apply Limit and Request rightsizing decisions to the Turbonomic microservices.  Note that by default, most Turbonomic components only specify Mem Limits and Mem Requests, so in a default configuration deployment you will not see CPU Limit and CPU Request resize actions.  If your environment uses other mechanisms to inject resource configurations, whether they are specified via CR (to support Quotas) or LimitRange, then any spec that is defined can be optimized.

Rightsizing helps mitigate OOMs and performance issues, and it reduces limits and requests that are not needed.  Resource demands on Turbonomic components are related to the number of objects under Turbonomic management, so you may find that you need to resize more than once as your managed environment changes.

For more details on Turbonomic Rightsizing, please read the following article: https://github.com/turbonomic/kubeturbo/wiki/Action-Details#resizing-vertical-scaling-of-containerized-workloads

Deploy Turbonomic Operator Resource Map (ORM)

Turbonomic Server components are managed by the t8c-operator, therefore we cannot directly modify workload limits and requests.  We need to instruct Kubeturbo with a map of how to make these changes via the Turbonomic Custom Resource (aka the XL CR).  This map is a custom resource called Operator Resource Map (ORM) and Turbonomic has one ready for you to use for the Turbonomic Server components.

  1. For Turbonomic Server deployments running on k8s versions of 1.16 or higher, create this ORM CRD:
    kubectl create -f https://raw.githubusercontent.com/turbonomic/orm/master/config/crd/bases/devops.turbonomic.io_operatorresourcemappings.yaml
  2. Deploy the ORM Custom Resource configured for the Turbonomic Operator in the same namespace as the Turbonomic Server:
    kubectl -n turbonomic apply -f https://raw.githubusercontent.com/turbonomic/orm/master/library/ibm/turbo_operator_resource_mapping_sample_cr.yaml
  3. Restart the KubeTurbo pod to pick up this ORM:
    kubectl -n {namespace} delete pod kubeturbo-{pod}
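
The three steps above can also be scripted.  The following is a minimal Python sketch, not an official tool: it only assembles the same kubectl commands shown above and prints them, and it runs them only if you turn off the dry run.  The function names and the namespace and pod values passed in at the bottom are placeholders chosen for illustration.

```python
import subprocess

# URLs copied from the steps above.
ORM_CRD_URL = (
    "https://raw.githubusercontent.com/turbonomic/orm/master/"
    "config/crd/bases/devops.turbonomic.io_operatorresourcemappings.yaml"
)
ORM_CR_URL = (
    "https://raw.githubusercontent.com/turbonomic/orm/master/"
    "library/ibm/turbo_operator_resource_mapping_sample_cr.yaml"
)


def orm_setup_commands(turbo_namespace, kubeturbo_namespace, kubeturbo_pod):
    """Return the three kubectl invocations from the steps above, in order."""
    return [
        ["kubectl", "create", "-f", ORM_CRD_URL],
        ["kubectl", "-n", turbo_namespace, "apply", "-f", ORM_CR_URL],
        ["kubectl", "-n", kubeturbo_namespace, "delete", "pod", kubeturbo_pod],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failure


# Placeholder values: substitute your own namespaces and pod name.
commands = orm_setup_commands("turbonomic", "kubeturbo-ns", "kubeturbo-example-pod")
run_all(commands, dry_run=True)
```

Keeping the dry run as the default makes it easy to review the exact commands before anything touches the cluster.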

Modify KubeTurbo Custom Resource Exclusion List

After you have deployed the ORM, the next step is to modify the KubeTurbo configuration to allow actions to execute.  Starting in 8.10.6, KubeTurbo by default auto-creates Container Spec groups for workloads controlled by an Operator and creates a default policy that sets their resize actions to recommend only, so they cannot be automated.  If you want specific Operator-controlled workloads to execute resize actions when an ORM is deployed, you can add these workloads to an exclusion list that is defined in a configmap.  Follow the steps in the following wiki page on operator namespace exclusion so that all workloads in the namespace where the Turbonomic Server components are running are allowed to execute: https://github.com/turbonomic/kubeturbo/wiki/Actions-and-Handling-Special-Cases#operator-controlled-workloads-recommend-only

Create Turbonomic Workload Controller Groups

Given that pod resizes also restart components, the following pods should only be resized during low user activity times (after hours during a scheduled execution window). 

  1. Topology Processor
  2. Market
  3. History
  4. DB (if applicable)
  5. Timescale DB (if running embedded reporting)

To set up pod scaling properly, we need to create two groups: 1) pods that can resize any time, and 2) pods that should only resize during a scheduled execution window to avoid user downtime.

Group 1 (turbo-workloads-no-schedule) setup instructions:

  1. Navigate to Settings>Groups then click ‘New Group’.
  2. Select ‘Workload Controller’ from the list of group types.
  3. Give the group a name, such as ‘turbo-workloads-no-schedule’.
  4. Click ‘Add Filter’ and select ‘Container Platform Cluster’.
  5. Click the drop-down box and find and check the Turbonomic cluster, by default it will be ‘Kubernetes-Turbonomic’ unless you named it something else.
  6. Next, click ‘Add Filter’ again and select ‘Namespace’.
  7. Click the drop-down box and find and check the turbonomic namespace (unless you named it something unique).
  8. Click ‘Add Filter’ again and this time select ‘Name’.
  9. Change the first drop-down from ‘equals’ to ‘not equals’, click the RegEx box and then enter the following RegEx into the second field -  ^(topology-processor|market|history|db|timescale)$
  10. When finished, you should see all of the Turbonomic workloads EXCEPT topology-processor, market, history, db (if applicable) and timescale (if applicable).  If the list looks correct, click ‘Save Group’ to save this new group.


Group 2 (turbo-workloads-with-schedule) setup instructions:

  1. Navigate to Settings>Groups then click ‘New Group’.
  2. Select ‘Workload Controller’ from the list of group types.
  3. Give the group a name, such as ‘turbo-workloads-with-schedule’.
  4. Click ‘Add Filter’ and select ‘Container Platform Cluster’.
  5. Click the drop-down box and find and check the Turbonomic cluster, by default it will be ‘Kubernetes-Turbonomic’ unless you named it something else.
  6. Next, click ‘Add Filter’ again and select ‘Namespace’.
  7. Click the drop-down box and find and check the turbonomic namespace (unless you named it something unique).
  8. Click ‘Add Filter’ again and this time select ‘Name’.
  9. Leave the first drop-down set to ‘equals’, click the RegEx box and then enter the following RegEx into the second field -  ^(topology-processor|market|history|db|timescale)$
  10. When finished, you should ONLY see the applicable Turbonomic workloads (topology-processor, market, history, db and timescale) in this group.  If the list looks correct, click ‘Save Group’ to save this new group.
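
Before saving the two groups, it can help to confirm that the name filter splits the workloads the way you expect.  The sketch below uses Python's re module as a stand-in for the Turbonomic filter, which is an assumption: the product's regex engine may differ in edge cases.  The sample names ‘api’, ‘auth’ and ‘market-extra’ are illustrative only.

```python
import re

# Same pattern as the group filters above.
SCHEDULED = re.compile(r"^(topology-processor|market|history|db|timescale)$")


def split_workloads(names):
    """Split workload names into (no_schedule, with_schedule) lists."""
    with_schedule = [n for n in names if SCHEDULED.search(n)]
    no_schedule = [n for n in names if not SCHEDULED.search(n)]
    return no_schedule, with_schedule


# Illustrative sample, not a full list of Turbonomic components.
sample = ["api", "auth", "topology-processor", "market",
          "history", "db", "timescale", "market-extra"]
no_schedule, with_schedule = split_workloads(sample)
print("no schedule:  ", no_schedule)
print("with schedule:", with_schedule)
```

Because the pattern is anchored with ^ and $, a name such as ‘market-extra’ is not treated as ‘market’ and lands in the no-schedule group.  Using ‘equals’ and ‘not equals’ on the same pattern guarantees that every workload falls into exactly one of the two groups.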

Create Turbonomic Workload Controller Automation Policies

Next, in order to ensure the 5 Turbonomic components we have identified only get resized during an approved execution window, we now need to create two automation policies.

Scheduled Automation Policy:

  1. Navigate to Settings>Policies and then click ‘New Policy’.
  2. For policy type, select ‘Automation Policy’, and then for entity type, select ‘Workload Controller’.
  3. Give the policy a name, such as ‘turbo-pod-resize-with-schedule’.
  4. Under Scope, click ‘Select Group of Workload Controllers’, search for the group previously created for the Pods that should only be resized during a scheduled window (turbo-workloads-with-schedule was the group we created earlier) and click the checkbox then click ‘Select’.
  5. Expand the ‘Automation and Orchestration’ section then click ‘Add Action’.
  6. Click in the ‘Action Type’ box and click on the ‘Resize’ action type.
  7. Under ‘Action Acceptance’, change the value from ‘Manual’ to ‘Automatic’ to enable these actions to execute automatically during the approved window.
  8. Finally, at the bottom under ‘Execution Schedule’, click ‘Add Schedule’.
  9. You can either select a schedule you have created or create a new schedule.  The ideal time window is one when you do not expect users to log into Turbonomic, as these resize actions will cause the application to restart, which can take anywhere from 10 to 30+ minutes.
  10. Click ‘Submit’ to save the schedule, then click the radio button to attach the schedule to the policy.
  11. The Automation and Orchestration page should then look similar to this (click Submit to save it).
  12. This will take you back to the Policy, ensure it looks similar to the following screenshot then click ‘Save Policy’ to save and enable this scheduled resizing.

No Schedule Needed Automation Policy:

We now need to create an automation policy to enable automatic execution of Pod resizes for the rest of the Turbonomic components (without a schedule).
Note: you can also choose to execute actions for these components in a scheduled or maintenance window.

  1. Navigate to Settings>Policies and then click ‘New Policy’.
  2. For policy type, select ‘Automation Policy’, and then for entity type, select ‘Workload Controller’.
  3. Give the policy a name, such as ‘turbo-pod-resize-no-schedule’.
  4. Under Scope, click ‘Select Group of Workload Controllers’, search for the group previously created for the Pods that do not necessarily need to be resized during a scheduled window (turbo-workloads-no-schedule was the group we created earlier) and click the checkbox then click ‘Select’.
  5. Expand the ‘Automation and Orchestration’ section then click ‘Add Action’.
  6. Click in the ‘Action Type’ box and click on the ‘Resize’ action type.
  7. Under ‘Action Acceptance’, change the value from ‘Manual’ to ‘Automatic’ to enable these actions to execute automatically.  Click ‘Submit’ to save.
  8. This will take you back to the Policy, ensure it looks similar to the following screenshot then click ‘Save Policy’ to save and enable this workload controller policy.

Enable Turbonomic Pod Moves

Pod Moves proactively alleviate node congestion without relying on reactive pod evictions.  At the same time, Turbonomic also identifies how to safely manage cluster capacity, whether that means removing unnecessary nodes whose resources are wasted or projecting when new nodes will be needed due to increasing resource demand.  Pod Moves uniquely provide proactive, preventative, and dynamic resource management to Kubernetes.  The following sections detail the changes we need to make to the Kubeturbo configuration, the groups we need to create, and finally the automation policies that use those groups.

For more on Pod Moves, read the following article: https://github.com/turbonomic/kubeturbo/wiki/Action-Details#turbonomic-pod-moves-continuous-rescheduling

Pod Moves are beneficial in a multi-node Kubernetes cluster.  If you are running Turbonomic in the VM appliance configuration, the k8s cluster has only a single node, so there is no need to set up Pod Moves.

Configure Kubeturbo: Pods with PVs

Some of the Pods we can automate moves for use Persistent Volumes (PVs).  We will want to enable the Pod-With-PV move option.  Check out the following page for complete instructions: https://github.com/turbonomic/kubeturbo/wiki/Action-Details#pods-with-pvs

Pods with PVs that are RWO need an alternative mechanism to relocate the Pod onto another compliant node, since 2 copies of the pod cannot attach the same PV at the same time.  Follow the instructions based on your Kubeturbo deployment method:

1.     straight yamls - modify the deployment

spec:
  template:
    spec:
      containers:
        - args:
            - --fail-volume-pod-moves=false

2.     helm chart - provide this parameter: --set args.failVolumePodMoves=false

3.     operator - edit and add to the kubeturbo-release CR

spec:
  args:
    failVolumePodMoves: 'false'

Configure Kubeturbo: OpenShift SCC context

In OpenShift we need to handle SCC in pod moves.  If you have deployed Kubeturbo via the OpenShift Operator Hub, then this step is already done.  For other deployment methods follow the instructions here: https://github.com/turbonomic/kubeturbo/wiki/Action-Details#openshift-environments

1.     straight yamls - modify the deployment

spec:
  template:
    spec:
      containers:
        - args:
            - --sccsupport=*

2.     helm chart - provide this parameter: --set args.sccsupport=*

3.     operator - edit and add to the kubeturbo-release CR

spec:
  args:
    sccsupport: '*'
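
For the ‘straight yamls’ method, the two changes in this section and the previous one both amount to adding an argument to the Kubeturbo container.  The following minimal Python sketch shows that edit on a Deployment manifest that has already been loaded into a dictionary.  The helper name, the sample manifest and its existing ‘--v=2’ argument are assumptions made for illustration; loading and writing the YAML file is left out.

```python
import copy


def with_kubeturbo_args(deployment, extra_args):
    """Return a copy of the Deployment with extra_args set on its first container.

    An existing setting of the same flag is replaced, so applying the
    function twice gives the same result as applying it once.
    """
    patched = copy.deepcopy(deployment)
    container = patched["spec"]["template"]["spec"]["containers"][0]
    args = container.setdefault("args", [])
    for arg in extra_args:
        flag = arg.split("=", 1)[0]
        # Drop any earlier value of this flag, then add the new one.
        args[:] = [a for a in args if a.split("=", 1)[0] != flag]
        args.append(arg)
    return patched


# Illustrative manifest fragment; the existing argument is a placeholder.
deployment = {
    "spec": {"template": {"spec": {"containers": [
        {"name": "kubeturbo", "args": ["--v=2"]}
    ]}}}
}

patched = with_kubeturbo_args(
    deployment,
    ["--fail-volume-pod-moves=false", "--sccsupport=*"],
)
print(patched["spec"]["template"]["spec"]["containers"][0]["args"])
```

Working on a copy leaves the original manifest untouched, which makes it easy to compare the two before applying the change to the cluster.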

Create Turbo Pod Groups for Pod Moves

1.     Given that pod moves also restart components, the following list represents components that need time (anywhere from 5-30 minutes) before the application is ready for the user again.  To avoid potential downtime for the Turbonomic application, place these components in a group for which Pod Move actions are not generated, leaving Turbonomic analysis free to move other, more stateless pods.

  1. Topology Processor
  2. Market
  3. History
  4. DB (if not using your own external DB)
  5. Timescale DB (if running embedded reporting)

2.     If you are running Turbonomic in a dedicated set of nodes, then you can scope these actions to the node group Turbonomic is using.  If Turbonomic is in a shared node group with other applications, the analysis will consider all Pods to move except DaemonSets.  You will want to automate Pod moves for all Pods to get full benefit.

3.     To set up pod moves properly, we need to create two groups: 1) pods that can move any time, and 2) pods that should have move actions disabled.

Group 1 (turbo-pods-automated-moves) setup instructions:

  1.  Navigate to Settings>Groups then click ‘New Group’.
  2.  Select ‘Container Pod’ from the list of group types.
  3.  Give the group a name, such as ‘turbo-pods-automated-moves’.
  4.  Click ‘Add Filter’ and select ‘Container Platform Cluster’.
  5.  Click the drop-down box and find and check the Turbonomic cluster, by default it will be ‘Kubernetes-Turbonomic’ unless you named it something else.
  6.  Next, click ‘Add Filter’ again and select ‘Namespace’.
  7.  Click the drop-down box and find and check the turbonomic namespace (unless you named it something unique).
  8.  Click ‘Add Filter’ again and this time select ‘Name’.
  9.  Change the first drop-down from ‘equals’ to ‘not equals’, click the RegEx box and then enter the following RegEx into the second field -  ^(.*topology-processor-.*|.*market-.*|.*history-.*|.*timescale-.*|.*db-.*)$
  10. If you wish to scope down to pods on a specific set of nodes, you can use the ‘Virtual Machine Tags’ filter, which lets you filter on k8s node labels that can represent your agent / node pool.
  11. When finished, you should see a dynamic list of pods that meet your filter criteria.  Click ‘Save Group’ to save this new group.

Group 2 (turbo-pods-disable-moves) setup instructions:

  1. Navigate to Settings>Groups then click ‘New Group’.
  2. Select ‘Container Pod’ from the list of group types.
  3. Give the group a name, such as ‘turbo-pods-disable-moves’.
  4. Click ‘Add Filter’ and select ‘Container Platform Cluster’.
  5. Click the drop-down box and find and check the Turbonomic cluster, by default it will be ‘Kubernetes-Turbonomic’ unless you named it something else.
  6. Next, click ‘Add Filter’ again and select ‘Namespace’.
  7. Click the drop-down box and find and check the turbonomic namespace (unless you named it something unique).
  8. Click ‘Add Filter’ again and this time select ‘Name’.
  9. Ensure the first drop-down is set to ‘equals’, click the RegEx box and then enter the following RegEx into the second field -  ^(.*topology-processor-.*|.*market-.*|.*history-.*|.*timescale-.*|.*db-.*)$
  10. When finished, you should see only the Turbonomic pods that represent components we do not want to generate move actions on: topology-processor, market, history, db, and timescale db (if applicable).  If the list looks correct, click ‘Save Group’ to save this new group.
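
Pod names carry generated suffixes, which is why this pattern wraps each component name in wildcards.  The sketch below checks a few pod names against it, again using Python's re module as an assumed stand-in for the Turbonomic filter.  All pod names shown are made up for illustration.

```python
import re

# Same pattern as the two pod group filters above.
PROTECTED = re.compile(
    r"^(.*topology-processor-.*|.*market-.*|.*history-.*|.*timescale-.*|.*db-.*)$"
)


def moves_disabled(pod_name):
    """True if the pod would land in the turbo-pods-disable-moves group."""
    return PROTECTED.search(pod_name) is not None


# Made-up pod names with typical generated suffixes.
sample_pods = [
    "topology-processor-6c9d7f5b8-x2k4q",
    "market-5f7c9d8b6-abcde",
    "api-7b8d9c6f4-fghij",
    "db-0",
    "orders-db-0",
]
results = {name: moves_disabled(name) for name in sample_pods}
for name, disabled in results.items():
    print(f"{name}: {'moves disabled' if disabled else 'moves automated'}")
```

Note that the ‘.*db-.*’ branch matches any pod name that contains ‘db-’, such as the made-up ‘orders-db-0’.  The Namespace filter in step 7 is what keeps pods from other applications out of the group, so review the member list before saving.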

Create Turbonomic Automation Policies

We will now create policies to define how move actions get generated and executed across the Groups we just created.

Disable Pod Moves Group:

1.     Create the Disable Pod Move automation policy, scoped to the group of Turbonomic components that the analysis should not even consider moving.

  1. Navigate to Settings>Policies and then click ‘New Policy’.
  2. For policy type, select ‘Automation Policy’, and then for entity type, select ‘Container Pod’.
  3. Give the policy a name, such as ‘turbo-pod-moves-disabled’.
  4. Under Scope, click ‘Select Group of Container Pods’, search for the group previously created for the Pods that should not be considered for moves (turbo-pods-disable-moves was the group we created earlier) and click the checkbox then click ‘Select’.
  5. Expand the ‘Automation and Orchestration’ section then click ‘Add Action’.
  6. Click in the ‘Action Type’ box and click on the ‘Move’ action type.
  7. Under ‘Action Generation’, change the value to ‘Do not Generate Actions’.  Click ‘Submit’ to save.
  8. This will take you back to the Policy, ensure it looks similar to the following screenshot then click ‘Save Policy’ to save and enable this container pod policy. 

Automated Pod Move Group:

1.     We now need to create an automation policy to enable automatic execution of Pod moves for the rest of the Turbonomic components and other pods on the same nodes.

  1. Navigate to Settings>Policies and then click ‘New Policy’.
  2. For policy type, select ‘Automation Policy’, and then for entity type, select ‘Container Pod’.
  3. Give the policy a name, such as ‘turbo-pod-moves-automated’.
  4. Under Scope, click ‘Select Group of Container Pods’, search for the group previously created for the Pods that do not need to be moved during a scheduled window (turbo-pods-automated-moves was the group we created earlier) and click the checkbox then click ‘Select’.
  5. Expand the ‘Automation and Orchestration’ section then click ‘Add Action’.
  6. Click in the ‘Action Type’ box and click on the ‘Move’ action type.
  7. Under ‘Action Acceptance’, change the value from ‘Manual’ to ‘Automatic’ to enable these actions to execute automatically.  Click ‘Submit’ to save.
  8. This will take you back to the Policy, ensure it looks similar to the following screenshot then click ‘Save Policy’ to save and enable this container pod policy.

Validate Actions Complete Successfully as Designed

Now that the preceding steps have been completed, verify actions are completing as expected.  For the 5 Turbonomic pods (TP, Market, History, DB and Timescale), ensure the resizing actions only occur during the scheduled execution window and also ensure they are not executing Pod moves.  For the rest of the Pods, ensure pod moves and resizing actions are occurring as needed.  
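
The checks in the paragraph above can be written down as a small rule set.  The following Python sketch is only an illustration of that logic: the ExecutedAction record, its fields and the 22:00 to 02:00 window are hypothetical and do not correspond to a Turbonomic API or export format.  You would fill such records in from whatever list of executed actions you review.

```python
from dataclasses import dataclass
from datetime import datetime, time

# The five components that must not move and may only resize in the window.
PROTECTED = ("topology-processor", "market", "history", "db", "timescale")


@dataclass
class ExecutedAction:
    component: str        # hypothetical field: component the action ran on
    action_type: str      # hypothetical field: "RESIZE" or "MOVE"
    executed_at: datetime


def in_window(t, start, end):
    """True if time t is inside [start, end), allowing windows past midnight."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def violations(actions, start, end):
    """List protected-component actions that break the rules above."""
    problems = []
    for a in actions:
        if a.component not in PROTECTED:
            continue  # other components may move and resize at any time
        if a.action_type == "MOVE":
            problems.append(f"{a.component}: move at {a.executed_at:%Y-%m-%d %H:%M}")
        elif a.action_type == "RESIZE" and not in_window(a.executed_at.time(), start, end):
            problems.append(f"{a.component}: resize outside window at {a.executed_at:%Y-%m-%d %H:%M}")
    return problems


# Made-up records and an assumed 22:00 to 02:00 execution window.
sample = [
    ExecutedAction("market", "RESIZE", datetime(2024, 2, 22, 23, 30)),
    ExecutedAction("history", "RESIZE", datetime(2024, 2, 22, 14, 0)),
    ExecutedAction("db", "MOVE", datetime(2024, 2, 22, 23, 45)),
    ExecutedAction("api", "MOVE", datetime(2024, 2, 22, 14, 5)),
]
problems = violations(sample, time(22, 0), time(2, 0))
for p in problems:
    print(p)
```

An empty result means the scheduled policy and the disabled-moves policy are behaving as designed for the five protected components.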

If you chose to first set up the automation policies with manual execution for testing, switch those policies to automatic execution once you have validated that actions execute successfully.

Leverage the Executed Actions and Risk Avoided Widgets.  You can add these to a Custom Dashboard to track the optimization performed through Turbo-on-Turbo actions.

Appendix

If you find that resizing actions on some Turbonomic components fail even after deploying the ORM, you may have an ORM that needs an update.  A resize action that failed due to lack of ORM support will look like this:

There are 3 possible scenarios:

  1. The ORM you are using is out of date.
  2. The component cannot be modified.  Third-party components that Turbonomic uses are not always configured with this flexibility.  To verify this, open an issue.
  3. The component is not included in the ORM deployed into the cluster.  In this case, you should also open an issue so the ORM can be updated.

Open an Issue:

Turbonomic maintains the ORM for the Turbonomic Server.  If an updated ORM is not available, we suggest you first open an Issue on the public GitHub project for ORM: https://github.com/turbonomic/orm/issues

Apply a Newer ORM Version:

Go to the GitHub project for the ORM.  The IBM Turbonomic XL ORM is kept in this folder: https://github.com/turbonomic/orm/tree/master/library/ibm

If there is a newer version by last commit date, download it, go to the cluster and namespace that the Turbonomic Server is deployed into, and apply the newer ORM.  Then restart the Kubeturbo-release or Kubeturbo pod to pick up the changes.
