In the world of Application Resource Management, IBM Turbonomic is the premier software solution that automatically determines the right resource allocation actions to help ensure your Kubernetes environments and mission-critical applications get exactly what they need, when they need it, to meet your SLOs. But did you know that you can also use Turbonomic to optimize itself? We call this Turbo-on-Turbo. In Red Hat OpenShift Container Platform deployments, enabling it takes several steps, along with some guardrails to ensure your Turbonomic deployment remains highly available.
In this post, we will detail how to allow a user to execute scaling actions on a self-managed Turbonomic Server in Red Hat OpenShift, mitigating performance issues due to resource constraints while also identifying and acting on efficiency opportunities. The user can also leverage Pod Moves to better utilize cluster resources under fluctuating demand, mitigate performance issues due to node pressure, and confidently run more efficiently.
PLEASE NOTE - this procedure only applies to Turbonomic deployments in Red Hat OpenShift. For Turbonomic SaaS this is already enabled, and for Turbonomic OVA deployments, refer to the Turbonomic Documentation.
In this post we will describe how to:
- Enable Turbonomic workload resize actions (deploy the ORM, modify the Kubeturbo exclusion list, and create workload controller groups and automation policies)
- Enable Turbonomic Pod Moves (configure Kubeturbo, and create pod groups and automation policies)
- Validate that actions complete successfully as designed
Before we get into the specifics, please ensure the following prerequisites are satisfied.
- Kubeturbo deployed to the k8s cluster (v1.21 or higher) where the Turbonomic Server is deployed
- Kubeturbo is deployed separately and not via the Turbonomic Server XL Operator (see Additional Notes below)
- Turbonomic Server has a full set of managed targets and has collected a reasonable amount of history
- Turbonomic User has Site Admin or Admin role privileges to create groups and policies (see Additional Notes below)
Additional Notes:
Before we get into the details, I want to give kudos to @Eva Tuczai, @Justus Gries and @Marc Beckert for their contributions and assistance in putting this all together!
Enable Turbonomic Workload Resize Actions
We will now leverage Turbonomic’s analysis of resource usage to apply Limit and Request rightsizing decisions to the Turbonomic microservices. Note that by default, most Turbonomic components only specify Mem Limits and Mem Requests, so in a default configuration deployment you will not see CPU Limit and CPU Request resize actions. If your environment uses other mechanisms to inject resource configurations, whether specified via the CR (to support Quotas) or via a LimitRange, then any spec that is defined can be optimized.
Rightsizing ensures that we mitigate OOMs and performance issues, as well as reduce limits and requests that are not needed. Resource demands on Turbonomic components are related to the number of objects under Turbonomic management, so you may find that you need to resize more than once as your managed environment changes.
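To illustrate what this means in practice, here is a hypothetical container spec of the shape described above - memory limits and requests defined, no CPU values - so Turbonomic would generate Mem Limit and Mem Request resize actions but no CPU resize actions. The component name and values are purely illustrative, not actual Turbonomic defaults:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-component        # hypothetical component, for illustration only
spec:
  template:
    spec:
      containers:
        - name: example-component
          resources:
            limits:
              memory: 2Gi        # Mem Limit - resize actions can be generated
            requests:
              memory: 1Gi        # Mem Request - resize actions can be generated
            # no cpu keys defined, so no CPU Limit/Request resize actions appear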
For more details on Turbonomic Rightsizing, please read the following article: https://github.com/turbonomic/kubeturbo/wiki/Action-Details#resizing-vertical-scaling-of-containerized-workloads
Deploy Turbonomic Operator Resource Map (ORM)
Turbonomic Server components are managed by the t8c-operator, therefore we cannot directly modify workload limits and requests. We need to instruct Kubeturbo with a map of how to make these changes via the Turbonomic Custom Resource (aka the XL CR). This map is a custom resource called Operator Resource Map (ORM) and Turbonomic has one ready for you to use for the Turbonomic Server components.
- For Turbonomic Server deployments running on k8s versions 1.16 or higher, create this ORM CRD:
kubectl create -f https://raw.githubusercontent.com/turbonomic/orm/master/config/crd/bases/devops.turbonomic.io_operatorresourcemappings.yaml
- Deploy the ORM Custom Resource configured for the Turbonomic Operator in the same namespace as the Turbonomic Server:
kubectl -n turbonomic apply -f https://raw.githubusercontent.com/turbonomic/orm/master/library/ibm/turbo_operator_resource_mapping_sample_cr.yaml
- Restart the Kubeturbo pod to pick up this ORM:
kubectl -n {namespace} delete pod kubeturbo-{pod}
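To sanity-check the setup, you can verify that the CRD and the ORM custom resource exist and that the Kubeturbo pod restarted cleanly. The resource names follow from the files applied above; adjust the namespace if yours differs:

kubectl get crd operatorresourcemappings.devops.turbonomic.io
kubectl -n turbonomic get operatorresourcemappings
kubectl -n {namespace} get pods | grep kubeturbo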
Modify Kubeturbo Custom Resource Exclusion List
After you have deployed the ORM CRD, the next step is to modify the Kubeturbo configuration to allow actions to execute. Starting in 8.10.6, Kubeturbo will by default auto-create Container Spec groups for workloads controlled by an Operator, along with a new default policy that sets those resize actions to recommend only so they cannot be automated. If you want specific workloads that are controlled by an Operator to execute resize actions when an ORM is deployed, you can add those workloads to an exclusion list that is defined in a configmap. Please refer to and follow the steps in the following wiki page regarding operator namespace exclusion, so that all workloads in the namespace where the Turbonomic Server components are running will be allowed to execute: https://github.com/turbonomic/kubeturbo/wiki/Actions-and-Handling-Special-Cases#operator-controlled-workloads-recommend-only
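To sketch what that wiki page walks you through: the exclusion list lives in Kubeturbo’s configmap under the turbo-autoreload.config entry. The configmap name and key names below are assumptions based on that wiki page at the time of writing - verify them against the page before applying:

apiVersion: v1
kind: ConfigMap
metadata:
  name: turbo-config            # configmap name can vary by deployment method
  namespace: turbo              # the namespace where Kubeturbo runs
data:
  turbo-autoreload.config: |-
    {
      "exclusionDetectors": {
        "operatorControlledNamespacePatterns": ["turbonomic"]
      }
    }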
Create Turbonomic Workload Controller Groups
Given that pod resizes also restart components, the following pods should only be resized during times of low user activity (after hours, during a scheduled execution window):
- Topology Processor
- Market
- History
- DB (if applicable)
- Timescale DB (if running embedded reporting)
To set up pod scaling properly, we need to create two groups: 1) pods that can resize at any time, and 2) pods that should only resize during a scheduled execution window to avoid user downtime.
Group 1 (turbo-workloads-no-schedule) setup instructions:
- Navigate to Settings>Groups then click ‘New Group’.
- Select ‘Workload Controller’ from the list of group types.
- Give the group a name, such as ‘turbo-workloads-no-schedule’.
- Click ‘Add Filter’ and select ‘Container Platform Cluster’.
- Click the drop-down box, then find and check the Turbonomic cluster; by default it will be ‘Kubernetes-Turbonomic’ unless you named it something else.
- Next, click ‘Add Filter’ again and select ‘Namespace’.
- Click the drop-down box, then find and check the turbonomic namespace (unless you named it something unique).
- Click ‘Add Filter’ again and this time select ‘Name’.
- Change the first drop-down from ‘equals’ to ‘not equals’, click the RegEx box and then enter the following RegEx into the second field - ^(topology-processor|market|history|db|timescale)$
- When finished, you should see all of the Turbonomic workload controllers EXCEPT topology-processor, market, history, db (if applicable) and timescale (if applicable) – it should look similar to this (assuming it does, click ‘Save Group’ to save this new group):
Group 2 (turbo-workloads-with-schedule) setup instructions:
- Navigate to Settings>Groups then click ‘New Group’.
- Select ‘Workload Controller’ from the list of group types.
- Give the group a name, such as ‘turbo-workloads-with-schedule’.
- Click ‘Add Filter’ and select ‘Container Platform Cluster’.
- Click the drop-down box, then find and check the Turbonomic cluster; by default it will be ‘Kubernetes-Turbonomic’ unless you named it something else.
- Next, click ‘Add Filter’ again and select ‘Namespace’.
- Click the drop-down box, then find and check the turbonomic namespace (unless you named it something unique).
- Click ‘Add Filter’ again and this time select ‘Name’.
- Leave the first drop-down set to ‘equals’, click the RegEx box and then enter the following RegEx into the second field - ^(topology-processor|market|history|db|timescale)$
- When finished, you should ONLY see the applicable Turbonomic workload controllers (topology-processor, market, history, db and timescale) in this group – it should look similar to this (assuming it does, click ‘Save Group’ to save this new group):
Create Turbonomic Workload Controller Automation Policies
Next, to ensure the 5 Turbonomic components we have identified only get resized during an approved execution window, we need to create two automation policies.
Scheduled Automation Policy:
- Navigate to Settings>Policies and then click ‘New Policy’.
- For policy type, select ‘Automation Policy’, and then for entity type, select ‘Workload Controller’.
- Give the policy a name, such as ‘turbo-pod-resize-with-schedule’.
- Under Scope, click ‘Select Group of Workload Controllers’, search for the group previously created for the Pods that should only be moved during a scheduled window (turbo-workloads-with-schedule was the group we created earlier) and click the checkbox then click ‘Select’.
- Expand the ‘Automation and Orchestration’ section then click ‘Add Action’.
- Click in the ‘Action Type’ box and click on the ‘Resize’ action type.
- Under ‘Action Acceptance’, change the value from ‘Manual’ to ‘Automatic’ to enable these actions to execute automatically during the approved window.
- Finally, at the bottom under ‘Execution Schedule’, click ‘Add Schedule’.
- You can either select a schedule you have created or create a new schedule. The ideal time window should be when you do not expect Users to log into Turbonomic, as these resize actions will cause the application to restart, which can take anywhere from 10 – 30+ minutes.
- Click ‘Submit’ to save the schedule, then click the radio button to attach the schedule to the policy.
- The Automation and Orchestration page should then look similar to this (click Submit to save it).
- This will take you back to the Policy, ensure it looks similar to the following screenshot then click ‘Save Policy’ to save and enable this scheduled resizing.
No Schedule Needed Automation Policy:
We now need to create an automation policy to enable automatic execution of Pod resizes for the rest of the Turbonomic components (without a schedule).
Note: you can also execute these components on a schedule or during a maintenance window if you choose.
- Navigate to Settings>Policies and then click ‘New Policy’.
- For policy type, select ‘Automation Policy’, and then for entity type, select ‘Workload Controller’.
- Give the policy a name, such as ‘turbo-pod-resize-no-schedule’.
- Under Scope, click ‘Select Group of Workload Controllers’, search for the group previously created for the Pods that do not necessarily need to be moved during a scheduled window (turbo-workloads-no-schedule was the group we created earlier) and click the checkbox then click ‘Select’.
- Expand the ‘Automation and Orchestration’ section then click ‘Add Action’.
- Click in the ‘Action Type’ box and click on the ‘Resize’ action type.
- Under ‘Action Acceptance’, change the value from ‘Manual’ to ‘Automatic’ to enable these actions to execute automatically. Click ‘Submit’ to save.
- This will take you back to the Policy, ensure it looks similar to the following screenshot then click ‘Save Policy’ to save and enable this workload controller policy.
Enable Turbonomic Pod Moves
Pod Moves proactively alleviate node congestion without relying on reactive pod evictions. At the same time, Turbonomic also identifies how to safely manage cluster capacity, whether that is wasted resources in unnecessary nodes that can be safely removed, or proactively projecting when new nodes are needed due to increasing resource demand. Pod Moves uniquely provide proactive, preventative, and dynamic resource management for Kubernetes. The following sections detail the changes we need to make to configure Kubeturbo, the groups we need to create, and finally the automation policies that leverage those groups.
For more on Pod Moves, read the following article: https://github.com/turbonomic/kubeturbo/wiki/Action-Details#turbonomic-pod-moves-continuous-rescheduling
Pod Moves are beneficial in a multi-node Kubernetes cluster. If you are running Turbonomic in the VM appliance configuration, there is only a single-node k8s cluster, so there is no need to set up Pod Moves.
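A quick way to confirm which case applies is to count the schedulable nodes in the cluster:

kubectl get nodes
# more than one schedulable node: Pod Moves can add value
# a single node (e.g., the VM appliance): skip the Pod Moves setup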
Configure Kubeturbo: Pods with PVs
Some of the Pods we can automate moves for use Persistent Volumes (PVs). We will want to enable the Pod-With-PV move option. Check out the following page for complete instructions: https://github.com/turbonomic/kubeturbo/wiki/Action-Details#pods-with-pvs
Pods with PVs that are RWO need an alternative mechanism to relocate the Pod onto another compliant node, since two copies of the pod cannot attach the same PV at the same time. Follow the instructions based on your Kubeturbo deployment method:
1. straight yamls - modify the deployment:
spec:
  template:
    spec:
      containers:
        - args:
            - --fail-volume-pod-moves=false
2. helm chart - provide this parameter: --set args.failVolumePodMoves=false
3. operator - edit and add to the kubeturbo-release CR:
spec:
  args:
    failVolumePodMoves: 'false'
Configure Kubeturbo: OpenShift SCC context
In OpenShift we need to handle SCCs (Security Context Constraints) for pod moves. If you have deployed Kubeturbo via the OpenShift OperatorHub, then this step is already done. For other deployment methods, follow the instructions here: https://github.com/turbonomic/kubeturbo/wiki/Action-Details#openshift-environments
1. straight yamls - modify the deployment:
spec:
  template:
    spec:
      containers:
        - args:
            - --sccsupport=*
2. helm chart - provide this parameter: --set args.sccsupport='*'
3. operator - edit and add to the kubeturbo-release CR:
spec:
  args:
    sccsupport: '*'
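If you deployed Kubeturbo with the Helm chart, the two parameters from the sections above can be applied together in a single upgrade. This is a sketch; the release name, chart reference, and namespace are placeholders to replace with your own values:

helm upgrade kubeturbo-release <your-chart-reference> -n <kubeturbo-namespace> \
  --set args.failVolumePodMoves=false \
  --set args.sccsupport='*'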
Create Turbo Pod Groups for Pod Moves
1. Given that pod moves also restart components, the following list represents components that would require time (anywhere from 5-30 minutes) after a restart before the application is ready for the user. To avoid potential downtime for the Turbonomic application, these components should be in a group that disables Pod Move action generation, allowing Turbonomic analysis to move other, more stateless pods.
- Topology Processor
- Market
- History
- DB (if not using your own external DB)
- Timescale DB (if running embedded reporting)
2. If you are running Turbonomic on a dedicated set of nodes, you can scope these actions to the node group Turbonomic is using. If Turbonomic is in a node group shared with other applications, the analysis will consider all Pods for moves except DaemonSets. You will want to automate Pod moves for all Pods to get the full benefit.
3. To set up pod moves properly, we need to create two groups: 1) pods that can move at any time, and 2) pods that should have move actions disabled.
Group 1 (turbo-pods-automated-moves) setup instructions:
- Navigate to Settings>Groups then click ‘New Group’.
- Select ‘Container Pod’ from the list of group types.
- Give the group a name, such as ‘turbo-pods-automated-moves’.
- Click ‘Add Filter’ and select ‘Container Platform Cluster’.
- Click the drop-down box, then find and check the Turbonomic cluster; by default it will be ‘Kubernetes-Turbonomic’ unless you named it something else.
- Next, click ‘Add Filter’ again and select ‘Namespace’.
- Click the drop-down box, then find and check the turbonomic namespace (unless you named it something unique).
- Click ‘Add Filter’ again and this time select ‘Name’.
- Change the first drop-down from ‘equals’ to ‘not equals’, click the RegEx box and then enter the following RegEx into the second field - ^(.*topology-processor-.*|.*market-.*|.*history-.*|.*timescale-.*|.*db-.*)$
- If you wish to scope down to pods on a specific set of nodes, you can leverage the ‘Virtual Machine Tags’ filter, which lets you filter on k8s node LABELS that can represent your agent / node pool.

- When finished, you should see a dynamic list of pods that meet your filter criteria; it should look similar to this. Click ‘Save Group’ to save this new group:
Group 2 (turbo-pods-disable-moves) setup instructions:
- Navigate to Settings>Groups then click ‘New Group’.
- Select ‘Container Pod’ from the list of group types.
- Give the group a name, such as ‘turbo-pods-disable-moves’.
- Click ‘Add Filter’ and select ‘Container Platform Cluster’.
- Click the drop-down box, then find and check the Turbonomic cluster; by default it will be ‘Kubernetes-Turbonomic’ unless you named it something else.
- Next, click ‘Add Filter’ again and select ‘Namespace’.
- Click the drop-down box, then find and check the turbonomic namespace (unless you named it something unique).
- Click ‘Add Filter’ again and this time select ‘Name’.
- Ensure the first drop-down is set to ‘equals’, click the RegEx box and then enter the following RegEx into the second field - ^(.*topology-processor-.*|.*market-.*|.*history-.*|.*timescale-.*|.*db-.*)$
- When finished, you should see only the Turbonomic pods that represent components we do not want to generate move actions on: topology-processor, market, history, db, and timescale db (if applicable) – and it should look similar to this (assuming it does, click ‘Save Group’ to save this new group):
Create Turbonomic Automation Policies
We will now create policies to define how move actions get generated and executed across the Groups we just created.
Disable Pod Moves Group:
Create the Disable Pod Move automation policy using the group of Turbonomic components that we want the analysis to not even consider for moves.
- Navigate to Settings>Policies and then click ‘New Policy’.
- For policy type, select ‘Automation Policy’, and then for entity type, select ‘Container Pod’.
- Give the policy a name, such as ‘turbo-pod-moves-disabled’.
- Under Scope, click ‘Select Group of Container Pods’, search for the group previously created for the Pods that should not be considered for moves (turbo-pods-disable-moves was the group we created earlier) and click the checkbox then click ‘Select’.
- Expand the ‘Automation and Orchestration’ section then click ‘Add Action’.
- Click in the ‘Action Type’ box and click on the ‘Move’ action type.
- Under ‘Action Generation’, change the value to ‘Do not Generate Actions’. Click ‘Submit’ to save.
- This will take you back to the Policy, ensure it looks similar to the following screenshot then click ‘Save Policy’ to save and enable this container pod policy.
Automated Pod Move Group:
We now need to create an automation policy to enable automatic execution of Pod moves for the rest of the Turbonomic components and other pods on the same nodes.
- Navigate to Settings>Policies and then click ‘New Policy’.
- For policy type, select ‘Automation Policy’, and then for entity type, select ‘Container Pod’.
- Give the policy a name, such as ‘turbo-pod-moves-automated’.
- Under Scope, click ‘Select Group of Container Pods’, search for the group previously created for the Pods that do not need to be moved during a scheduled window (turbo-pods-automated-moves was the group we created earlier) and click the checkbox then click ‘Select’.
- Expand the ‘Automation and Orchestration’ section then click ‘Add Action’.
- Click in the ‘Action Type’ box and click on the ‘Move’ action type.
- Under ‘Action Acceptance’, change the value from ‘Manual’ to ‘Automatic’ to enable these actions to execute automatically. Click ‘Submit’ to save.
- This will take you back to the Policy, ensure it looks similar to the following screenshot then click ‘Save Policy’ to save and enable this container pod policy.
Validate Actions Complete Successfully as Designed
Now that the preceding steps are complete, verify that actions are completing as expected. For the 5 Turbonomic pods (topology-processor, market, history, db and timescale), ensure resize actions only occur during the scheduled execution window and that no Pod moves are executed on them. For the rest of the Pods, ensure pod moves and resize actions are occurring as needed.
If you chose to first set up the automation policies with manual execution for testing, then after validating successful execution of actions you should switch those policies to automatic execution.
Leverage the Executed Actions and Risk Avoided widgets; you can add them to a Custom Dashboard to track the optimization performed through Turbo-on-Turbo actions.
Appendix
If you find that resizing actions on some Turbonomic components fail even after deploying the ORM, you may have an ORM that needs an update. A resize action that failed due to lack of ORM support will look like this:
There are 3 possible scenarios:
- The ORM you are using is out of date.
- The component cannot be modified. Third-party components that Turbonomic uses are not always configured with this flexibility. To verify, open an issue.
- The component is not included in the ORM deployed into the cluster. In this case, you should also open an issue so the ORM can be updated.
Open an Issue:
Turbonomic will maintain the ORM for the Turbonomic Server. If there is not an updated ORM available, we suggest you first open an issue on the public GitHub project for ORM: https://github.com/turbonomic/orm/issues
Apply a Newer ORM Version:
The IBM Turbonomic XL ORM is kept in this folder of the ORM GitHub project: https://github.com/turbonomic/orm/tree/master/library/ibm.
If there is a newer version by last commit date, download it, go to the cluster and namespace where the Turbonomic Server is deployed, and apply the newer ORM. Then restart the kubeturbo-release (or kubeturbo) pod to pick up the changes.
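Mirroring the deployment steps from earlier, applying and picking up a newer ORM might look like this (same placeholder conventions as before, run from the directory where you downloaded the newer ORM):

# check which ORM custom resource is currently deployed
kubectl -n turbonomic get operatorresourcemappings
# apply the newer ORM downloaded from the library folder above
kubectl -n turbonomic apply -f turbo_operator_resource_mapping_sample_cr.yaml
# restart Kubeturbo so it picks up the updated ORM
kubectl -n {namespace} delete pod kubeturbo-{pod}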