[Originally Published on Turbonomic.com]
We are excited to announce that Turbonomic now supports horizontal scaling of cloud native application services, as well as the underlying compute resources, based on customer experience Service Level Objectives (SLOs). Released with 8.4.4, this feature is a major step in supporting customers who have modernized their business-critical applications to deliver great customer experiences while increasing elasticity. Too often, organizations scale microservices based on default utilization metrics. Unable to correlate autoscaling at the application layer with the underlying compute, they spend significant labor setting triggers and policies that ultimately limit elasticity. With Turbonomic, the investment in modern, cloud native microservices achieves elasticity that simultaneously benefits your bottom line, customer experience, and even your sustainability goals.
Why is Turbonomic SLO-driven Kubernetes scaling important?
Today, “end-user satisfaction with application performance and reliability is critical for successful digital business operations.”1 However, delivering a great end-user experience has become more difficult due to the development of modern applications and the shift to agile infrastructure.2 This development has left IT teams struggling to make sense of immense amounts of data and has challenged the viability of traditional monitoring tools.3 In response to these challenges, IT and Platform teams have found new ways to measure the health of their environments, set expectations for their applications, and connect their efforts to business context through SLOs.
Most organizations currently undergo a time-consuming, laborious manual SLO configuration process. This exercise includes determining which services of an application need to be measured with an SLO, choosing which metrics to use as service level indicators (SLIs), and setting appropriate goals for each metric during a specific measurement period. After completing this process, IT teams must then create error budgets for each SLO and link them to a threshold-based alert system. When configuring SLOs, IT teams commonly choose the wrong SLIs, often utilization metrics, that do not directly tie to customer experience. Furthermore, a threshold-based policy triggers an action only after the threshold is breached, and thresholds do not account for the entire application stack and underlying infrastructure. Without full-stack analysis, the actions triggered by an SLO can negatively impact application performance: for example, scaling out an application without first ensuring that the nodes have the capacity to support it. Even with Cluster Autoscaler, a new node starts up only because a pod is in a pending state, which is another reactive trigger rather than an analysis that proactively manages cluster capacity.4
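To make the error-budget step concrete, here is a minimal Python sketch of how a budget might be derived from an SLO target and tied to a threshold alert. The names (`Slo`, `error_budget_remaining`, `threshold_alert`), the 99% target, and the 25% alert level are illustrative assumptions, not part of Turbonomic or any specific monitoring tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical SLO definition: a name plus the required share of good events."""
    name: str
    target: float  # e.g. 0.99 means 99% of requests must meet the SLI goal


def error_budget_remaining(slo: Slo, total_events: int, bad_events: int) -> float:
    """Return the fraction of the error budget still unspent in this period.

    1.0 means nothing has been spent; 0.0 means the budget is exhausted;
    a negative value means the SLO has already been violated.
    """
    allowed_bad = (1.0 - slo.target) * total_events
    if allowed_bad <= 0:
        # A 100% target (or an empty period) leaves no budget to spend.
        return 1.0 if bad_events == 0 else 0.0
    return 1.0 - bad_events / allowed_bad


def threshold_alert(slo: Slo, total_events: int, bad_events: int,
                    alert_below: float = 0.25) -> bool:
    """Fire only once the remaining budget has dropped below the threshold."""
    return error_budget_remaining(slo, total_events, bad_events) < alert_below


if __name__ == "__main__":
    slo = Slo(name="checkout-response-time", target=0.99)
    print(error_budget_remaining(slo, total_events=10_000, bad_events=25))
    print(threshold_alert(slo, total_events=10_000, bad_events=90))
```

Note that `threshold_alert` stays silent until most of the budget is already gone, which is the reactive behavior described above: nothing in this check looks at whether the rest of the stack can absorb the action it triggers.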
What makes Turbonomic unique?
With Turbonomic SLO-driven Kubernetes scaling, our analytics engine uses multi-dimensional analysis to ensure that every workload gets the resources it needs to perform. It considers the resource relationships at every layer of the application stack (applications, services, containers, pods, nodes, VMs, hosts, storage, hyperconverged infrastructure, and network) continuously as demand changes. Through this analysis, you set SLOs that directly correlate to customer experience (e.g., response time, transaction throughput, or custom metrics that make sense for your business), and Turbonomic uses dynamic resourcing to assure that the platform and the underlying infrastructure manage to that SLO. Turbonomic provides easy, flexible ways to collect your Service Level Indicators, either through direct integration with Instana or from custom metrics available through popular observability tools like Prometheus, which you can leverage with our Data Ingestion Framework (DIF). With the SLIs in place, you create actionable SLO policies, and Turbonomic uses dynamic resourcing analysis and actions to assure that the service, platform, and infrastructure continuously manage to those SLOs.
Smart Scaling of Services & Compute Based on SLOs
As shown in the image below, Turbonomic generates trustworthy actions by understanding the entire application stack and the dependencies across it. This supply chain begins by mapping out the application, then moves down to the platform, and finishes with the infrastructure the application runs on.
In addition to the resource supply chain of this application, below are several specific horizontal scaling actions that Turbonomic can automatically execute. These include preemptive actions to provision pods in response to performance risk from Response Time Congestion, and actions to provision a node to support scaling at the application layer. All actions are driven by the SLO and determined by the software.
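The dependency between pod provisioning and node provisioning can be illustrated with a small Python sketch. It is a deliberately simplified assumption-laden example, not Turbonomic's analytics: it considers only CPU requests in millicores, and the function name `nodes_needed_for_scale_out` and all of its parameters are hypothetical.

```python
from typing import Sequence


def nodes_needed_for_scale_out(free_mcpu_per_node: Sequence[int],
                               pod_request_mcpu: int,
                               extra_replicas: int,
                               new_node_mcpu: int) -> int:
    """Estimate how many new nodes a scale-out needs before pods are added.

    free_mcpu_per_node: unreserved CPU (millicores) on each existing node.
    pod_request_mcpu:   CPU request of one replica.
    extra_replicas:     replicas to add at the application layer.
    new_node_mcpu:      allocatable CPU of one newly provisioned node.
    """
    if pod_request_mcpu <= 0:
        raise ValueError("pod_request_mcpu must be positive")

    # Replicas that fit into capacity the cluster already has.
    fit_now = sum(free // pod_request_mcpu for free in free_mcpu_per_node)
    remaining = max(0, extra_replicas - fit_now)
    if remaining == 0:
        return 0

    per_new_node = new_node_mcpu // pod_request_mcpu
    if per_new_node == 0:
        raise ValueError("a new node cannot host even one replica")

    # Ceiling division without floating point.
    return -(-remaining // per_new_node)


if __name__ == "__main__":
    # Three nodes with 300m, 700m, and 100m free; each replica requests 500m.
    print(nodes_needed_for_scale_out([300, 700, 100], 500, 3, 2000))
```

Running the capacity check before the scale-out, rather than waiting for a pod to go pending, is what distinguishes this ordering from the reactive trigger described earlier. A fuller analysis would also weigh memory, affinity rules, and the other layers of the stack.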
Turbonomic will also identify vertical scaling optimizations for the same workloads that you need to scale horizontally to assure SLOs. Vertical scaling is important to avoid inefficiencies or, worse, resource-starved configurations. Consider CPU throttling: if an individual replica is being CPU throttled, running additional replicas will not assure performance. Optimizing the CPU limit is essential so that each replica provides the best performance possible.
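As a rough illustration of the throttling point, the Python sketch below computes how often a container was throttled from the text of a Linux cgroup `cpu.stat` file, using its `nr_periods` and `nr_throttled` counters. The function names, the sample values, and the 25% cutoff are assumptions chosen for the example; they do not describe how Turbonomic measures or acts on throttling.

```python
def throttle_ratio(cpu_stat_text: str) -> float:
    """Return the share of CFS enforcement periods in which the cgroup was throttled.

    Expects the contents of a cgroup cpu.stat file: one "key value" pair per line.
    """
    stats = {}
    for line in cpu_stat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])

    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0  # No CPU limit enforced, or no periods elapsed yet.
    return stats.get("nr_throttled", 0) / periods


def suggest_action(ratio: float, cutoff: float = 0.25) -> str:
    """Hypothetical rule: fix the per-replica limit before adding replicas."""
    if ratio >= cutoff:
        return "raise CPU limit"
    return "limit looks adequate"


if __name__ == "__main__":
    sample = """nr_periods 1000
nr_throttled 250
throttled_usec 5000000
"""
    ratio = throttle_ratio(sample)
    print(ratio, suggest_action(ratio))
```

If every replica shows a high ratio, each new replica inherits the same undersized limit, which is why adding replicas alone does not restore response time.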
While Turbonomic is provisioning pods and nodes within a cluster, and optimizing limits and requests, it is simultaneously and continuously moving pods to prevent performance degradation, defragment the cluster, and remain in compliance with business constraints. The ability to execute all these actions is the best way to assure continuous application performance and cluster health.
This new feature allows organizations to leverage SLOs so that business-critical services always perform, while achieving elasticity and eliminating resource waste from running more nodes than you need to support your SLOs. Imagine a solution that can help you confidently shut down nodes while maintaining a desired response time or transaction throughput. For those running applications in the cloud, such a scenario represents the perfect balance between customer experience, budget, and sustainability goals. Achieving true elasticity means you are only using what you need. In the cloud, what isn’t consumed can be used elsewhere. You reduce your Scope 3 emissions and allow the cloud provider to further optimize their infrastructure and operate more sustainably.
Over the coming weeks, we will publish a technical blog series that will further explore what makes Turbonomic’s analytics unique and how we are able to drive actions that manage cluster resources while making executable SLO scaling decisions. We will also examine our analytics and how they are used to overcome the limitations and challenges that Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and Cluster Autoscaler (CA) present, to help customers achieve performance, scalability and SLOs without sacrificing efficiency.
For details, see the Turbonomic 8.4.4 release notes.
To learn more about how leveraging ephemeral workloads in applications and infrastructure can help you achieve elasticity that benefits your bottom line and your sustainability goals while improving customer experience, read our white paper, The Executive’s Guide to SLOs.