
An Introduction to Kubernetes HPA: Everything to Know for Achieving Application Elasticity at Scale

By Asena Hertz posted Tue May 03, 2022 12:46 AM

[Originally published on Turbonomic.com]

Are you looking for a way to automate application elasticity, dynamically scaling your services to meet changing demand? Containers enable applications to be architected for elasticity: ready for business in seconds, you can spin up services when you need them and spin them back down when you don't, paying only for what you use. With these goals in mind, an increasing number of organizations are leveraging the Horizontal Pod Autoscaler (HPA), which is native to Kubernetes.

It’s vital to understand the specifics of what HPA does and does not do in order to effectively determine if it’s the right approach for you. Keep reading this overview to learn about Kubernetes HPA.

Purpose of the Kubernetes HPA

Kubernetes HPA gives developers a way to automate the scaling of their stateless microservice applications to meet changing demand. To put this in context, public cloud IaaS promised agility, elasticity, and scalability with its self-service, pay-as-you-go models. The complexity of managing all that aside, if your applications are just sitting in VMs, they're not very elastic, whether they're in the cloud or not. So we can think of containerized applications and the elasticity they enable as another step closer to achieving what the cloud promised. HPA is a mechanism for doing that natively in Kubernetes.

How Kubernetes HPA Works

With Kubernetes HPA, you automate the spinning up of additional pods by defining a threshold (typically CPU or memory utilization, though custom metrics are supported as well) that triggers the action to spin up more pods. You must also define the minimum and maximum number of pods that can be deployed. You'll then want to do some testing to make sure that the policies you've set will work in production, where demand fluctuates.
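As a sketch of what this looks like in practice, here is an HPA for a hypothetical `checkout` Deployment that targets 70% average CPU utilization, with a floor of 2 and a ceiling of 10 replicas (the Deployment name and numbers are illustrative, not from this article):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout-hpa
spec:
  scaleTargetRef:            # the workload this HPA scales
    apiVersion: apps/v1
    kind: Deployment
    name: checkout           # hypothetical service
  minReplicas: 2             # lower bound on pod count
  maxReplicas: 10            # upper bound on pod count
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale out when average CPU crosses 70%
```

Everything here — the metric, the target utilization, and the min/max bounds — is a policy decision you have to make and validate per service.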

This exercise of determining which metric(s) best express the resources an application service needs, configuring thresholds, setting upper and lower limits, and then testing and extrapolating whether it will function under production demand must be repeated for every service deployed. How many services do you have?

Is Kubernetes HPA Right for You?

Kubernetes HPA has a number of constraints that should be closely evaluated, especially if your intention is to achieve application elasticity at broad scale, or even for a single mission-critical application running on Kubernetes in production. Got more than one mission-critical application? All the more reason to closely consider what HPA's approach will require of your teams and colleagues, and what you can actually expect to achieve with it.

The Labor that HPA Requires is Not Insignificant

So let's start with how HPA will impact you personally. The purpose of automation is to make your life easier. Will HPA do this for you? If you're a developer, as we've discussed above, it's quite a bit of work to determine and define HPA policies. Never mind the fact that you're already going through similar resourcing analysis for the containers that make up each pod. Last I heard, developers like building applications: revenue-generating, mission-critical applications that are, let's not forget, at the heart of digital transformation. Defining resourcing policies, not so much.

HPA Has No Full-Stack Awareness

The second thing to consider is that the application you’re autoscaling for elasticity doesn’t operate in a silo. This has implications for those managing the Kubernetes platform: how they operate, as well as performance and efficiency. 

Kubernetes HPA has no awareness of the underlying infrastructure. Left to its own devices, HPA could spin up pods only to leave them in a 'pod pending' state because there's no capacity to support them. Those managing the Kubernetes platform, be they DevOps engineers, SREs, or platform engineers, have to understand the details of how HPA has been set up and do data collection and analysis of their own to determine how much capacity will be required to support the scaling out of these pods, perhaps leveraging the native Kubernetes Cluster Autoscaler to automate scaling at the cluster layer. Additionally, it's important to remember that they have to think in terms of supporting not just one application and its multiple services, but multiple applications. The complexity is real.

By the way, if you're managing multiple tenants on Kubernetes, be sure to check out this blog: Best Practices for Managing Kubernetes Multitenancy: How to Effectively Control Requests, the Silent Killer of Elasticity & Efficiency.

HPA is an Autoscaling Mechanism, Not Optimization

As we've discussed, Kubernetes HPA provides a way to automate the scaling of stateless services. You use it to define when and by how much pods should be scaled out. But what if the containers that make up a pod are not appropriately sized? Size them too small and you're propagating services that will not perform; too large, and you're propagating a service that has been allocated more resources than it actually needs. You can consider leveraging the Vertical Pod Autoscaler (VPA), which is also part of the Kubernetes ecosystem, but suffice it to say that, like HPA, it requires a lot of analysis and work to define the policies for every service. This blog is a great resource for learning more from the front lines: Vertical Pod Autoscaler deep dive, limitations and real-world examples.
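For comparison, a minimal VPA object for the same hypothetical `checkout` Deployment might look like the following. Note that the VPA is a separate add-on from the kubernetes/autoscaler project, not built into the core control plane, and it must not be pointed at a workload that an HPA is already scaling on the same resource metrics:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: checkout-vpa
spec:
  targetRef:                 # the workload whose container requests VPA analyzes
    apiVersion: apps/v1
    kind: Deployment
    name: checkout           # hypothetical service
  updatePolicy:
    updateMode: "Off"        # recommendation-only mode; no pods are evicted
```

Even in recommendation-only mode, someone still has to review the suggested requests and limits and decide whether to apply them, per container, per service.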

Utilization Metrics ≠ End-User Experience

Lastly, the utilization metrics on which HPA typically operates are not direct indicators of the end-user experience. IDC predicts that by 2022, more than 60% of DevOps teams will be evaluated on KPIs and performance metrics, including criteria tied to business outcomes such as customer satisfaction or new revenue gains (Source: IDC FutureScape: Worldwide Developer and DevOps 2021 Predictions, DOC #US46417220, October 27, 2020).

Gone are the days of up vs. down. Applications are increasingly the primary medium by which a customer interacts with a business. Advances in observability are making it possible to more accurately measure the customer experience. Applications should scale not just to meet "demand," but to assure that the SLOs that make sense for the business are met.
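In principle, HPA can get closer to this if you expose a latency metric to the cluster through a custom metrics adapter (for example, the Prometheus Adapter). The sketch below assumes such an adapter is installed and serving a per-pod metric; the metric name and target value are hypothetical:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout-latency-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout                 # hypothetical service
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods                     # per-pod metric from the custom metrics API
    pods:
      metric:
        name: http_request_duration_p95_ms   # hypothetical latency metric
      target:
        type: AverageValue
        averageValue: "250"        # target: keep p95 latency near 250 ms per pod
```

Even then, picking the right metric, target value, and bounds for every service remains exactly the kind of manual analysis this article describes.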

Don’t Lose Sight of the Bigger Picture: Your Mission-Critical Applications

As you evaluate Kubernetes HPA, don't lose sight of the bigger picture. Your organization has made a significant investment in a mission-critical application; it needs to scale horizontally to meet changing demand, ensuring that your end users get the experience they expect (scale out) while your business operates efficiently (scale back).

HPA gives you a mechanism to autoscale stateless services. It's a lot of work, and it is focused solely on scaling pods. Can your organization afford to have you and your colleagues (scarce engineering talent!) relegated to such mundane and myopic tasks?

In order to assure application performance and optimize the Kubernetes platform as a whole, you also need to consider container rightsizing, scaling at the cluster level, and the underlying infrastructure. And, dare we say it, consider continuous pod moves as a way to assure performance and defragment the cluster under fluctuating demand.

The complexity of managing application resources at every layer of the stack requires software to continuously do the analysis and make the right decisions based on the real-time resource needs of the application. Better yet, dynamically manage these resources to meet service level objectives, which directly reflect the end-user experience. 

Check out this demo video below to learn more: HPA is Not the Answer—Why Response-Time SLOs Should Drive the Kubernetes Platform and Underlying Infrastructure
