
Optimization Master Class: Proving Elasticity Through Percentiles!

By Chris Graham posted Fri July 01, 2022 02:08 PM


This article originally appeared on the Turbonomic Blog, authored by Ryan McDonald & Hubert Leung


As software and customer demand endlessly evolve, so too must the Virtual Machines and the mechanics for sizing those Virtual Machines. Whether the VMs are hosted in your On-Premises clusters or scattered across various cloud regions in AWS and Azure, scaling these VMs and ensuring application performance continues to be a difficult challenge (and many folks’ full-time jobs).

The Art of Balancing Performance and Cost

“Migrate to the cloud,” they said. “There’s elasticity,” they said. Well, that’s what they said until they saw their outrageously large cloud bill, then the discussion went from elasticity to cost savings.

Performance is always a priority for application owners. It is important to allocate sufficient resources for the VMs. However, over-provisioning can lead to inefficient use of resources and increase the cost of operation. We want to leverage elasticity by resizing VMs on-demand. Balancing performance and cost is a tough problem. Determining the right amount of resources for the VMs is the key to success.

VM Sizing Status Quo

It is common to make resource allocation decisions using the peak and average values of VCPU and VMem utilization. However, neither is a good measurement. The peak is the maximum value of a metric observed over a period of time. It is a poor guide because peak utilization is usually not sustained for long; it is normal and acceptable for VCPU to max out during some CPU-intensive processes, so sizing to accommodate peak values leads to over-provisioning. The average, on the other hand, does not capture how high or how often utilization rises above it, and therefore cannot protect the VM from over-utilization.

Considering the status quo has proven to be subpar in ensuring safe and performant elasticity of VMs, the industry can rejoice knowing that Turbonomic has cracked the code on workload elasticity.

Enter Turbonomic

Turbonomic is a performance automation platform at its core. Ensuring the performance of your applications is an n-dimensional problem. When it comes to workload scaling decisions, Turbonomic can solve such problems by looking at every possible constraint and every possible bottleneck in order to make performance-focused recommendations that are not only safe but will have the added bonus of helping you save money. Sure, that’s a bold claim, so let’s prove it!

Percentiles (What are those!?!?)

Turbonomic uses the percentile values of VCPU, VMem, and IOPS to make resource allocation decisions. If the 95th percentile of VCPU utilization is 63%, then utilization is at or below 63% for 95 percent of the time. See Figure 1 below, where the 95th percentile (labeled “P95”) is plotted against a utilization data set, with the average value shown for comparison. You can see that most data points fall below the percentile value. The percentile is a good characterization of VM usage because it tolerates occasional utilization spikes without triggering a resize of the VM.

Figure 1 Percentile Utilization Example
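The contrast between peak, average, and percentile is easy to reproduce. Here is a minimal sketch using NumPy on synthetic utilization data (not Turbonomic's internals): a handful of spikes dominate the peak while barely moving the 95th percentile.

```python
import numpy as np

# Hypothetical utilization samples (percent): mostly moderate, a few spikes.
rng = np.random.default_rng(42)
samples = np.clip(rng.normal(40, 10, 1000), 0, 100)
samples[::100] = 95  # occasional spikes (1% of samples)

p95 = np.percentile(samples, 95)  # utilization is at or below this 95% of the time
avg = samples.mean()

print(f"P95 = {p95:.1f}%, average = {avg:.1f}%, peak = {samples.max():.1f}%")
# The peak jumps to the spike value, while P95 stays close to typical usage
# and the average says nothing about how often usage runs high.
```

Sizing to the peak here would allocate for 95% utilization that almost never occurs, while sizing to the average would ignore how often usage climbs above it; the percentile sits between the two.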

Turbonomic also allows users to adjust the percentile measurement's sensitivity by varying the aggressiveness and observation period through policy settings. Lowering aggressiveness by using the 90th percentile instead of the 95th percentile favors allocation efficiency over performance. It allows utilization to go above the target utilization more often without upsizing.

The observation period controls the number of data points you use to calculate the percentile. The longer the observation period is, the more stable the percentile will be. There will be fewer scale actions when using longer observation periods.
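The interaction of those two knobs can be sketched in a few lines (illustrative only; the function and parameter names below are invented, not Turbonomic's API):

```python
import numpy as np

def percentile_target(samples, aggressiveness=95, observation_days=30,
                      samples_per_day=24):
    """Hypothetical sketch: percentile over the most recent observation window.

    aggressiveness   -- which percentile to use (95 favors performance,
                        90 favors allocation efficiency)
    observation_days -- how many days of data feed the calculation; longer
                        windows produce a more stable value
    """
    window = observation_days * samples_per_day
    return np.percentile(samples[-window:], aggressiveness)

rng = np.random.default_rng(0)
util = np.clip(rng.normal(50, 15, 90 * 24), 0, 100)  # 90 days of hourly data

p95_7d = percentile_target(util, aggressiveness=95, observation_days=7)
p90_30d = percentile_target(util, aggressiveness=90, observation_days=30)
print(f"7-day P95 = {p95_7d:.1f}%, 30-day P90 = {p90_30d:.1f}%")
```

Lowering aggressiveness lowers the measured value (so fewer upsizes trigger), while lengthening the observation period smooths it (so the value, and the resulting actions, change less often).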

By using percentile measurement coupled with scale action automation, Turbonomic can ensure application performance while keeping costs low at the same time.

Prove-It VM Scaling

Talking a big game about having a super cool platform that’s doing super cool things is never enough to establish trust in the IT community. Telling a story that provides the data backing scaling decisions and the transparency on why a VM can be safely resized for performance reasons is how Turbonomic is building trust. You don’t need to take our word for it; you can see it and understand it in seconds by checking out the details of our action recommendations.

Visualizing Percentile Utilization

Below, in Figure 2, is an example of how Turbonomic provides a quick glimpse into the story of a VM by looking at the percentile utilization of its VMem over the last 30 days. Of course, there is more to the story for every VM and for every scaling action that Turbonomic recommends, but interpreting the story of the graph by itself is a good start.

Figure 2 Upsize VMem Percentile Graph Example

At first glance, our eyes may wander to the various lines depicted in the graph, but then our brain tries to read its story, which might go something like the red numeric indicators found in Figure 2:

  1. The title of the graph tells us what is being represented: VMem Percentile and its Average Utilization.
  2. A succinct summary of the VMem percentile value, defined in the context it is shown in: a configurable 95th percentile and 30-day observation period. Before we even jump into the details of the graph, we can get a sense of why this recommendation was made.
  3. Oh hey, our legends to tell us what data is being represented, again with configurable context!
  4. The line representing where our current 95th percentile of VMem Utilization lies for the 30-day period. It’s colorful and stands out. Remember, the percentile is our main character in the story being told here.
  5. The line representing historical daily average utilization data over the 30-day period. It’s a purposefully less appealing color than the percentile line as it’s not our main character, but still an important one nonetheless in seeing our story come together.
  6. Woah! Historical scaling decisions made for the VM that impacted VMem! Depicted irrespective of whether or not these actions were taken in Turbonomic! And it’s so easy to see what impact this had on the daily utilization!
  7. That’s the current time. And the recommended action (Upsize) that Turbonomic is recommending to ensure optimized performance of the VM.
  8. Well, if Turbonomic is telling us to size to a new instance type, what effect will that have on my VMem? Hovering over the current time, we’re presented a tooltip to see just that! Upsize 1GB to 2GB!
  9. The projected percentile. This is what we can expect our VMem’s 30-Day 95th Percentile to be after we size the VM to an instance type with more optimal VMem.
  10. Let’s hover over the projected line and see the tooltip showing what our projected percentile will be. Wow, it is projected to be 47%! We can bet our application(s) running on this VM will be even more performant after taking this action!

That was a bit of a long story to read through, but having experienced the percentile metric in action will help us understand the stories of other VMs much more clearly!

Proving Upsize

Looking at Figure 3 below, we can see an expanded scale action for an AWS EC2 Instance. Perhaps it’s running a cat meme generator application with thousands of constant users (because everybody needs a cat meme). Turbonomic is recommending an upsize from m5a.large to m5a.xlarge due to both VCPU and VMem congestion. Not only are we presented with Percentile & Utilization graphs, but plenty of other useful information providing visibility into why Turbonomic is making this recommendation and why we can trust it.

Figure 3 VM Upsize Details

Starting with the percentile graphs, we see that the 7-day 95th percentile of VCPU utilization sits at 100%. That doesn’t sound optimal! VMem is nearly just as over-utilized. Given the lack of variance between the 95th percentile line indicators and the plotted average utilization time series data, it is easy to see that VCPU and VMem aren’t just peaking: this VM is working hard and working often. Doubling both VCPU and VMem will bring these utilizations down from around 100% to around 50%, which we can see in both the graph and the detail breakdown below the graphs. We can also see a cost impact associated with this change due to the different on-demand rates between these instance types, but hey, we’ve got some RI coverage to burn! We absolutely want to take this action to ensure our very important cat meme generator app doesn’t crash or experience slowdowns when generating sophisticated cat memes.
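The projection arithmetic behind "doubling capacity halves utilization" is simple enough to sketch (this is illustrative, not Turbonomic's actual code; the AWS sizes are public specs: m5a.large has 2 vCPUs and 8 GiB of memory, m5a.xlarge has 4 vCPUs and 16 GiB):

```python
def projected_utilization(current_pct, current_capacity, new_capacity):
    """Project the utilization percentage after a resize, assuming the
    absolute resource demand stays the same."""
    demand = current_pct / 100 * current_capacity  # absolute demand
    return demand / new_capacity * 100

# m5a.large -> m5a.xlarge doubles both vCPU (2 -> 4) and memory (8 -> 16 GiB),
# so utilization near 100% projects to roughly 50%.
print(projected_utilization(100, 2, 4))   # VCPU
print(projected_utilization(96, 8, 16))   # VMem (hypothetical 96% reading)
```

The same formula explains the projected percentile line in the graphs: the demand curve is unchanged; only the capacity it is measured against grows.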

Figure 4 Cat meme

Proving Downsize

Taking a look at the other end of the optimized spectrum, we have another VM scaling recommendation in Figure 5 below. Perhaps this VM is running an application that requires some state in memory but is not as popular as our cat meme generator. Turbonomic is understanding that and recommending a downsize from Azure Standard_B1ms to Standard_B1s due to underutilized IOPS, VCPU and VMem.

Figure 5 VM Downsize Details

VCPU and VMem have pretty low 95th percentile line indicators, with utilization hanging right along them at 2% and 31% respectively. We have a small amount of constant VMem usage, but we’re not really clocking many computations. There is plenty of room to safely downsize this VM, especially given that our observation period is 30 days, so we have even more data backing the decision to halve our VCPU and VMem capacity. This effectively doubles our utilizations, but the expectation is that our 30-day 95th percentile of utilization will remain within a safe threshold and save us a bit of money along the way! We call this an efficiency action, but we’re still optimizing for performance: we have an opportunity to have a performant VM that costs less.
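The safety reasoning here can be sketched as a simple check (a hypothetical illustration, not Turbonomic's actual algorithm or threshold): project the percentile at the smaller capacity and only recommend the downsize when it stays comfortably below a safe ceiling.

```python
def downsize_is_safe(p95_pct, capacity_ratio, safe_threshold=80):
    """capacity_ratio = new_capacity / old_capacity (0.5 when halving).
    Returns (is_safe, projected_p95_pct)."""
    projected = p95_pct / capacity_ratio  # halving capacity doubles utilization
    return projected <= safe_threshold, projected

# Figures from the example: VCPU P95 = 2%, VMem P95 = 31%, capacity halved.
print(downsize_is_safe(2, 0.5))    # VCPU projects to 4%  -> safe
print(downsize_is_safe(31, 0.5))   # VMem projects to 62% -> safe
```

Both projections land well under the ceiling, which is why this efficiency action is still a performance-safe one.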

Sifting Through the Edge Cases

Not all scaling decisions are as clear cut as above. Turbonomic will intelligently scale VMs and does so with as much caution as necessary to comply with any business policy. Data-driven decision making is a two-way street, though. With more data, we can provide even more intelligent actions, but that is not to say that we will make hazardous decisions if we don’t know all of the cards. In the event that Turbonomic does not have statistics for some aspect of a VM, it will not recommend actions that would adjust it (e.g. no VMem data means no scaling based on VMem). If there are gaps in the knowledge that Turbonomic has for a VM’s utilization data, they will not negatively affect a scaling decision. In many instances, Turbonomic can poll that information when it becomes available to ensure the accuracy of the percentile calculation; we can even show when data was missed or when a VM was turned off, as depicted in Figure 6.

Figure 6 Missing utilization data shown in graph
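One plausible way to make a percentile calculation gap-tolerant (an assumption about the approach, not Turbonomic's implementation) is to mark missing samples and exclude them, and to decline to decide at all when a metric has no data:

```python
import numpy as np

def safe_percentile(samples, q=95):
    """Percentile over available samples; NaN marks missing data
    (e.g. the VM was off or collection was interrupted)."""
    samples = np.asarray(samples, dtype=float)
    if np.all(np.isnan(samples)):
        return None  # no data at all -> make no scaling decision for this metric
    return np.nanpercentile(samples, q)  # gaps are simply excluded

util_with_gaps = [40, 42, np.nan, np.nan, 45, 90, 41]  # NaN = missing samples
print(safe_percentile(util_with_gaps))
print(safe_percentile([np.nan, np.nan]))  # None: don't scale on this metric
```

The key property is that gaps shrink the sample set rather than skewing it: a stretch of missing data never reads as zero utilization.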


Scaling Virtual Machines for performance is a fool’s errand for humans to manage manually. VCPU and VMem alone are difficult to scale adequately, let alone the slew of other environmental variables at play – quota, Network IO, Storage IO, pricing, RI inventory, instance family availability, hyperthreading, speed differentials between CPUs of different families, and the list goes on! Not all of us want to spend our free time comparing benchmark speeds between the Intel Gen 4 family and Intel Gen 6 in order to get the scaling right. Sure, it can be done, but it’s impossible to get ahead when chasing optimized performance given all of these variables and the impending, often conflicting, worry of cost overruns. By considering all of these variables while using proven metrics and configurable constraints that yield trusted, conservative performance recommendations, Turbonomic has paved the way in making any application owner’s life easier. Turbonomic enables users to enjoy the benefits of performance-focused elasticity and the flexibility to choose their own level of efficiency.

Want to See Turbonomic in Action?

Using AWS? Check out our Turbonomic Demo for AWS Cloud.

Using Azure? Check out our Turbonomic Demo for Azure Cloud.

About the Authors:

Ryan McDonald

Ryan is a Full Stack Software Engineer on the Feature UX team at Turbonomic. Previously, he’s worked in the healthcare industry on speech recognition systems, then in the insurance industry on designing and implementing highly available omnichannel capabilities. At Turbonomic he has continued his journey to put his passion for building cool products into practice all while focusing on creating the best UX possible! He enjoys learning/mentoring, personal fitness, sports (he’s from Pittsburgh so… “Go Steelers!”), brewing beer, and hanging out with his zoo (1 dog, 2 cats, and a hedgehog).


Hubert Leung

Hubert is a senior software engineer at Turbonomic. He has contributed to many features of the product including cloud migration and reserved instance functionalities. Prior to joining Turbonomic, he had extensive experience architecting and developing a wide range of software projects for mission critical enterprise applications, open source projects and research innovations. He has demonstrated expertise in cloud computing technologies and the development of Java web applications, Android mobile applications and e-Commerce.
