
Is Cloud Storage IOPS the real cause for poor application performance?

By Emily Falconer posted Thu October 13, 2022 01:06 PM

[Originally posted on Turbonomic.com in September 2020. Authored by Bonnie Wong.]

Choosing the right compute size for a VM is always a challenge. In the cloud, choosing the right size from the hundreds of instance types a provider offers is even harder. App owners generally look at a workload's vCPU and vMEM usage to determine the configuration a VM needs. In Azure, more factors need to be considered. In this article, we focus on Azure VM IOPS requirements and on how a new feature in IBM Turbonomic 7.22.8 improves its recommendations for continuously choosing the right VM size while assuring performance and reducing costs.

Azure VM IOPS Capacity
When you attach a premium storage disk to a VM, Azure provisions a guaranteed number of IOPS according to the disk specification. Each VM size also has its own IOPS limit that it can sustain. Applications such as OLTP (Online Transaction Processing) require high IOPS, so it is important that their infrastructure is optimized for IOPS. A VM with low IOPS capacity may prevent a disk with higher IOPS capabilities from reaching its IOPS potential. This becomes even more complicated when a VM has multiple disks attached with different disk caching configurations.
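As a rough illustration of the interaction described above, the IOPS ceiling a VM can actually reach is the lower of its own VM-size limit and the combined limits of its attached disks. The sketch below is a simplification that ignores host caching (covered later in this article); the function name and the per-disk values are hypothetical.

```python
def effective_iops_ceiling(vm_iops_limit: int, disk_iops_limits: list[int]) -> int:
    """Upper bound on the IOPS a VM can sustain across its attached disks.

    The disks together can provision sum(disk_iops_limits), but the VM size
    caps the total at vm_iops_limit, so the lower of the two wins.
    Host caching is deliberately ignored in this simplified sketch.
    """
    return min(vm_iops_limit, sum(disk_iops_limits))


# Hypothetical values: a VM size capped at 3,200 IOPS and three disks
# that could together deliver 3,300 IOPS.
print(effective_iops_ceiling(3200, [1100, 1100, 1100]))  # 3200
```

Here the VM size, not any individual disk, is the limiting factor.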

VM with Multiple Disks Attached
Figure 1 shows an Azure Standard_D2ds_v4 VM that is underutilizing its vCPU (guest-level vMEM metrics were not enabled). For this VM type, the vCPU and vMEM capacities are 18.4 GHz and 8 GiB, respectively. For illustration purposes, we disabled the IOPS awareness feature in this example. If we consider only these two resources, this VM should be moved to Standard_B2ms to save costs, since that type still has enough capacity to handle the VM's vCPU workload.


Figure 1 VM Scale Recommendation in 7.22.7

A closer look shows that five disks are attached to this VM, and each of them is serving some disk operations. Figure 3 shows that none of the disks exceeds its own IOPS limit. However, the Standard_D2ds_v4 VM type has an IOPS capacity of 3,200, while the first three attached disks alone have a combined IOPS capacity of 3,300. The VM itself is therefore at risk of becoming an IOPS bottleneck, even though each disk has enough IOPS capacity to sustain its own operations.
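The check in the paragraph above amounts to adding up disk IOPS limits in attachment order and noting where the running total first exceeds the VM's limit. The following is a minimal sketch of that check. Only the 3,200 VM limit and the 3,300 total for the first three disks come from the example. The per-disk split and the two remaining disks are hypothetical, as is the function name.

```python
from itertools import accumulate


def first_oversubscribed_disk(vm_iops_limit, disk_iops_limits):
    """Return how many attached disks (counted in order) it takes for their
    combined IOPS capacity to exceed the VM's IOPS limit, or None if the
    disks never exceed it."""
    for count, total in enumerate(accumulate(disk_iops_limits), start=1):
        if total > vm_iops_limit:
            return count
    return None


# Hypothetical per-disk limits for the five attached disks.
disks = [1100, 1100, 1100, 500, 500]
print(first_oversubscribed_disk(3200, disks))  # 3
```

A non-None result means that the disks can collectively demand more IOPS than the VM size allows, even if no single disk is near its own limit.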


Figure 2 Resources related to VM


Figure 3 List of disks attached to VM

Detection of VM IOPS Usage and Limits
IBM Turbonomic 7.22.8 detects the IOPS capacity of Azure VM types and takes VM-level IOPS usage into account in compute scale actions. This is an important factor in ensuring that a scale action sizes the VM to the most suitable VM type, especially for VMs with high IOPS demand.

Figure 4 shows the scale recommendation for the same VM we evaluated in the previous section, now with the new VM IOPS feature enabled. In the action details window, the VM's IOPS usage is shown alongside its vCPU and vMEM usage. As we can see, this VM has high IOPS usage and is reaching its IOPS capacity. IBM Turbonomic recommends scaling it to the Standard_F4s_v2 VM type, which offers an IOPS capacity of 6,400 and reduces the expected IOPS utilization to 50%.
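The 50% figure follows from simple arithmetic: utilization is usage divided by capacity. The sketch below assumes the VM's IOPS usage is at its full 3,200 capacity, since the text says it is reaching that limit. The function name is illustrative.

```python
def projected_utilization(iops_usage: float, target_iops_capacity: float) -> float:
    """Fraction of a VM type's IOPS capacity that the given usage would consume."""
    if target_iops_capacity <= 0:
        raise ValueError("IOPS capacity must be positive")
    return iops_usage / target_iops_capacity


# Assumption: usage equals the current 3,200 IOPS capacity.
current = projected_utilization(3200, 3200)  # on Standard_D2ds_v4
after = projected_utilization(3200, 6400)    # on Standard_F4s_v2
print(f"{current:.0%} -> {after:.0%}")       # 100% -> 50%
```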


Figure 4 VM Scale Recommendation with IOPS metric in 7.22.8

IOPS awareness also helps IBM Turbonomic users ensure that, when a VM's resources are underutilized, the resulting scale action still assures performance.

Figure 5 shows a VM that is underutilizing its vMEM, IOPS, and vCPU capacity. IBM Turbonomic recommends scaling the VM to Standard_DS1_v2, which doubles the expected utilization of vCPU, vMEM, and IOPS while cutting the hourly cost to half that of the VM's current type.
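To show why adding IOPS as a dimension changes the outcome, the sketch below picks the cheapest VM type that keeps every dimension (vCPU, vMEM, IOPS) under a utilization threshold. This is a generic illustration, not IBM Turbonomic's actual engine. The catalog, prices, usage figures, and the 80% threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmType:
    name: str
    vcpu_mhz: float
    vmem_gib: float
    iops: float
    hourly_cost: float


def cheapest_fit(usage: dict, catalog: list, max_util: float = 0.8):
    """Return the cheapest VM type whose capacity keeps vCPU, vMEM and IOPS
    utilization at or below max_util, or None if no type fits."""
    def fits(t: VmType) -> bool:
        return (usage["vcpu_mhz"] <= max_util * t.vcpu_mhz
                and usage["vmem_gib"] <= max_util * t.vmem_gib
                and usage["iops"] <= max_util * t.iops)

    candidates = [t for t in catalog if fits(t)]
    return min(candidates, key=lambda t: t.hourly_cost, default=None)


# Entirely hypothetical catalog and usage figures, for illustration only.
catalog = [
    VmType("type-small", 2000, 3.5, 3200, 0.10),
    VmType("type-medium", 4000, 7.0, 6400, 0.20),
    VmType("type-large", 8000, 14.0, 12800, 0.40),
]
light = {"vcpu_mhz": 900, "vmem_gib": 1.5, "iops": 1500}
iops_heavy = {"vcpu_mhz": 900, "vmem_gib": 1.5, "iops": 3000}
print(cheapest_fit(light, catalog).name)       # type-small
print(cheapest_fit(iops_heavy, catalog).name)  # type-medium
```

The two workloads have identical vCPU and vMEM usage. Only the IOPS dimension pushes the second one to a larger type, which mirrors the difference between the recommendations in Figure 1 and Figure 4.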


Figure 5 VM Scale Recommendation to better utilize vCPU, vMem and IOPS resources

IOPS Metrics and Disk Caching Configuration 
With disk caching enabled on Azure premium storage disks, VMs can achieve higher levels of performance. A data disk configured with ReadOnly caching has two benefits: (1) reads served from the cache, which resides in VM memory and on the local SSD, are much faster than reads from the disk, which is backed by Azure Blob Storage; and (2) Premium Storage does not count reads served from the cache toward the disk IOPS, so the disk can serve a higher total IOPS (source).


Source: https://azure.microsoft.com/en-us/blog/azure-premium-storage-now-generally-available-2/

The Azure SDK provides APIs for IOPS usage metrics at both the VM level and the individual disk level. However, the VM IOPS usage metric is reported as an average across all attached disks. As covered previously, Azure counts only IOPS that are not served from the cache toward the IOPS capacity limit of the VM type. Furthermore, depending on the host caching setting of each attached disk, only part of that disk's IOPS usage may count toward the IOPS capacity.

IBM Turbonomic combines the IOPS usage, host caching configuration, and read cache miss rate of each individual disk to calculate that disk's actual uncached IOPS usage. This calculation applies to both the OS disk and the data disks attached to the VM.


Note: R = Read IOPS of Disk, W = Write IOPS of Disk, m% = Read Cache Miss Rate
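The exact per-caching-mode formula is not reproduced in this text. The sketch below is therefore one plausible reading, built only from the caching behavior described earlier, and it is not necessarily IBM Turbonomic's formula. With ReadOnly caching, only cache-missed reads (R × m) and all writes (W) reach the disk. With no caching, every read and write does. ReadWrite caching is deliberately left unmodelled. The miss rate is passed as a fraction (for example, 25% as 0.25). The function name and the sample numbers are hypothetical.

```python
def uncached_iops(read_iops: float, write_iops: float,
                  host_caching: str, read_miss_rate: float) -> float:
    """Estimate the IOPS of one disk that count toward the VM's IOPS limit.

    Assumption (not confirmed by the source): ReadOnly caching -> R * m + W;
    no caching ("None") -> R + W. ReadWrite is not modelled in this sketch.
    """
    if not 0.0 <= read_miss_rate <= 1.0:
        raise ValueError("read_miss_rate must be a fraction between 0 and 1")
    if host_caching == "None":
        return read_iops + write_iops
    if host_caching == "ReadOnly":
        return read_iops * read_miss_rate + write_iops
    raise ValueError(f"caching mode not modelled in this sketch: {host_caching}")


# Hypothetical disk: 1,000 read IOPS, 200 write IOPS, 25% read cache miss rate.
print(uncached_iops(1000, 200, "ReadOnly", 0.25))  # 450.0
```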

Once each disk's uncached IOPS usage is determined, the VM's uncached IOPS usage at any given time is the sum of the uncached IOPS usage of all its disks at that time. Customers can see the uncached IOPS usage in the platform, and the VM scale recommendation is based on that uncached IOPS usage and the IOPS capacity of the Azure VM type.
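The aggregation step can be sketched as a per-timestamp sum over disks, followed by a comparison of the peak total with the VM type's IOPS capacity. This sketch assumes that all disks report samples at the same timestamps and that each sample has already been converted to uncached IOPS. The disk names, sample values, and function name are hypothetical. The 3,200 capacity is taken from the earlier Standard_D2ds_v4 example.

```python
from collections import defaultdict


def vm_uncached_iops(per_disk_samples):
    """Sum uncached IOPS across disks for each timestamp.

    per_disk_samples maps a disk name to a list of (timestamp, uncached_iops)
    pairs; the result maps each timestamp to the VM-level total, sorted by time.
    """
    totals = defaultdict(float)
    for samples in per_disk_samples.values():
        for timestamp, iops in samples:
            totals[timestamp] += iops
    return dict(sorted(totals.items()))


# Hypothetical per-disk series, already converted to uncached IOPS.
samples = {
    "os-disk": [(0, 300.0), (60, 450.0)],
    "data-1": [(0, 1200.0), (60, 1600.0)],
    "data-2": [(0, 900.0), (60, 1300.0)],
}
vm_series = vm_uncached_iops(samples)
vm_iops_capacity = 3200
peak = max(vm_series.values())
print(vm_series)                                       # {0: 2400.0, 60: 3350.0}
print(f"peak utilization {peak / vm_iops_capacity:.0%}")  # peak utilization 105%
```

A peak above 100% signals that the VM type, rather than the disks, would throttle the workload.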

The IOPS awareness functionality adds another dimension to IBM Turbonomic's multidimensional optimization engine, which already considers vCPU, vMEM, Net IO, RI inventory, application metrics (through APMs), and various cloud constraints such as quotas, drivers, the number of attached NICs and disks, and OS/image types.

Our goal is for our platform to provide the most accurate, trustworthy, and actionable cloud optimization recommendations. That way, customers spend less time on continuous cloud optimization, which has become an impossible mission, and can instead focus on business-critical initiatives and innovation.

IBM Turbonomic Application Resource Management is the only solution that continuously assures application performance in real time by managing IT resources on-premises and across multicloud environments, spanning heterogeneous virtual environments, containers, IaaS, and PaaS workloads.