
Tuning IBM Cloud Code Engine Scale-Down Delay to Reduce Application Response Times

By Samuel Matzek posted Thu August 31, 2023 10:47 AM

  

Synopsis

IBM Cloud Code Engine is a fully managed, serverless platform that can host your cloud-native containerized workloads.  It can deploy and automatically scale resources for your web servers based on request rates.  However, in some cases Code Engine needs to be tuned to scale down workloads less aggressively to handle an ongoing application load.

Background

IBM Cloud Code Engine is a fully managed, serverless platform that can host your containerized workloads.  With Code Engine, you don’t need to know as much about cloud infrastructure or networking to deploy your workloads on the cloud.  Once deployed, Code Engine automatically scales your workloads up and down to match the level of the incoming traffic, and even scales them down to zero when there are no requests. You only pay for the resources you consume. In some cases, the autoscaling needs to be tweaked to achieve specific performance targets under load. Let’s look at one case where we tuned Code Engine’s autoscaling to achieve our application performance targets.


Problem

We began by tuning our application’s target and maximum concurrency settings for optimal performance using a single instance of our application. When we then configured the application to scale out to multiple instances and increased the simulated load, we encountered high response times.
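As a rough sketch of what such a configuration could look like with the Code Engine CLI: the application name and every numeric value below are hypothetical placeholders, not the settings used in our test. The script only prints the command unless APPLY=1 is set, so it can be reviewed before anything is changed.

```shell
#!/bin/sh
# All values are illustrative assumptions, not the ones from the test.
APP_NAME="my-app"          # hypothetical application name
CONCURRENCY_MAX=100        # hard limit of concurrent requests per instance
CONCURRENCY_TARGET=70      # per-instance concurrency the autoscaler aims for
MIN_SCALE=0                # allow scale to zero when idle
MAX_SCALE=30               # upper bound on the number of instances

# Build the command as positional parameters so each argument stays intact.
set -- ibmcloud ce app update --name "$APP_NAME" \
  --concurrency "$CONCURRENCY_MAX" \
  --concurrency-target "$CONCURRENCY_TARGET" \
  --min-scale "$MIN_SCALE" \
  --max-scale "$MAX_SCALE"

SCALE_CMD_LINE="$*"
echo "$SCALE_CMD_LINE"

# Dry run by default; set APPLY=1 to execute against the selected project.
if [ "${APPLY:-0}" = "1" ]; then
  "$@"
fi
```

For single-instance tuning, setting both the minimum and maximum scale to 1 pins the application to one instance while the concurrency values are adjusted.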


The following graph shows the 90th percentile response times. As we can see, the response times are spiking to unacceptably high levels every other minute.




We monitored the number of application instances over the duration of the test and noticed a similar pattern: many application instances were being started and stopped every minute.


Under load, the autoscaled instances were handling their requests and then exiting because the number of outstanding concurrent requests had dropped. The incoming load would then trigger the autoscaler to start many new instances, which would clear the request queue, and the cycle would start over again. Each round of exiting and starting instances adds a startup penalty to the application’s response times.

Solution

To prevent the instances from exiting prematurely, and to give them time to receive additional incoming requests from the ongoing load, Code Engine’s “scale-down delay” parameter can be set. This gives the application instances more “time to live” during temporary drops in incoming requests.


We set the scale-down delay to 10 minutes and repeated the test with the same request ramp-up and duration. As a result, the 90th percentile response time stayed nearly flat at about 68 ms throughout the test.



Likewise, the number of application instances steps up gradually during the initial load ramp-up and then plateaus at 16 instances. That is lower than the 26 instances that were repeatedly being started to handle the load before the scale-down delay was set.



How to Set the Scale-Down Delay

The scale-down delay can be set with the Code Engine CLI by specifying the --scale-down-delay option on the ibmcloud ce app create or ibmcloud ce app update commands. The value is given in seconds.
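For example, the 10-minute delay used in the test above could be applied to an existing application roughly as follows. This is a minimal sketch: the application name my-app is a hypothetical placeholder, and the delay is expressed as 600 on the assumption that the option takes seconds. The script prints the command and only runs it when APPLY=1 is set.

```shell
#!/bin/sh
APP_NAME="my-app"                 # hypothetical application name
DELAY_SECONDS=$((10 * 60))        # 10 minutes, assuming the option takes seconds

# Build the command as positional parameters so each argument stays intact.
set -- ibmcloud ce app update --name "$APP_NAME" \
  --scale-down-delay "$DELAY_SECONDS"

CMD_LINE="$*"
echo "$CMD_LINE"

# Dry run by default; set APPLY=1 to execute against the selected project.
if [ "${APPLY:-0}" = "1" ]; then
  "$@"
fi
```

The same option can be passed to app create when deploying a new application.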

Conclusion

For some applications and usage patterns running in Code Engine, increasing the scale-down delay yields large improvements in response time. The scale-down delay can be set easily using the Code Engine CLI.
