Previously we looked at the metrics and statistics that make up memory measurements and why each matters. That understanding was critical to properly analyzing the real-world examples shown.
But how should this knowledge be applied toward our end goal? Ultimately everyone asks the same question: how large should I set the requests and limits? As you may guess by now, the answer is: it depends. However, here are some general (memory-specific) recommendations based on my experience.
Quality Of Service
Kubernetes assigns pods a quality of service (QoS) class based on their requests and limits (a minimal example follows the list below).
Guaranteed: All containers have memory (and CPU) requests and limits set and equal. Since the focus here is on memory, I’m only considering guaranteed in regard to memory requests and limits being equal. CPU is very different given its characteristics.
Burstable: At least one container has a memory or CPU request or limit set, but the pod does not meet the criteria for Guaranteed (for example, the requests and limits differ).
BestEffort: No requests or limits are set.
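As a concrete illustration, here is a minimal, hypothetical pod spec that Kubernetes would classify as Guaranteed: every container sets its requests and limits, and they are equal. Raising the limits above the requests would make it Burstable, and removing the resources block entirely would make it BestEffort. The name and image are placeholders.

```yaml
# Hypothetical pod that Kubernetes classifies as Guaranteed:
# every container sets memory and CPU requests equal to its limits.
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-example      # placeholder name
spec:
  containers:
  - name: app
    image: example/app:latest   # placeholder image
    resources:
      requests:
        memory: "512Mi"
        cpu: "500m"
      limits:
        memory: "512Mi"
        cpu: "500m"
```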
Where Are You?
First you must answer some basic questions about your deployment:
— Where will you be running? Dev? Test? Production?
— What workers will you be on? Dedicated, tainted (isolated) nodes? Shared nodes? Shared with other unknown (noisy neighbor) products and services?
— How impactful would a pod eviction be? An OOM kill? Are you resilient to this?
If you are running on dedicated, tainted (isolated) nodes, the rules are a bit different. You don't have to worry about other workloads coming in and stealing or sharing resources. You and your container(s) know you have the worker node and hopefully can share nicely amongst yourselves. OpenShift master or infra nodes are an example of this. In this case, not setting requests and limits is fine.
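As a rough sketch of the dedicated-node case, landing on tainted, isolated workers typically combines a toleration with a node selector. The "dedicated=infra" taint, the node label, and the image below are assumptions for illustration, not values from any particular cluster.

```yaml
# Sketch of targeting dedicated, tainted workers: the toleration lets the
# pod schedule onto the tainted nodes, and the nodeSelector keeps it off
# shared ones. Taint key/value and label are made-up placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: dedicated-node-example   # placeholder name
spec:
  nodeSelector:
    dedicated: infra
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "infra"
    effect: "NoSchedule"
  containers:
  - name: app
    image: example/app:latest    # placeholder image
```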
However, if you’re running on shared nodes with other workloads, staking your claim is paramount. Do you trust the other containers from who knows where to play nice? Do you even trust your own?
Do You Do File IO?
For production containers that do file IO, and therefore stress the page cache, I would recommend setting the request and limit equal. As we have seen, unless you’re extremely oversized, your cache will use all of the limit. You know you’re going to use the limit, so be a good citizen and budget for it in the request. Help the Kubernetes scheduler plan for your pod. I can be convinced to let this slide in a dev setting, but in production I consider it a must. If you do not, you risk evictions from Kubernetes. And the file IO containers are usually the critical databases of the deployment, making evictions potentially even more impactful.
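As one way to apply that, the containers section of a file-IO-heavy (say, database) pod might set the memory request and limit equal, so the page cache it will inevitably fill is budgeted up front. The image and sizes here are placeholders.

```yaml
# Hypothetical database container doing heavy file IO: the memory request
# equals the limit, budgeting up front for the page cache it will fill.
containers:
- name: db
  image: example/db:latest   # placeholder image
  resources:
    requests:
      memory: "2Gi"
    limits:
      memory: "2Gi"
```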
For containers that do not do file IO, and are therefore mostly RSS, you can choose to diverge the requests and limits. Since the request is your budget, and ultimately a driver of hardware requirements, optimizing and minimizing the request is important to keep costs down. However, still be sure to request at least the minimum you know you will need. Beyond that, you need some “buffer” in the limit to prevent frequent OOM kills. In particular for production, you don’t want to be running above your request by design, so don’t skimp on the requests. At most I’m comfortable with a limit that is double the request, but that’s just me. Ideally, testing lets you narrow down the expected per-container usage so you only need a small limit buffer on top of it.
What Is Your RSS?
To choose the request, you first need to look at where you expect the RSS to be. Figuring that out is potentially tricky, as illustrated above. Your RSS may grow as more load is applied, or it may be driven more by configuration and tuning (finding the right tuning settings has its own complexities and struggles that vary from process to process).
For example, you find through iterative testing that your JVM needs a 512M Xmx. On top of that is native memory, bringing your RSS to around 700MiB on average. I would start by setting the request to 700MiB and the limit to 900MiB. Usually you can expect the usage to be around 700MiB, but you don’t want to be OOM killed for the occasional spike. If, however, this container were doing file IO, I would start with a 1400MiB request and limit and adjust from there based on the IO needs.
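A sketch of what that JVM example might look like as a pod's containers section, assuming a plain java entrypoint; the image name is a placeholder and the heap flag mirrors the 512M Xmx above.

```yaml
# Sketch of the JVM example: 512M heap (-Xmx512m) plus native memory,
# observed around 700MiB RSS, so request 700Mi and allow bursts to 900Mi.
containers:
- name: jvm-app
  image: example/jvm-app:latest   # placeholder image
  command: ["java", "-Xmx512m", "-jar", "app.jar"]
  resources:
    requests:
      memory: "700Mi"
    limits:
      memory: "900Mi"
```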
How Do You Scale?
Ultimately, choosing the right request size comes down to how much work you want each container to handle. Do you scale horizontally or vertically? Both?
For example, let’s say you have a container that processes incoming metrics (not a database). Through iterative testing and tuning you find that each container can handle around 1 million metrics per minute at a reasonable size. You measure the usage to be 500MiB, add a buffer, set the limit to 700MiB, and you’re done! Now if you want to handle 2 million metrics per minute, which is easier? Scaling out with more pods? Or reconfiguring all of the memory settings and redeploying a larger pod? Definitely the horizontal scale!
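To sketch that horizontal scale, a Deployment sized for roughly 1 million metrics per minute per pod (the 500MiB/700MiB numbers above) only needs its replica count raised to handle 2 million. The names and image are hypothetical.

```yaml
# Hypothetical metrics processor sized for ~1M metrics/min per pod
# (roughly 500MiB usage, 700Mi limit). Doubling throughput means raising
# replicas from 1 to 2 instead of resizing and redeploying a larger pod.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-processor   # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: metrics-processor
  template:
    metadata:
      labels:
        app: metrics-processor
    spec:
      containers:
      - name: processor
        image: example/metrics-processor:latest   # placeholder image
        resources:
          requests:
            memory: "500Mi"
          limits:
            memory: "700Mi"
```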
Note, despite the marketing, horizontal scale is NOT a given in the Cloud. It must be built into your architecture and data structures. Partitioning and division of work is key to allowing work to be spread across pods, ultimately including at the database level (where most of the problems originate). Just adding more pods does not mean you have horizontal scale!
Achieving Balance
The trick is getting the balance just right. Each container will have some base memory cost when you deploy it idle, some more than others. If you set your desired load per container too small, you can end up with a lot of waste as you scale out, duplicating that base cost across dozens or hundreds of containers. On the flip side, if you set your desired per-container unit of work too large, smaller deployments that need only a fraction of that unit will be oversized.
To achieve this balance, build into your deployments the ability to choose between vertical size classes. Once the selected vertical capacity is exhausted or becomes inefficient, scale horizontally from there. Since CPU is usually driven up and down by workload levels more than memory is, it is generally a better metric for driving scale-up and scale-down decisions. But that is another topic for another day.
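One hypothetical way to expose such size classes is a small set of presets, for example in a Helm-style values file. The class names and numbers below are invented for illustration, not recommendations.

```yaml
# Invented size-class presets: pick the vertical size that fits the
# deployment, then scale replicas horizontally once it is exhausted.
sizeClasses:
  small:
    replicas: 1
    memoryRequest: 500Mi
    memoryLimit: 700Mi
  medium:
    replicas: 2
    memoryRequest: 1Gi
    memoryLimit: 1500Mi
  large:
    replicas: 4
    memoryRequest: 2Gi
    memoryLimit: 2500Mi
```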
Workload Complexity Disclaimer
In the examples above, a BIG assumption is made: your workload characteristics are predictable. The only time this is true is in PowerPoint.
In reality, workloads have their own complexities and considerations that start to make our cockpit look even more like a Saturn V’s than a toddler’s play car.
Your container handles 1 million metrics per minute, great! What do those metrics look like? How big are the tags and strings on the metrics? What is the cardinality and distribution of the data? What are your retention settings? These can have ENORMOUS impacts on the results, by orders of magnitude. Tweak one setting on the input and suddenly your nice 500Mi becomes 5,000Mi.
Again, another topic for another day. Just keep in mind that your sizing must account for a workload that realistically covers all of the permutations you need to support. An “average” workload is a great start, but it cannot be the only one considered.