Rightsizing Kubernetes Pods with Kyverno

By Randhir Singh posted 6 days ago


Why Pod Size Matters

When running workloads in Kubernetes, defining Pod size (CPU and memory requests/limits) isn’t just a good practice — it’s essential for performance, stability, and cost control. 

Getting Pod sizing wrong can be just as bad as forgetting to set it at all, if not worse.

  • Over-Provisioning Wastes Money
    If you request way more CPU or memory than your pod actually needs, Kubernetes will reserve that capacity for you — meaning other workloads can’t use it. In cloud environments, that idle reservation still costs you money.

  • Under-Provisioning Hurts Performance
    If your pod is starved for CPU or memory, it will suffer slow response times, frequent throttling, or even get OOMKilled (terminated because it ran out of memory).

  • Cluster Scheduling Gets Skewed
    Oversized pods make the scheduler think the cluster is full, while undersized pods can cause resource contention when they use more than they asked for. Both scenarios lead to inefficient node utilization.

  • Operational Headaches Multiply
    Incorrect sizing leads to unpredictable scaling behavior in HPA (Horizontal Pod Autoscaler), more noisy alerts for SRE teams, and harder capacity planning.

Right-sizing ensures workloads run reliably, nodes are used efficiently, and costs stay under control — without starving other apps or overloading the cluster.

In this blog post, we'll discuss a method to determine the right size of a Pod using Kyverno.

Introducing Kyverno Policy Engine

Kyverno is a Kubernetes-native policy engine that helps you manage and secure workloads through declarative policies, written as YAML — the same way you define deployments, services, and other cluster resources. 

At its core, Kyverno watches for changes in Kubernetes resources (like Pods, Deployments, ConfigMaps) and applies your policies to either:

  1. Validate — block or warn if something doesn’t match your rules.

  2. Mutate — add or change fields automatically to enforce standards.

  3. Generate — create new resources or configurations as part of an admission workflow.

Kyverno supports three main kinds of policies:

  1. Validation Policies 

    • Purpose: Enforce rules by blocking (or warning about) non-compliant resources.

    • Example: Reject Pods without CPU/memory requests or limits (see the sketch after this list).

    • How it works: Uses validate rules with conditions and patterns.

    • Use case: Stop over-provisioned or under-provisioned workloads before they enter the cluster.

  2. Mutation Policies 

    • Purpose: Automatically modify incoming resources to meet your standards.

    • Example: Add default CPU/memory requests to Pods that don’t specify them.

    • How it works: Uses mutate rules with patchStrategicMerge or overlay to inject or update fields.

    • Use case: Ensure workloads always have reasonable starting values without requiring developers to set them.

  3. Generation Policies 

    • Purpose: Automatically create additional resources when a triggering resource appears.

    • Example: Generate a Vertical Pod Autoscaler (VPA) object whenever a new Deployment is created.

    • How it works: Uses generate rules that watch for resource creation and spawn new objects.

    • Use case: Automatically attach scaling helpers or config maps to workloads for better optimization.
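
To make the first two kinds concrete, here are minimal sketches; the policy names and default values are illustrative, not from a specific catalog. The first audits Pods that omit CPU/memory requests; the second injects defaults only when the fields are absent:

---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-requests   # hypothetical name
spec:
  validationFailureAction: Audit   # switch to Enforce to block non-compliant Pods
  rules:
    - name: check-container-requests
      match:
        any:
        - resources:
            kinds:
              - Pod
      validate:
        message: "CPU and memory requests are required."
        pattern:
          spec:
            containers:
              # "?*" requires a non-empty value for every container
              - resources:
                  requests:
                    cpu: "?*"
                    memory: "?*"
---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-default-requests   # hypothetical name
spec:
  rules:
    - name: add-default-requests
      match:
        any:
        - resources:
            kinds:
              - Pod
      mutate:
        patchStrategicMerge:
          spec:
            containers:
              # "(name)" anchors each existing container; "+( )" adds the field only if absent
              - (name): "*"
                resources:
                  requests:
                    +(cpu): "100m"
                    +(memory): "128Mi"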

Kyverno Policies for Pod Right-Sizing

For resource optimization, we will mainly focus on Validation and Generation policies. The solution leverages the following two Kyverno policies:

  1. Generate VPA — Auto-create VerticalPodAutoscalers

    Kyverno’s generation policies allow you to automatically create Kubernetes objects when related resources appear—perfect for adding Vertical Pod Autoscalers (VPAs) to your workloads without manual steps. These policies are part of Kyverno’s core functionality.

    A widely shared example is a ClusterPolicy that targets Deployments—excluding system namespaces—and generates a corresponding VPA resource per workload. This enables VPA to begin monitoring and recommending optimal resource sizes for each pod.

  2. Check Resources — Ensure Requests/Limits Evaluate to Sane Values

    Equally important is the validation policy, which Kyverno uses to enforce resource configurations. These policies can run at admission time or as background scans, comparing specs against expectations. In this solution, the Check Resources policy:

    • Checks actual resource settings

    • Compares them to VPA recommendations

    • Flags pods that are over- or under-provisioned, helping teams take corrective action.

Kyverno Policy Reports

When you set Kyverno to generate VerticalPodAutoscaler (VPA) objects in recommendation-only mode, the VPA’s Recommender component tracks usage and suggests resource allocations—without applying them. These recommendations, combined with Kyverno’s policy engine, feed into a structured feedback loop.

  1. VPA Recommender in Recommendation-Only Mode. Instead of making automatic changes, VPA runs with its update mode set to "Off" (recommendation-only), which allows it to observe pod behavior over time and emit CPU and memory request recommendations without disrupting workloads.
    Kyverno’s Generate VPA policy automates the creation of VPAs for deployments, statefulsets, etc.—ensuring consistent coverage.
  2. Validation Against VPA Recommendations. A separate Check Resource (validation) policy compares actual pod resource settings with the VPA’s upper and lower bounds. This policy allows a 20% margin to reduce noise, only flagging pods that are meaningfully over- or under-provisioned.
  3. Kyverno Policy Reports Capture VPA Insights. These checks generate structured PolicyReports (or ClusterPolicyReports)—Kyverno’s CRDs that log pass/fail results for each rule evaluation. Reports store real-time compliance data and are accessible via native Kubernetes CLI tools or dashboards.
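
For example, the reports can be listed and inspected with kubectl (polr is the short name for PolicyReport; report names vary by cluster):

kubectl get policyreports -A
kubectl describe polr <report-name> -n <namespace>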

Now that we understand the different components of Kyverno and how to leverage them for resource optimization, let us look at the solution architecture for Pod rightsizing.

Context Diagram

The reference architecture is shown below. The solution consists of the following components:

  1. VPA Recommender. Observes a Pod's traffic and resource utilization patterns and provides recommendations for optimal resource allocation. 
  2. Metrics Server. VPA uses the metrics server to observe resource utilization and derive its recommendations.
  3. VPA Configuration. Required for each resource under observation so that VPA recommendations can be associated with that resource.
  4. Policy Report. Generated when a resource fails a policy check; based on the report, you can take action to increase or reduce the resources assigned to the Pod.
The solution requires two policies:
  1. Generate VPA. Whenever a workload (e.g., a Deployment or DaemonSet) is created in Kubernetes, an event is generated and Kyverno responds by generating a VPA configuration for the workload.
  2. Check Resources. This policy continuously checks a workload's resource settings against its VPA recommendations. It generates reports showing whether the resource is over-provisioned or under-provisioned, along with recommendations to rightsize the workload by increasing or decreasing its allocated resources.

Implementation

Let us now see a working example of how to stitch the solution together using the above components.

Prerequisites

  1. Kubernetes Cluster. We'll use the IBM Cloud Kubernetes Service (IKS) to provision a cluster. Log in to the cluster using the ibmcloud CLI.
    ibmcloud login --sso
    ibmcloud ks clusters
    ibmcloud target -g <resource group>
    ibmcloud ks cluster config --cluster <cluster>
  2. Metrics Server. The metrics server is required; it comes installed by default with IBM IKS. You can check whether it is installed using:
    kubectl get deployment metrics-server -n kube-system
    If installed and running, you should see something like: 
    NAME             READY   UP-TO-DATE   AVAILABLE   AGE
    metrics-server   2/2     2            2           5y37d
  3. VPA. 

    The Kubernetes VerticalPodAutoscaler has several components:

    • Recommender
    • Updater
    • Admission Plugin
    For this solution, we only need the VPA Recommender. You can install it by executing:
    kubectl apply -f https://raw.githubusercontent.com/nirmata/demo-resource-optimizer/main/config/vpa/install-vpa-recommender.yaml
  4. Kyverno. Execute the following commands to install Kyverno:
    helm repo add kyverno https://kyverno.github.io/kyverno/
    helm repo update
    helm install kyverno kyverno/kyverno -n kyverno --create-namespace
  5. Configure Kyverno RBAC permissions. 

    To generate a VerticalPodAutoscaler, Kyverno needs additional permissions for VPA resources.

    Execute the following command to configure Kyverno permissions:

    kubectl apply -f https://raw.githubusercontent.com/nirmata/demo-resource-optimizer/main/config/kyverno/kyverno-vpa-rbac.yaml
    kubectl apply -f https://raw.githubusercontent.com/nirmata/demo-resource-optimizer/main/config/kyverno/kyverno-vpa-rolebinding.yaml
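
For reference, the RBAC manifest amounts to a ClusterRole like the sketch below, aggregated into Kyverno's background controller. The name and the aggregation label shown here are assumptions (the Kyverno 1.10+ convention); check the applied file for the exact definition:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kyverno:generate-vpas   # hypothetical name
  labels:
    # aggregate these rules into the background controller's role
    rbac.kyverno.io/aggregate-to-background-controller: "true"
rules:
  - apiGroups: ["autoscaling.k8s.io"]
    resources: ["verticalpodautoscalers"]
    verbs: ["create", "update", "patch", "delete", "get", "list", "watch"]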

Install Kyverno Policies

We want to generate VPA configurations for existing deployments in a specific namespace. To install the Generate VPA policy, apply the following YAML. 

---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: generate-vpa
  annotations:
    policies.kyverno.io/title: Generate VPA
    policies.kyverno.io/category: Resource Optimization
    policies.kyverno.io/severity: medium
    policies.kyverno.io/description: >-
      This policy generates Vertical Pod Autoscalers (VPAs)
      for Deployment and StatefulSet resources.
spec:
  # generateExisting ensures VPAs are also created for resources that already exist
  generateExisting: true
  rules:
    - name: create-for-podcontrollers
      match:
        any:
        - resources:
            kinds:
              - Deployment      
            names:
              - time-series-query
              - time-series-writer
            namespaces:
              - si-dev-001b
      generate:
        synchronize: true
        kind: VerticalPodAutoscaler
        apiVersion: autoscaling.k8s.io/v1
        name: "{{request.object.metadata.name}}-kyverno" 
        namespace: "{{request.object.metadata.namespace}}"
        data:
          spec:
            targetRef:
              apiVersion: "{{request.object.apiVersion}}"
              kind: "{{request.object.kind}}"
              name: "{{request.object.metadata.name}}"
            updatePolicy:
              updateMode: "Off" 
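
Assuming you saved the policy above as generate-vpa.yaml, apply it and confirm it is ready:

kubectl apply -f generate-vpa.yaml
kubectl get clusterpolicy generate-vpa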

The second policy is Check Resources. Install it by applying the following YAML. We are targeting the same workloads as in the Generate VPA policy.

---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: check-resources
  annotations:
    policies.kyverno.io/title: Check Resources
    policies.kyverno.io/category: Resource Optimization
    policies.kyverno.io/severity: medium
    policies.kyverno.io/description: >-
      This policy checks if the resources requested by a pod controller (Deployment or StatefulSet) 
      are within the bounds recommended by the Vertical Pod Autoscaler (VPA).
spec:
  validationFailureAction: Audit
  background: true
  admission: false
  rules:
    - name: memory
      match: &matchDef
        any:
        - resources:
            kinds:
            - Deployment
            names:
            - time-series-query
            - time-series-writer
            namespaces:
            - si-dev-001b
      exclude:
        any: 
        - resources:
            selector:
              matchLabels:
                nirmata.io/resource-optimizer: "false"
            namespaces:
            - kube-system
      context: &ruleContextDef
        - name: vpa
          # currently this will return errors if the VPA is not available
          # so we fetch all VPAs and filter by the expected name
          # see: https://github.com/kyverno/kyverno/issues/9723
          # Note that this logic assumes that the VPA has the same name as the 
          # pod controller (Deployment or StatefulSet) with the suffix "-kyverno".
          apiCall:
            urlPath: "/apis/autoscaling.k8s.io/v1/namespaces/{{request.namespace}}/verticalpodautoscalers/"
            jmesPath: "items[?metadata.name == '{{request.object.metadata.name}}-kyverno'] | @[0] || `{}`"
      preconditions: &pre
        all:
        - key: "{{ vpa.status || `{}` }}"
          operator: NotEquals
          value: {}
        - key: "{{ time_since('', '{{vpa.metadata.creationTimestamp}}', '') }}"
          operator: GreaterThan
          value: 0m # Set to >24h for production
      validate:
        foreach:
          - list: vpa.status.recommendation.containerRecommendations
            context:
            - name: ctnr
              variable: &ctnrVariable
                value: "{{ request.object.spec.template.spec.containers[?name == '{{element.containerName}}'] | @[0] }}"
            - name: memRequest
              variable:
                value: "{{ ctnr.resources.requests.memory || `0` }}"
            deny:
              conditions:
                any:
                - key: "{{ multiply(memRequest, `0.80`) }}"
                  operator: GreaterThan
                  value: "{{ element.upperBound.memory }}"
                  message: "overprovisioned resources: reduce memory.request from {{memRequest}} to {{ divide(element.target.memory, `1048576`) }}Mi"
                - key: "{{ multiply(memRequest, `1.20`) }}"
                  operator: LessThan
                  value: "{{ element.lowerBound.memory }}"
                  message: "underprovisioned resources: increase memory.request from {{memRequest}} to {{ divide(element.target.memory, `1048576`) }}Mi"
    # Using multiple rules to report each violation separately. 
    # Otherwise, processing stops at the first violation
    # See: https://github.com/kyverno/kyverno/issues/8792   
    - name: cpu
      match: *matchDef
      context: *ruleContextDef
      preconditions: *pre
      validate:
        foreach:
          - list: vpa.status.recommendation.containerRecommendations
            context: 
            - name: ctnr
              variable: *ctnrVariable
            - name: cpuRequest
              variable:
                value: "{{ ctnr.resources.requests.cpu || `0` }}"
            deny:
              conditions:
                any:
                - key: "{{ multiply(cpuRequest, `0.80`) }}"
                  operator: GreaterThan
                  value: "{{ element.upperBound.cpu }}"
                  message: "overprovisioned resources: reduce cpu.request from {{cpuRequest}} to {{element.target.cpu}}"
                - key: "{{ multiply(cpuRequest, `1.20`) }}"
                  operator: LessThan
                  value: "{{ element.lowerBound.cpu }}"
                  message: "underprovisioned resources: increase cpu.request from {{cpuRequest}} to over {{element.target.cpu}}"
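
Likewise, assuming the policy is saved as check-resources.yaml:

kubectl apply -f check-resources.yaml
kubectl get clusterpolicy

Because the policy sets background: true with validationFailureAction: Audit, it scans existing workloads in the background and records results in policy reports rather than blocking admission.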

Let's verify that all the components are in place before we start generating resource recommendations for our workloads.

% kubectl get deployment metrics-server -n kube-system    
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
metrics-server   2/2     2            2           5y37d


% kubectl get po -n vpa                               
NAME                              READY   STATUS    RESTARTS   AGE
vpa-recommender-6664f7678-b5xbm   1/1     Running   0          40d


% kubectl get po -n kyverno
NAME                                             READY   STATUS    RESTARTS     AGE
kyverno-admission-controller-787c6f5c89-x5gx6    1/1     Running   5 (9d ago)   33d
kyverno-background-controller-686df748c7-nt8d6   1/1     Running   6 (8d ago)   33d
kyverno-cleanup-controller-54cf7bddd6-lrxjb      1/1     Running   7 (8d ago)   33d
kyverno-reports-controller-69f56cf7bb-96mn4      1/1     Running   5 (9d ago)   33d

Let's check our targeted workloads.

% kubectl get po -n si-dev-001b --selector='app in (time-series-writer,time-series-query)'

NAME                                  READY   STATUS    RESTARTS   AGE
time-series-query-55c7c48494-kncq9    1/1     Running   0          10h
time-series-writer-848759f458-bqrsr   1/1     Running   0          12h

VPA configurations are generated automatically per the Generate VPA policy we installed earlier. VPA observes Pod metrics via the metrics server and provides recommendations.

% kubectl describe vpa -n si-dev-001b
Name:         time-series-query-kyverno
Namespace:    si-dev-001b
Labels:       app.kubernetes.io/managed-by=kyverno
              generate.kyverno.io/policy-name=generate-vpa
              generate.kyverno.io/policy-namespace=
              generate.kyverno.io/rule-name=create-for-podcontrollers
              generate.kyverno.io/trigger-group=apps
              generate.kyverno.io/trigger-kind=Deployment
              generate.kyverno.io/trigger-namespace=si-dev-001b
              generate.kyverno.io/trigger-uid=3a9e4a43-57df-4fdd-b4d2-53725f190bd8
              generate.kyverno.io/trigger-version=v1
Annotations:  <none>
API Version:  autoscaling.k8s.io/v1
Kind:         VerticalPodAutoscaler
Metadata:
  Creation Timestamp:  2025-07-10T06:20:55Z
  Generation:          1
  Resource Version:    1116201938
  UID:                 1e473ff8-4f70-4e29-854d-ec6d80731511
Spec:
  Target Ref:
    API Version:  apps/v1
    Kind:         Deployment
    Name:         time-series-query
  Update Policy:
    Update Mode:  Off
Status:
  Conditions:
    Last Transition Time:  2025-07-10T06:21:14Z
    Status:                True
    Type:                  RecommendationProvided
  Recommendation:
    Container Recommendations:
      Container Name:  time-series-query
      Lower Bound:
        Cpu:     34m
        Memory:  5526734463
      Target:
        Cpu:     49m
        Memory:  5815202783
      Uncapped Target:
        Cpu:     49m
        Memory:  5815202783
      Upper Bound:
        Cpu:     66m
        Memory:  6131652484
Events:          <none>


Name:         time-series-writer-kyverno
Namespace:    si-dev-001b
Labels:       app.kubernetes.io/managed-by=kyverno
              generate.kyverno.io/policy-name=generate-vpa
              generate.kyverno.io/policy-namespace=
              generate.kyverno.io/rule-name=create-for-podcontrollers
              generate.kyverno.io/trigger-group=apps
              generate.kyverno.io/trigger-kind=Deployment
              generate.kyverno.io/trigger-namespace=si-dev-001b
              generate.kyverno.io/trigger-uid=419ec486-5c05-4535-a14c-91507c01cb13
              generate.kyverno.io/trigger-version=v1
Annotations:  <none>
API Version:  autoscaling.k8s.io/v1
Kind:         VerticalPodAutoscaler
Metadata:
  Creation Timestamp:  2025-07-10T06:20:56Z
  Generation:          1
  Resource Version:    1116201939
  UID:                 cef38e2c-c4d1-4110-ab66-9494f27359aa
Spec:
  Target Ref:
    API Version:  apps/v1
    Kind:         Deployment
    Name:         time-series-writer
  Update Policy:
    Update Mode:  Off
Status:
  Conditions:
    Last Transition Time:  2025-07-10T06:21:14Z
    Status:                True
    Type:                  RecommendationProvided
  Recommendation:
    Container Recommendations:
      Container Name:  time-series-writer
      Lower Bound:
        Cpu:     1642m
        Memory:  10108307283
      Target:
        Cpu:     4280m
        Memory:  10626315661
      Uncapped Target:
        Cpu:     4280m
        Memory:  10626315661
      Upper Bound:
        Cpu:     5810m
        Memory:  11173456739
Events:          <none>

Let's look at the policy reports generated by the Check Resources policy for our targeted workloads.

% kubectl get polr -n si-dev-001b
NAME                                   KIND         NAME                 PASS   FAIL   WARN   ERROR   SKIP   AGE
3a9e4a43-57df-4fdd-b4d2-53725f190bd8   Deployment   time-series-query    2      1      0      0       0      32d
419ec486-5c05-4535-a14c-91507c01cb13   Deployment   time-series-writer   2      1      0      0       0      32d

Both reports show a failure. Let's inspect the first one.

% kubectl get polr 3a9e4a43-57df-4fdd-b4d2-53725f190bd8 -n si-dev-001b -o yaml
apiVersion: wgpolicyk8s.io/v1alpha2
kind: PolicyReport
metadata:
  creationTimestamp: "2025-07-10T06:21:05Z"
  generation: 1187
  labels:
    app.kubernetes.io/managed-by: kyverno
  name: 3a9e4a43-57df-4fdd-b4d2-53725f190bd8
  namespace: si-dev-001b
  ownerReferences:
  - apiVersion: apps/v1
    kind: Deployment
    name: time-series-query
    uid: 3a9e4a43-57df-4fdd-b4d2-53725f190bd8
  resourceVersion: "1116190145"
  uid: 6852b84e-a0cd-49c0-a89d-0667d7aabb73
results:
- category: Resource Optimization
  message: 'validation failure: overprovisioned resources: reduce cpu.request from
    2 to 49m'
  policy: check-resources
  properties:
    process: background scan
  result: fail
  rule: cpu
  scored: true
  severity: medium
  source: kyverno
  timestamp:
    nanos: 0
    seconds: 1754958366
- category: Resource Optimization
  message: rule passed
  policy: check-resources
  properties:
    process: background scan
  result: pass
  rule: memory
  scored: true
  severity: medium
  source: kyverno
  timestamp:
    nanos: 0
    seconds: 1754958366
- category: Resource Optimization
  policy: generate-vpa
  properties:
    process: background scan
  result: pass
  rule: create-for-podcontrollers
  scored: true
  severity: medium
  source: kyverno
  timestamp:
    nanos: 0
    seconds: 1754617154
scope:
  apiVersion: apps/v1
  kind: Deployment
  name: time-series-query
  namespace: si-dev-001b
  uid: 3a9e4a43-57df-4fdd-b4d2-53725f190bd8
summary:
  error: 0
  fail: 1
  pass: 2
  skip: 0
  warn: 0

From the report, CPU is over-provisioned: the request is 2 cores while the VPA target is 49m. We need to reduce the CPU assigned to this Pod.

- category: Resource Optimization
  message: 'validation failure: overprovisioned resources: reduce cpu.request from
    2 to 49m'
  policy: check-resources
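
One way to adopt the recommendation, assuming we take the target value as-is, is kubectl set resources (note that this triggers a rollout of the Deployment):

% kubectl -n si-dev-001b set resources deployment/time-series-query --requests=cpu=49m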

Let's check the failure in the other workload.

% kubectl get polr 419ec486-5c05-4535-a14c-91507c01cb13 -n si-dev-001b -o yaml
apiVersion: wgpolicyk8s.io/v1alpha2
kind: PolicyReport
metadata:
  creationTimestamp: "2025-07-10T06:21:06Z"
  generation: 1185
  labels:
    app.kubernetes.io/managed-by: kyverno
  name: 419ec486-5c05-4535-a14c-91507c01cb13
  namespace: si-dev-001b
  ownerReferences:
  - apiVersion: apps/v1
    kind: Deployment
    name: time-series-writer
    uid: 419ec486-5c05-4535-a14c-91507c01cb13
  resourceVersion: "1116190144"
  uid: 7f635c31-1dcb-4a68-8dd1-cec87ac18000
results:
- category: Resource Optimization
  message: rule passed
  policy: check-resources
  properties:
    process: background scan
  result: pass
  rule: cpu
  scored: true
  severity: medium
  source: kyverno
  timestamp:
    nanos: 0
    seconds: 1754958366
- category: Resource Optimization
  message: 'validation failure: underprovisioned resources: increase memory.request
    from 5Gi to 10134Mi'
  policy: check-resources
  properties:
    process: background scan
  result: fail
  rule: memory
  scored: true
  severity: medium
  source: kyverno
  timestamp:
    nanos: 0
    seconds: 1754958366
- category: Resource Optimization
  policy: generate-vpa
  properties:
    process: background scan
  result: pass
  rule: create-for-podcontrollers
  scored: true
  severity: medium
  source: kyverno
  timestamp:
    nanos: 0
    seconds: 1754617154
scope:
  apiVersion: apps/v1
  kind: Deployment
  name: time-series-writer
  namespace: si-dev-001b
  uid: 419ec486-5c05-4535-a14c-91507c01cb13
summary:
  error: 0
  fail: 1
  pass: 2
  skip: 0
  warn: 0

In this case, memory is under-provisioned. We need to increase the memory assigned to this Pod as per the recommendation.

- category: Resource Optimization
  message: 'validation failure: underprovisioned resources: increase memory.request
    from 5Gi to 10134Mi'
  policy: check-resources
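
Again, the recommendation can be applied with kubectl set resources, assuming the target value is acceptable (this too triggers a rollout):

% kubectl -n si-dev-001b set resources deployment/time-series-writer --requests=memory=10134Mi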

Step-by-Step Approach to Pod Rightsizing with Kyverno

  1. Initialize

    • Estimate initial CPU and memory requests/limits based on workload characteristics and past experience.

  2. Deploy and Load Test

    • Deploy workloads with the initial resources.

    • Run representative load tests to simulate typical operational conditions.

  3. Evaluate Policy Reports

    • Use the Kyverno policy reports produced by the Check Resources and Generate VPA policies.

    • Identify under- or over-provisioned resources based on recommendations and violations.

  4. Adjust Resources

    • Update resource requests/limits according to the Kyverno recommendations.

  5. Iterate

    • Repeat steps 2–4 until policy reports indicate no violations and performance is stable.

Conclusion


By following this iterative, policy-driven approach, you ensure workloads are sized accurately for performance and cost-efficiency. Kyverno’s validation and generation policies streamline the process, enabling continuous optimization without guesswork.
