OpenFaaS Function Custom Resource with HPA on OpenShift for IBM Power ppc64le
Introduction
In Part 1, we built and installed OpenFaaS on ppc64le and deployed an example function that prints Pi or Euler's number to a fixed accuracy using the OpenFaaS stack.yml. We also used AlertManager for autoscaling. OpenFaaS can instead use the Horizontal Pod Autoscaler (HPA) in OpenShift; in that case, the built-in autoscaler should be disabled. The HPA implements compute-resource-based autoscaling of the function instances: it monitors the compute resources used by the function pods and fires a scaling event when resource usage exceeds a threshold.
This recipe shows an example function running with OpenFaaS on OpenShift 4.x using the Function Custom Resource and the HPA for autoscaling with long-running functions. This has been tested on the IBM® Power® System E880 (9119-MHE) based on POWER8® processor-based technology with OpenShift version 4.6.23.
We will create a new function in Perl that returns Pi or Euler's number to the desired accuracy. We will first test it locally with docker or podman, then use the OpenFaaS stack.yml approach to deploy the function with longer timeouts. Finally, we will use the Function Custom Resource to create the function and test it with the Horizontal Pod Autoscaler.
Deployment of a function using an OpenFaaS stack.yml
Print Pi or Euler's number to the wanted accuracy
We can easily create a template that we can use for ppc64le. Most templates use alpine as the base image, which is available for ppc64le. We created the dockerfile-perl template to build a new function pi-ppc64le; the only change required is to use the powerlinux/classic-watchdog. We may alternatively use the dockerfile-ppc64le template. The Dockerfile in the dockerfile-perl template installs perl 5.32.0, while the dockerfile-ppc64le template can be modified to install perl 5.30.3-r0 as before.
Create a new function from the dockerfile-perl template:
faas-cli new pi-ppc64le --lang dockerfile-perl
This template contains a Dockerfile that installs perl and sets an ENV fprocess that computes the value of Pi to a fixed 100 digits, as before. In this scenario, we want to send input with multiple lines, each line containing the desired accuracy in number of digits, so that a single invocation prints Pi or Euler's number to multiple accuracies. A separate file runme.pl was added to invoke the bpi function from fprocess because I could not figure out how to escape the ENV for fprocess with either of the following commands that process multiple lines of input:
'foreach my $line ( <STDIN> ) { chomp($line);if ($line=~/^$/) { last; } print(bpi($line)); }'
"foreach my \$line ( <STDIN> ) { chomp(\$line);if (\$line=~/^\$/) { last; } print(bpi(\$line)); }"
If someone can find the appropriate escape characters that can be used within ENV for either of the above, please leave comments below.
runme.pl for Pi
#!/usr/local/bin/perl
use bignum;
foreach my $line ( <STDIN> ) { chomp($line);print $line,"\n";if ($line=~/^$/) { last; } print(bignum::bpi($line),"\n"); }
For the above, we can provide multiple input lines, where each line is the desired accuracy:
10
100
.
runme.pl for Euler's number e raised to the appropriate power
#!/usr/local/bin/perl
use bignum;
foreach my $line ( <STDIN> ) { chomp($line);print $line,"\n";if ($line=~/^$/) { last; } print(bignum::bexp((split(' ',$line))[0],(split(' ',$line))[1]),"\n"); }
For the above, we can provide multiple input lines, where the first number on each line is the power and the second is the desired accuracy:
1 20
1 30
2 30
.
Replace the "ENV fprocess" in Dockerfile with the lines below:
COPY runme.pl /home/app/runme.pl
ENV fprocess="/home/app/runme.pl"
Test locally on docker/podman
If you have podman, just create a symbolic link as follows:
sudo ln -s /usr/bin/podman /usr/bin/docker
We can build the image directly with the docker/podman build command or with the "faas-cli build" command. I am creating images with the prefix karve; replace it with your desired prefix.
1. Using the docker/podman build
# Build the image
cd pi-ppc64le
docker build -t karve/pi-ppc64le .
cd ..
# Test the docker image without external input
docker run -it --rm karve/pi-ppc64le perl -Mbignum=bpi -wle "print bpi(2000)" # Pi
docker run -it --rm karve/pi-ppc64le perl -Mbignum=bexp -wle "print bexp(1,2000)" # Euler’s number
# For the next two tests of the image, provide multiple lines of input and end with Ctrl-D
docker run -it --rm karve/pi-ppc64le perl -Mbignum=bpi -wle "foreach my \$line ( <STDIN> ) { chomp(\$line);if (\$line=~/^\$/) { last; } print(bpi(\$line)); }"
docker run -it --rm karve/pi-ppc64le perl -Mbignum=bpi -wle 'foreach my $line ( <STDIN> ) { chomp($line);if ($line=~/^$/) { last; } print(bpi($line)); }'
We can also test the function by sending a file named test as input with curl; multiple curl processes can be run simultaneously to simulate concurrent requests.
docker run --rm -d -p 8081:8080 --name test-this karve/pi-ppc64le
# Create the test file and test with curl command
# Do not use -d, the newline characters get removed. You must use --data-binary
curl http://127.0.0.1:8081 --data-binary @test
docker stop test-this
test
Add an empty line at the end of the test file to terminate the loop
10
20
30
.
Output
10 3.141592654
20 3.1415926535897932385
30 3.14159265358979323846264338328
Let’s also test with a large accuracy value of 3000 or more, which takes longer to respond. This requires increasing the default timeout from 10s to a larger value, for example 600s, using the environment variables.
docker run --rm -d -p 8081:8080 --name test-this -e exec_timeout=600s -e write_timeout=600s -e read_timeout=600s karve/pi-ppc64le
printf "3000\n" | curl http://127.0.0.1:8081 --data-binary @-
docker stop test-this
Update the pi-ppc64le.yml
The faas-cli build command, however, adds the Dockerfile from the template into the build/pi-ppc64le/function/ directory instead of build/pi-ppc64le/. To avoid a build error, change lang: dockerfile-perl to lang: dockerfile in pi-ppc64le.yml. Also replace image: pi-ppc64le:latest with image: karve/pi-ppc64le:latest and set gateway: http://gateway-external-openfaas.apps.test-cluster.priv
pi-ppc64le.yml
version: 1.0
provider:
  name: openfaas
  gateway: http://gateway-external-openfaas.apps.test-cluster.priv
functions:
  pi-ppc64le:
    lang: dockerfile
    handler: ./pi-ppc64le
    image: karve/pi-ppc64le:latest
    environment:
      read_timeout: "600s"
      write_timeout: "600s"
      exec_timeout: "600s"
    limits:
      cpu: "500m"
      memory: "500Mi"
    requests:
      cpu: "100m"
      memory: "60Mi"
CPU requests and limits are expressed as fractions of a core or in the form 100m. The latter can be read as "one hundred millicpu" or "one hundred millicores"; 500m means half of one core. A request with a decimal point, like 0.1, is converted to 100m, and precision finer than 1m is not allowed. Thus 100m corresponds to 10% of a CPU core, 1000m (or 1) to one full core, and 2000m (or 2) to two cores. Memory requests and limits are measured in bytes and are expressed as a plain integer or as a fixed-point number using one of these suffixes: E, P, T, G, M, K. You can also use the power-of-two equivalents: Ei, Pi, Ti, Gi, Mi, Ki. For example, the following represent roughly the same value: 128974848, 129e6, 129M, 123Mi.
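Once the function has been deployed to the cluster (as shown later in this section), the requests and limits can also be adjusted on the generated Deployment without editing the stack file. The following is a sketch using the standard oc set resources command; note that redeploying the function (or the OpenFaaS operator, when used) may reconcile these values back to what the function definition specifies.
# Adjust CPU/memory requests and limits on the running function Deployment
oc set resources deployment pi-ppc64le -n openfaas-fn \
  --requests=cpu=100m,memory=60Mi \
  --limits=cpu=500m,memory=500Mi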
2. Using the faas-cli build
Instead of the docker/podman build command, we can build the image using the "faas-cli build" as follows:
faas-cli build -f ./pi-ppc64le.yml && docker run --rm -d -p 8081:8080 --name test-this karve/pi-ppc64le
Test the function on the cluster
Build and push the image. Then deploy it and test using faas-cli and curl commands.
faas-cli build -f ./pi-ppc64le.yml
docker push karve/pi-ppc64le
faas-cli deploy -f ./pi-ppc64le.yml
faas-cli list --gateway http://gateway-external-openfaas.apps.test-cluster.priv
printf "10\n20\n30\n" | faas-cli invoke pi-ppc64le --gateway http://gateway-external-openfaas.apps.test-cluster.priv
printf "10\n20\n30\n" | curl -X POST --data-binary @- http://gateway-external-openfaas.apps.test-cluster.priv/function/pi-ppc64le -vvv -H "Content-Type:text/plain"
Delete the instance of pi-ppc64le
faas-cli delete pi-ppc64le --gateway http://gateway-external-openfaas.apps.test-cluster.priv
If you are going to provide larger accuracy values and/or a longer list, you will need to increase the timeouts in the environment within the template.yml and thus the generated yml file.
    environment:
      read_timeout: "600s"
      write_timeout: "600s"
      exec_timeout: "600s"
Any functions with a larger timeout than the gateway's timeout will end prematurely. Therefore, you will also need to increase the timeout by annotating the route gateway-external and any proxy as mentioned in the Issues section in Part 1.
oc annotate route gateway-external --overwrite haproxy.router.openshift.io/timeout=600s -n openfaas
Additionally, make sure that the timeouts for the gateway and the faas-netes are set correctly if not done during installation.
oc edit deployment gateway -n openfaas
For the gateway container
spec:
  containers:
  - env:
    - name: read_timeout
      value: 600s
    - name: write_timeout
      value: 600s
    - name: upstream_timeout
      value: 600s
    - name: exec_timeout
      value: 600s
For the faas-netes container
    - name: read_timeout
      value: 600s
    - name: write_timeout
      value: 600s
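Instead of editing the deployment interactively, the same environment variables can be set with oc set env; this sketch assumes the container names gateway and faas-netes used by the default OpenFaaS gateway deployment:
# Set the timeouts on the gateway container
oc set env -n openfaas deploy/gateway -c gateway read_timeout=600s write_timeout=600s upstream_timeout=600s exec_timeout=600s
# Set the timeouts on the faas-netes container
oc set env -n openfaas deploy/gateway -c faas-netes read_timeout=600s write_timeout=600s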
Deployment of a function using a Function Custom Resource (CR)
We can create the function using pi-ppc64le-function.yaml containing the Function Custom Resource if --operator was set during the installation of OpenFaaS.
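If you are not sure whether the operator mode is enabled, a quick check is to confirm that the Function CRD is registered on the cluster (assuming the default CRD name installed by the OpenFaaS operator):
# The Function CRD is only present when OpenFaaS was installed with --operator
oc get crd functions.openfaas.com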
pi-ppc64le-function.yaml
apiVersion: openfaas.com/v1
kind: Function
metadata:
  name: pi-ppc64le
  namespace: openfaas-fn
spec:
  name: pi-ppc64le
  image: karve/pi-ppc64le:latest
  labels:
    com.openfaas.scale.min: "2"
    com.openfaas.scale.max: "15"
  environment:
    write_debug: "true"
    read_timeout: "600s"
    write_timeout: "600s"
    exec_timeout: "600s"
Apply the function CR
oc apply -f pi-ppc64le-function.yaml
oc get function -n openfaas-fn
Output
pi-ppc64le 0s
Deleting the function
oc delete function pi-ppc64le -n openfaas-fn
Horizontal Pod Autoscaling
OpenFaaS and HPAv2 play nicely together. To use HPAv2, we need to comment out the following scaling labels from pi-ppc64le-function.yaml and set com.openfaas.scale.factor to 0. With the stack file, this is done by deploying again with --label com.openfaas.scale.factor=0:
labels:
  com.openfaas.scale.min: "2"
  com.openfaas.scale.max: "15"
faas-cli deploy -f ./pi-ppc64le.yml --label com.openfaas.scale.factor=0
Alternatively, we can set the label in pi-ppc64le-function.yaml and apply the yaml.
labels:
  com.openfaas.scale.factor: "0"
pi-ppc64le-function.yaml
# Custom Resource pi-ppc64le-function.yaml
apiVersion: openfaas.com/v1
kind: Function
metadata:
  name: pi-ppc64le
  namespace: openfaas-fn
spec:
  name: pi-ppc64le
  image: karve/pi-ppc64le:latest
  labels:
    com.openfaas.scale.factor: "0"
    # com.openfaas.scale.min: "2"
    # com.openfaas.scale.max: "15"
  environment:
    write_debug: "true"
    read_timeout: "600s"
    write_timeout: "600s"
    exec_timeout: "600s"
    #max_inflight: "10"
  limits:
    cpu: "500m"
    memory: "500Mi"
  requests:
    cpu: "100m"
    memory: "60Mi"
Apply the function CR
oc apply -f pi-ppc64le-function.yaml
We also disable the built-in autoscaling by scaling alertmanager down to zero replicas, which stops it from firing alerts; we do not want to scale using Prometheus alerts.
oc scale -n openfaas deploy/alertmanager --replicas=0
oc get deployments -n openfaas alertmanager
Output
NAME READY UP-TO-DATE AVAILABLE AGE
alertmanager 0/0 0 0 16d
Create an HPAv2 rule for CPU
The Horizontal Pod Autoscaler is supported in a standard way by the kubectl/oc autoscale command. The parameter -n openfaas-fn is the namespace where the function is deployed, pi-ppc64le is the name of the function, --cpu-percent is the average CPU utilization the pods should reach before additional replicas are added, --min is the minimum number of pods, and --max is the maximum number of pods. The HPA calculates pod CPU utilization as the total CPU usage of all containers in the pod divided by the total requested; with our request of cpu: 100m and a target of 30 percent, scaling is triggered when average usage exceeds 30m per pod.
oc autoscale deployment -n openfaas-fn \
pi-ppc64le \
--cpu-percent=30 \
--min=2 \
--max=20
oc get hpa/pi-ppc64le -n openfaas-fn # View the HPA record
Output
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
pi-ppc64le Deployment/pi-ppc64le <unknown>/30% 2 20 0 0s
Generate some load and check the time for the responses
for i in {1..10}; do printf "6000\n" | time curl -X POST --data-binary @- http://gateway-external-openfaas.apps.test-cluster.priv/function/pi-ppc64le -H "Content-Type:text/plain" -s -o /dev/null & done
Create a “test” file for generating load to this function.
printf "2000\n100\n\n" > /tmp/test
Generate the load using hey and watch the CPU load using the commands shown below, along with the Horizontal Pod Autoscaler and the CPU/memory usage of the pod replicas. The -c 10 option simulates 10 concurrent users, -z=10m runs the test for 10 minutes, and -t 600 sets the timeout for each request in seconds. Note that the -D parameter must be provided before the URL as shown.
hey -z=10m -c 10 -t 600 -m POST -D /tmp/test http://gateway-external-openfaas.apps.test-cluster.priv/function/pi-ppc64le -H "Content-Type: text/plain"
watch "faas-cli describe pi-ppc64le --gateway $OPENFAAS_URL;oc get pods -n openfaas-fn" # Shows the number of replicas and invocations
watch "kubectl top pod -n openfaas-fn" # Usage of pods
#watch "oc adm top pod -n openfaas-fn" # Usage of pods
watch "oc describe hpa/pi-ppc64le -n openfaas-fn" # Get detailed information including any events such as scaling up and down
HPA reacts slowly to changes in traffic, both for scaling up and for scaling down. In some instances, you may wait more than 5 minutes for all your pods to scale back down to default level after the load has stopped.
Sample output from “oc describe hpa/pi-ppc64le -n openfaas-fn”
Name: pi-ppc64le
Namespace: openfaas-fn
Labels: <none>
Annotations: <none>
CreationTimestamp: Mon, 21 Jun 2021 12:31:22 -0400
Reference: Deployment/pi-ppc64le
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): 29% (29m) / 30%
Min replicas: 2
Max replicas: 20
Deployment pods: 6 current / 6 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True ReadyForNewScale recommended size matches current size
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulRescale 3m53s horizontal-pod-autoscaler New size: 4; reason: cpu resource utilization (percentage of request) above target
Normal SuccessfulRescale 3m37s horizontal-pod-autoscaler New size: 5; reason: cpu resource utilization (percentage of request) above target
Normal SuccessfulRescale 3m22s horizontal-pod-autoscaler New size: 6; reason: cpu resource utilization (percentage of request) above target
When deployed, each function creates one or more pods/containers depending on the minimum and maximum scaling parameters requested by the user. We can see the events that cause the replicas to increase in the output above. Functions can also scale to zero and back again through the faas-idler or the REST API; this article does not use the faas-idler.
If the pods remain in ContainerCreating state during scaling up, check the events to find the problem.
oc get events -n openfaas-fn
By default, deployed functions use an imagePullPolicy of Always, which ensures that functions using static image tags (for example, "latest") are refreshed during an update. If you exceed the pull rate limits for Docker Hub, you can tag the image as follows and push it to the OpenShift image registry on the local cluster.
docker tag karve/pi-ppc64le default-route-openshift-image-registry.apps.test-cluster.priv/openfaas-fn/pi-ppc64le
oc whoami -t > oc_token
docker login --tls-verify=false -u kubeadmin default-route-openshift-image-registry.apps.test-cluster.priv -p `cat oc_token`
docker push default-route-openshift-image-registry.apps.test-cluster.priv/openfaas-fn/pi-ppc64le
Then update the function with the following image:
image: image-registry.openshift-image-registry.svc:5000/openfaas-fn/pi-ppc64le:latest
This could be done by making the changes to the above function yaml and doing an apply again or directly with:
oc edit deployment pi-ppc64le -n openfaas-fn
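If you prefer a one-line change over editing the deployment, something like the following should also work; it assumes the container in the function Deployment is named after the function, which is how faas-netes creates it:
# Point the function Deployment at the image in the internal registry
oc set image deployment/pi-ppc64le -n openfaas-fn pi-ppc64le=image-registry.openshift-image-registry.svc:5000/openfaas-fn/pi-ppc64le:latest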
You might need to manually delete the pods stuck in ContainerCreating state.
Alternatively, the behavior for imagePullPolicy is configurable in faas-netes via the image_pull_policy environment variable.
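For example, to pull images only when they are not already present on the node, a sketch assuming the faas-netes container in the gateway deployment used earlier:
oc set env -n openfaas deploy/gateway -c faas-netes image_pull_policy=IfNotPresent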
Even though the AlertManager is disabled, you can still look at the graph in Prometheus. Forward the Prometheus port 9090 and browse to http://localhost:9090
kubectl port-forward -n openfaas svc/prometheus 9090:9090
In Prometheus, graph the following:
rate(gateway_function_invocation_total{code="200"} [20s])
OpenFaaS Dashboard in Grafana
The Prometheus metrics can be used to build a dashboard with the Grafana project. Dashboards are useful for monitoring your system performance and as a diagnostic tool. We can build the Grafana image with the OpenFaaS Dashboard using the Dockerfile from https://github.com/stefanprodan/faas-grafana.
If you try to use an x86 base image to run on ppc64le, you will get the error "standard_init_linux.go:219: exec user process caused: exec format error". So, you only need to change the FROM line to “FROM ibmcom/grafana-ppc64le:5.2.0-f4” and build the image as shown below for ppc64le.
PROXY_URL="//10.3.0.3:3128";export http_proxy="http:$PROXY_URL";export https_proxy="http:$PROXY_URL";export no_proxy=localhost,127.0.0.1,.test-cluster.priv,10.3.158.61
git clone https://github.com/stefanprodan/faas-grafana.git
cd faas-grafana/Grafana
# vi Dockerfile # Change FROM ibmcom/grafana-ppc64le:5.2.0-f4
docker build -t default-route-openshift-image-registry.apps.test-cluster.priv/openfaas/faas-grafana:5.2.0-f4 .
# unset http_proxy;unset https_proxy
docker push default-route-openshift-image-registry.apps.test-cluster.priv/openfaas/faas-grafana:5.2.0-f4 --tls-verify=false
# Deploy a pod for Grafana
oc -n openfaas run --image=image-registry.openshift-image-registry.svc:5000/openfaas/faas-grafana:5.2.0-f4 --port=3000 grafana
oc -n openfaas expose pod grafana --type=NodePort --name=grafana
oc -n openfaas expose svc grafana
oc -n openfaas get routes grafana -o jsonpath='{.spec.host}'
Browse to the Grafana interactive visualization web application at the route printed above, http://grafana-openfaas.apps.test-cluster.priv
The default credentials are admin:admin; you will be prompted to change the password on first login. We can view the OpenFaaS dashboard, which shows the same data that we can get from Prometheus, but in a more user-friendly way.
If the Horizontal Pod Autoscaler (HPA) fails to get the CPU consumption and reports ‘unknown for current cpu usage : "the HPA was unable to compute the replica count"’ in OCP 4 with apiVersion autoscaling/v1, you can use autoscaling/v2beta2. This failure happens if you have not specified the requests and limits in the yaml file used to create the function with either the OpenFaaS stack file or the Function Custom Resource. Let’s delete the previous HPA and create a new one using hpa-pi-ppc64le.yaml.
# delete the previous hpa for cpu resource
oc delete hpa/pi-ppc64le -n openfaas-fn
Create an HPAv2 rule for CPU and Memory
The hpa-pi-ppc64le.yaml below shows the rule with CPU and memory resources. With type: Utilization, an averageUtilization of 30 means 30 percent of the requested CPU; with type: AverageValue, an averageValue of 300m means 300 millicores, that is, 30 percent of one core. Apply the new rule and generate load using hey. You will see the autoscaler scale up the number of function pods.
oc apply -f hpa-pi-ppc64le.yaml
watch "oc describe hpa.v2beta2.autoscaling/hpa-pi-ppc64le -n openfaas-fn"
hey -z=10m -c 20 -t 600 -m POST -D /tmp/test http://gateway-external-openfaas.apps.test-cluster.priv/function/pi-ppc64le -H "Content-Type: text/plain"
hpa-pi-ppc64le.yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-pi-ppc64le
  namespace: openfaas-fn
spec:
  maxReplicas: 20
  minReplicas: 2
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: pi-ppc64le
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        #type: Utilization
        #averageUtilization: 30
        type: AverageValue
        averageValue: 300m
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 100Mi
The output from hey shows the response times for the requests to the pi-ppc64le function. At the start, only two pods were present for this function, so the initial surge of requests was serviced by those pods. Later, new pods get created because of scaling up. Since simultaneous requests are submitted to the same two initial pods with a limit of cpu: "500m", they take a longer time to respond. Only new requests go to the scaled-up pods. Therefore, the scaled-up pods may remain idle until the previous requests on the initial pods are completed and the gateway submits new requests from hey to the new pods.
Summary:
Total: 677.4370 secs
Slowest: 216.6568 secs
Fastest: 21.7998 secs
Average: 68.5332 secs
Requests/sec: 0.1388
Total data: 190068 bytes
Size/request: 2022 bytes
Response time histogram:
21.800 [1] |■
41.286 [39] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
60.771 [19] |■■■■■■■■■■■■■■■■■■■
80.257 [12] |■■■■■■■■■■■■
99.743 [9] |■■■■■■■■■
119.228 [0] |
138.714 [0] |
158.200 [0] |
177.685 [4] |■■■■
197.171 [3] |■■■
216.657 [7] |■■■■■■■
Latency distribution:
10% in 23.7644 secs
25% in 28.1016 secs
50% in 50.6900 secs
75% in 80.4898 secs
90% in 191.6926 secs
95% in 211.1526 secs
0% in 0.0000 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0006 secs, 21.7998 secs, 216.6568 secs
DNS-lookup: 0.0002 secs, 0.0000 secs, 0.0022 secs
req write: 0.0000 secs, 0.0000 secs, 0.0002 secs
resp wait: 68.5316 secs, 21.7993 secs, 216.6563 secs
resp read: 0.0007 secs, 0.0001 secs, 0.0072 secs
Status code distribution:
[200] 94 responses
When a Pod replica needs to be added, the OpenShift scheduler selects a node for the Pod to run on. Each node has a maximum capacity for each of the resource types: the amount of CPU and memory it can provide for Pods. The scheduler ensures that, for each resource type, the sum of the resource requests of the scheduled Containers is less than the capacity of the node. Even though the actual memory or CPU resource usage on nodes may be low, the scheduler still refuses to place a Pod on a node if the capacity check fails. This protects against a resource shortage on a node when resource usage later increases.
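To see how much capacity a node has and how much of it has already been requested by scheduled pods, commands like the following can help (replace <node-name> with one of your worker nodes):
oc adm top nodes                                                  # Current CPU/memory usage per node
oc describe node <node-name> | grep -A 10 "Allocated resources"   # Requests/limits already scheduled on the node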
The HPA describe command may switch between the two forms, with and without “m”, for the Metrics “resource memory on pods” and “resource cpu on pods”. Each replica of pi-ppc64le can consume one full core for each function request that it services, and each replica can service multiple function requests simultaneously. If your nodes have enough cores, you may specify a CPU limit larger than one core (1000m) so that pods can take full advantage of multiple cores and run multiple function requests simultaneously.
When an automatic scale down occurs, your in-flight requests may get killed if they are being serviced by the pods that are terminated. The client needs to check the return code and detect disconnected requests; in both cases, the client needs to resubmit the requests. If you want to disable scale down to prevent in-flight requests from being killed, you can change the behavior by increasing the stabilization window or disabling the scaleDown policy as follows:
spec:
  #behavior:
  #  scaleDown:
  #    stabilizationWindowSeconds: 1800 # Half an hour
  behavior:
    scaleDown:
      selectPolicy: Disabled
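Rather than editing the HPA yaml, the same change can be patched onto the existing HPA; this sketch assumes the hpa-pi-ppc64le created above with the autoscaling/v2beta2 API, which is required for the behavior field:
oc patch hpa hpa-pi-ppc64le -n openfaas-fn --type=merge -p '{"spec":{"behavior":{"scaleDown":{"selectPolicy":"Disabled"}}}}'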
The downscale stabilization window can be set on a per-HPA basis through the behavior.scaleDown.stabilizationWindowSeconds field in the v2beta2 API. Alternatively, the global cooldown delay exposed as the --horizontal-pod-autoscaler-downscale-stabilization flag of the kube-controller-manager component can be set to avoid thrashing. The value for this option is a duration that specifies how long the autoscaler has to wait before another downscale operation can be performed after the current one has completed. The default value is 5 minutes (5m0s).
The same situation happens when a node is low on memory: the OpenShift eviction policy stops pods and marks them as failed. The function pods are then scheduled on a different node, since they are managed by a ReplicaSet bounded by the minReplicas and maxReplicas specified above.
If multiple requests are being served by each pod, the memory required by the pod increases. If the memory limit set in the deployment is small, this can cause the pod to die: the system terminates the container because it tried to use more memory than its limit. You can set the max_inflight environment variable to the maximum number of simultaneous requests that should be served. In this case, when additional requests are submitted to the pod, it returns “Concurrent request limit exceeded” or “curl: (55) Send failure: Broken pipe”. The client submitting the request needs to handle these responses and resubmit the request after a timeout.
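A quick way to confirm whether a function container was terminated for exceeding its memory limit is to look at the last state of its pods; OOMKilled appears as the termination reason in the describe output:
oc get pods -n openfaas-fn
oc describe pod <pod-name> -n openfaas-fn | grep -A 3 "Last State"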
The function execution timeouts that we set in the environment prevent the function from executing too long. If there were too few pods to service the initial surge of requests, you will see a message like the following when the function execution is killed; the pod is then restarted, and the client needs to resubmit the request.
2021/07/10 10:10:10 Function was killed by ExecTimeout: 10m0s
2021/07/10 10:10:10 Took 600.007541 secs
2021/07/10 10:10:10 signal: killed
We will create the function again, this time with max_inflight: "2", which prevents each function pod from servicing more than two requests simultaneously.
# Custom Resource pi-ppc64le-function.yaml
apiVersion: openfaas.com/v1
kind: Function
metadata:
  name: pi-ppc64le
  namespace: openfaas-fn
spec:
  name: pi-ppc64le
  image: karve/pi-ppc64le:latest
  labels:
    com.openfaas.scale.factor: "0"
  environment:
    write_debug: "true"
    read_timeout: "600s"
    write_timeout: "600s"
    exec_timeout: "600s"
    max_inflight: "2"
  limits:
    cpu: "500m"
    memory: "500Mi"
  requests:
    cpu: "100m"
    memory: "60Mi"
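After applying the updated CR, a quick way to observe the limit is to fire more simultaneous requests than the running replicas will accept; depending on how many replicas are up, some of the requests should return the “Concurrent request limit exceeded.” message. This uses the same gateway route as before:
oc apply -f pi-ppc64le-function.yaml
for i in $(seq 1 8); do printf "3000\n" | curl -X POST --data-binary @- http://gateway-external-openfaas.apps.test-cluster.priv/function/pi-ppc64le -H "Content-Type:text/plain" & done; wait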
A sample python client that handles the above errors is shown in pi_invoke_function.py; it simultaneously submits up to 64 requests. It generates numbers in range(1800,2200,3) and submits them to the pi-ppc64le function through the gateway route (the cmd variable builds the same curl invocation shown earlier). If it finds “Concurrent request limit exceeded” or “Killed”, or curl’s returncode!=0, the request is resubmitted after a timeout. It retries up to a retrycount of 10 times, with a different sleep time depending on the error, before giving up. The stdout and stderr outputs from each request are saved to /tmp.
pi_invoke_function.py
from joblib import Parallel, delayed
import time
import os
import subprocess
import sys

def process(index, total, value):
    value = str(value)
    print("Processing", index, "/", total, value, flush=True)
    # Invoke the function through the gateway route, sending the accuracy value on stdin
    # (same curl invocation as shown earlier; adjust the gateway URL for your cluster)
    cmd = ('printf "' + value + '\\n" | curl -s -X POST --data-binary @- '
           'http://gateway-external-openfaas.apps.test-cluster.priv/function/pi-ppc64le '
           '-H "Content-Type:text/plain"')
    retrycount = 0
    while True:
        # create two files to hold the output and errors, respectively
        with open("/tmp/" + value + '.out', 'w+') as fout:
            with open("/tmp/" + value + '.err', 'w+') as ferr:
                out = subprocess.Popen(cmd, shell=True, stdout=fout, stderr=ferr)
                out.wait(1200)  # Wait up to 20 minutes for each request
                fout.seek(0)
                output = fout.read()
                ferr.seek(0)
                errors = ferr.read()
        if out.returncode != 0:
            if retrycount >= 10:
                print("Giving Up returncode", out.returncode, "retrycount", retrycount, value)
                return "*" + value
            retrycount = retrycount + 1
            print("Sleeping 120s because returncode", out.returncode, "retrycount", retrycount, value)
            print("output", output)
            print("errors", errors)
            time.sleep(120.1)
            continue
        if output.find("Concurrent request limit exceeded.") >= 0:
            if retrycount >= 10:
                print("Giving Up Concurrent request limit exceeded. retrycount", retrycount, value)
                return "*" + value
            retrycount = retrycount + 1
            print("Sleeping 150s because Concurrent request limit exceeded. retrycount", retrycount, value)
            time.sleep(150.1)
            continue
        if output.find("Killed") >= 0 or errors.find("Killed") >= 0:
            if retrycount >= 10:
                print("Giving Up Killed. retrycount", retrycount, value)
                return "*" + value
            retrycount = retrycount + 1
            print("Sleeping 60s because Killed. retrycount", retrycount, value)
            time.sleep(60.1)
            continue
        print("Processed", index, "/", total, value, flush=True)
        return value

values = [i for i in range(1800, 2200, 3)]
total = len(values)
results = Parallel(n_jobs=64, prefer="threads")(delayed(process)(index, total, value) for index, value in enumerate(values))
print(results, flush=True)
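To run the client, joblib is the only dependency outside the standard library; adjust the pip/python commands for your environment:
pip3 install joblib            # the only external dependency
python3 pi_invoke_function.py
ls /tmp/*.out /tmp/*.err       # per-request stdout/stderr saved by the client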
You can delete the hpa when you are done as follows:
oc delete hpa hpa-pi-ppc64le -n openfaas-fn
Additional examples are available for running functions on ppc64le in Node.js, in Python, and as Linux command-line utilities. Most sample functions use Dockerfiles with base images available for ppc64le. You should now be able to create your own templates and build and run functions on OpenShift on ppc64le. It is easy to start developing serverless applications.
Conclusion
We looked at building images for functions that run on OpenShift 4.x for ppc64le. We looked at deploying functions in OpenFaaS using a Function Custom Resource. We also looked at using the CPU-based and memory-based OpenShift HPA for autoscaling with long-running functions. Knowing how to monitor resource usage in your functions and how to set timeouts is of vital importance. This will allow you to discover different issues that can affect the health of the applications running in the cluster and handle problems due to CPU starvation and memory overcommit. A pod without limits is free to use all the resources in the node, so you have to configure the limits properly. Monitoring the resources and how they relate to the limits and requests will help you set reasonable values, avoid OOM kills, and allow fair sharing of resources. In Part 3, we will cover asynchronous function execution and function chaining with OpenFaaS on Red Hat OpenShift for ppc64le.
Hope you have enjoyed the article. Share your thoughts in the comments or engage in the conversation with me on Twitter @aakarve. I look forward to hearing about how you use OpenFaaS with Autoscaling, what kind of problems timeouts have caused and if you would like to see something covered in more detail.
References
- Self-paced workshop for OpenFaaS https://github.com/openfaas/workshop/blob/master/README.md
- Manage functions with kubectl https://www.openfaas.com/blog/manage-functions-with-kubectl/
- Metrics HPAv2 with OpenFaaS https://docs.openfaas.com/tutorials/kubernetes-hpa/
- Custom Alert for 429 https://github.com/openfaas/nats-queue-worker/issues/105#issuecomment-787494533
- Serverless Computing on Constrained Edge Devices https://helda.helsinki.fi/bitstream/handle/10138/314280/Tilles_Jan_Pro_gradu_2020.pdf?sequence=3&isAllowed=y
- Create ASCII Text Banners https://www.tecmint.com/create-ascii-text-banners-in-linux-terminal/