OpenFaaS Function Custom Resource with HPA on OpenShift for IBM Power ppc64le
Introduction
In Part 1, we built and installed OpenFaaS on ppc64le and deployed an example function that prints Pi or Euler's number to a fixed accuracy using the OpenFaaS stack.yml. We also used AlertManager for autoscaling. OpenFaaS can instead use the Horizontal Pod Autoscaler (HPA) in OpenShift; in that case, the built-in autoscaler should be disabled. The HPA implements compute-resource-based autoscaling of the function instances: it monitors the compute resources used by the function pods and fires a scaling event when resource usage exceeds a threshold.
This recipe shows an example function running with OpenFaaS on OpenShift 4.x using the Function Custom Resource and the HPA for autoscaling with long-running functions. This has been tested on the IBM® Power® System E880 (9119-MHE) based on POWER8® processor-based technology with OpenShift version 4.6.23.
We will create a new function in Perl that returns Pi or Euler's number to the desired accuracy. We will first test it locally with docker or podman, then use the OpenFaaS stack.yml approach to deploy the function with longer timeouts. Finally, we will use the Function Custom Resource to create the function and test it with the Horizontal Pod Autoscaler.
Deployment of a function using an OpenFaaS stack.yml
Print Pi or Euler's number to the wanted accuracy
We can easily create a template that we can use for ppc64le. Most templates use alpine as the base image, which is available for ppc64le. We created the dockerfile-perl template to build a new function pi-ppc64le; the only change required is to use the powerlinux/classic-watchdog. We may alternatively use the dockerfile-ppc64le template. The Dockerfile in the dockerfile-perl template installs perl 5.32.0, while the dockerfile-ppc64le template can be modified to install perl 5.30.3-r0 as before.
Create a new function from the dockerfile-perl template:
faas-cli new pi-ppc64le --lang dockerfile-perl
This template contains a Dockerfile that installs perl and sets an ENV fprocess that computes the value of Pi to a fixed 100 digits, as before. In this scenario, we want to send input with multiple lines, each line containing the desired accuracy in number of digits, so that a single invocation prints Pi or Euler's number to multiple accuracies. A separate file runme.pl was added to invoke the bpi function from fprocess because I could not figure out how to escape the ENV for fprocess with either of the following commands that process multiple lines of input:
'foreach my $line ( <STDIN> ) { chomp($line);if ($line=~/^$/) { last; } print(bpi($line)); }'
"foreach my \$line ( <STDIN> ) { chomp(\$line);if (\$line=~/^\$/) { last; } print(bpi(\$line)); }"
If someone can find the appropriate escape characters that can be used within ENV for either of the above, please leave comments below.
runme.pl for Pi
#!/usr/local/bin/perl
use bignum;
foreach my $line ( <STDIN> ) { chomp($line);print $line,"\n";if ($line=~/^$/) { last; } print(bignum::bpi($line),"\n"); }
For the above, we can provide multiple input lines, where each line is the desired accuracy:
10
100
.
runme.pl for Euler's number e raised to the appropriate power
#!/usr/local/bin/perl
use bignum;
foreach my $line ( <STDIN> ) { chomp($line);print $line,"\n";if ($line=~/^$/) { last; } print(bignum::bexp((split(' ',$line))[0],(split(' ',$line))[1]),"\n"); }
For the above, we can provide multiple input lines, where the first number on each line is the power and the second is the desired accuracy:
1 20
1 30
2 30
.
Replace the "ENV fprocess" in Dockerfile with the lines below:
COPY runme.pl /home/app/runme.pl
ENV fprocess="/home/app/runme.pl"
Test locally on docker/podman
If you have podman, just create a symbolic link as follows:
sudo ln -s /usr/bin/podman /usr/bin/docker
We can build the image directly with the docker/podman build command or with the "faas-cli build" command. I am creating images with the prefix karve; replace it with your desired prefix.
1. Using the docker/podman build
# Build the image
cd pi-ppc64le
docker build -t karve/pi-ppc64le .
cd ..
# Test the docker image without external input
docker run -it --rm karve/pi-ppc64le perl -Mbignum=bpi -wle "print bpi(2000)" # Pi
docker run -it --rm karve/pi-ppc64le perl -Mbignum=bexp -wle "print bexp(1,2000)" # Euler’s number
# For the next two tests of the image, provide multiple lines of input and end with Ctrl-D
docker run -it --rm karve/pi-ppc64le perl -Mbignum=bpi -wle "foreach my \$line ( <STDIN> ) { chomp(\$line);if (\$line=~/^\$/) { last; } print(bpi(\$line)); }"
docker run -it --rm karve/pi-ppc64le perl -Mbignum=bpi -wle 'foreach my $line ( <STDIN> ) { chomp($line);if ($line=~/^$/) { last; } print(bpi($line)); }'
We can also test the function by sending a file named test as input with curl; multiple curl processes can be run simultaneously to simulate concurrent requests.
docker run --rm -d -p 8081:8080 --name test-this karve/pi-ppc64le
# Create the test file and test with curl command
# Do not use -d, the newline characters get removed. You must use --data-binary
curl http://127.0.0.1:8081 --data-binary @test
docker stop test-this
test
Add an empty line at the end of the test file to terminate the loop
10
20
30
.
Output
10 3.141592654
20 3.1415926535897932385
30 3.14159265358979323846264338328
Let’s also test with a large accuracy value of 3000 or more, which takes longer to respond. This requires increasing the default timeout from 10s to a larger value, for example 600s, using the environment variables.
docker run --rm -d -p 8081:8080 --name test-this -e exec_timeout=600s -e write_timeout=600s -e read_timeout=600s karve/pi-ppc64le
printf "3000\n" | curl http://127.0.0.1:8081 --data-binary @-
docker stop test-this
Update the pi-ppc64le.yml
The faas-cli build command, however, adds the Dockerfile from the template into the build/pi-ppc64le/function/ directory instead of build/pi-ppc64le/. To avoid a build error, change lang: dockerfile-perl to lang: dockerfile in pi-ppc64le.yml. Also replace image: pi-ppc64le:latest with image: karve/pi-ppc64le:latest and set gateway: http://gateway-external-openfaas.apps.test-cluster.priv
pi-ppc64le.yml
version: 1.0
provider:
  name: openfaas
  gateway: http://gateway-external-openfaas.apps.test-cluster.priv
functions:
  pi-ppc64le:
    lang: dockerfile
    handler: ./pi-ppc64le
    image: karve/pi-ppc64le:latest
    environment:
      read_timeout: "600s"
      write_timeout: "600s"
      exec_timeout: "600s"
    limits:
      cpu: "500m"
      memory: "500Mi"
    requests:
      cpu: "100m"
      memory: "60Mi"
CPU requests and limits are expressed as fractions of a core or in the form 100m. The latter can be read as "one hundred millicpu" or "one hundred millicores"; 500m means half of one core. A request with a decimal point, like 0.1, is converted to 100m, and precision finer than 1m is not allowed. Thus 100m corresponds to 10% of a CPU core, 1000m (or 1) to one full core, and 2000m (or 2) to two cores. Memory requests and limits are measured in bytes and are expressed as a plain integer or as a fixed-point number using one of these suffixes: E, P, T, G, M, K. You can also use the power-of-two equivalents: Ei, Pi, Ti, Gi, Mi, Ki. For example, the following represent roughly the same value: 128974848, 129e6, 129M, 123Mi.
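Once the function has been deployed to the cluster (as shown later in this section), the requests and limits can also be adjusted on the generated Deployment without editing the stack file. The following is a sketch using the standard oc set resources command; note that redeploying the function (or the OpenFaaS operator, when used) may reconcile these values back to what the function definition specifies.
# Adjust CPU/memory requests and limits on the running function Deployment
oc set resources deployment pi-ppc64le -n openfaas-fn \
  --requests=cpu=100m,memory=60Mi \
  --limits=cpu=500m,memory=500Mi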
2. Using the faas-cli build
Instead of the docker/podman build command, we can build the image using the "faas-cli build" as follows:
faas-cli build -f ./pi-ppc64le.yml && docker run --rm -d -p 8081:8080 --name test-this karve/pi-ppc64le
Test the function on the cluster
Build and push the image. Then deploy it and test using faas-cli and curl commands.
faas-cli build -f ./pi-ppc64le.yml
docker push karve/pi-ppc64le
faas-cli deploy -f ./pi-ppc64le.yml
faas-cli list --gateway http://gateway-external-openfaas.apps.test-cluster.priv
printf "10\n20\n30\n" | faas-cli invoke pi-ppc64le --gateway http://gateway-external-openfaas.apps.test-cluster.priv
printf "10\n20\n30\n" | curl -X POST --data-binary @- http://gateway-external-openfaas.apps.test-cluster.priv/function/pi-ppc64le -vvv -H "Content-Type:text/plain"
Delete the instance of pi-ppc64le
faas-cli delete pi-ppc64le --gateway http://gateway-external-openfaas.apps.test-cluster.priv
If you are going to provide larger accuracy values and/or a longer list, you will need to increase the timeouts in the environment within the template.yml and thus the generated yml file.
    environment:
      read_timeout: "600s"
      write_timeout: "600s"
      exec_timeout: "600s"
Any functions with a larger timeout than the gateway's timeout will end prematurely. Therefore, you will also need to increase the timeout by annotating the route gateway-external and any proxy as mentioned in the Issues section in Part 1.
oc annotate route gateway-external --overwrite haproxy.router.openshift.io/timeout=600s -n openfaas
Additionally, make sure that the timeouts for the gateway and the faas-netes are set correctly if not done during installation.
oc edit deployment gateway -n openfaas
For the gateway container
spec:
  containers:
  - env:
    - name: read_timeout
      value: 600s
    - name: write_timeout
      value: 600s
    - name: upstream_timeout
      value: 600s
    - name: exec_timeout
      value: 600s
For the faas-netes container
    - name: read_timeout
      value: 600s
    - name: write_timeout
      value: 600s
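Instead of editing the deployment interactively, the same environment variables can be set with oc set env; this sketch assumes the container names gateway and faas-netes used by the default OpenFaaS gateway deployment:
# Set the timeouts on the gateway container
oc set env -n openfaas deploy/gateway -c gateway read_timeout=600s write_timeout=600s upstream_timeout=600s exec_timeout=600s
# Set the timeouts on the faas-netes container
oc set env -n openfaas deploy/gateway -c faas-netes read_timeout=600s write_timeout=600s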
Deployment of a function using a Function Custom Resource (CR)
We can create the function using pi-ppc64le-function.yaml containing the Function Custom Resource if --operator was set during the installation of OpenFaaS.
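If you are not sure whether the operator mode is enabled, a quick check is to confirm that the Function CRD is registered on the cluster (assuming the default CRD name installed by the OpenFaaS operator):
# The Function CRD is only present when OpenFaaS was installed with --operator
oc get crd functions.openfaas.com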
pi-ppc64le-function.yaml
apiVersion: openfaas.com/v1
kind: Function
metadata:
  name: pi-ppc64le
  namespace: openfaas-fn
spec:
  name: pi-ppc64le
  image: karve/pi-ppc64le:latest
  labels:
    com.openfaas.scale.min: "2"
    com.openfaas.scale.max: "15"
  environment:
    write_debug: "true"
    read_timeout: "600s"
    write_timeout: "600s"
    exec_timeout: "600s"
Apply the function CR
oc apply -f pi-ppc64le-function.yaml
oc get function -n openfaas-fn
Output
pi-ppc64le 0s
Deleting the function
oc delete function pi-ppc64le -n openfaas-fn
Horizontal Pod Autoscaling
OpenFaaS and HPAv2 play nicely together. To use HPAv2, we need to comment out the following scaling labels from pi-ppc64le-function.yaml and set com.openfaas.scale.factor to 0. With the stack file, this is done by deploying again with --label com.openfaas.scale.factor=0:
labels:
  com.openfaas.scale.min: "2"
  com.openfaas.scale.max: "15"
faas-cli deploy -f ./pi-ppc64le.yml --label com.openfaas.scale.factor=0
Alternatively, we can set the label in pi-ppc64le-function.yaml and apply the yaml.
labels:
  com.openfaas.scale.factor: "0"
pi-ppc64le-function.yaml
# Custom Resource pi-ppc64le-function.yaml
apiVersion: openfaas.com/v1
kind: Function
metadata:
  name: pi-ppc64le
  namespace: openfaas-fn
spec:
  name: pi-ppc64le
  image: karve/pi-ppc64le:latest
  labels:
    com.openfaas.scale.factor: "0"
    # com.openfaas.scale.min: "2"
    # com.openfaas.scale.max: "15"
  environment:
    write_debug: "true"
    read_timeout: "600s"
    write_timeout: "600s"
    exec_timeout: "600s"
    #max_inflight: "10"
  limits:
    cpu: "500m"
    memory: "500Mi"
  requests:
    cpu: "100m"
    memory: "60Mi"
Apply the function CR
oc apply -f pi-ppc64le-function.yaml
We also disable the built-in autoscaling by scaling alertmanager down to zero replicas, which stops it from firing alerts; we do not want to scale using Prometheus alerts.
oc scale -n openfaas deploy/alertmanager --replicas=0
oc get deployments -n openfaas alertmanager
Output
NAME READY UP-TO-DATE AVAILABLE AGE
alertmanager 0/0 0 0 16d
Create an HPAv2 rule for CPU
The Horizontal Pod Autoscaler is supported in a standard way by the kubectl/oc autoscale command. The parameter -n openfaas-fn is the namespace where the function is deployed, pi-ppc64le is the name of the function, --cpu-percent is the average CPU utilization the pods should reach before additional replicas are added, --min is the minimum number of pods, and --max is the maximum number of pods. The HPA calculates pod CPU utilization as the total CPU usage of all containers in the pod divided by the total requested; with our request of cpu: 100m and a target of 30 percent, scaling is triggered when average usage exceeds 30m per pod.
oc autoscale deployment -n openfaas-fn \
pi-ppc64le \
--cpu-percent=30 \
--min=2 \
--max=20
oc get hpa/pi-ppc64le -n openfaas-fn # View the HPA record
Output
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
pi-ppc64le Deployment/pi-ppc64le <unknown>/30% 2 20 0 0s
Generate some load and check the time for the responses
for i in {1..10}; do printf "6000\n" | time curl -X POST --data-binary @- http://gateway-external-openfaas.apps.test-cluster.priv/function/pi-ppc64le -H "Content-Type:text/plain" -s -o /dev/null & done
Create a “test” file for generating load to this function.
printf "2000\n100\n\n" > /tmp/test
Generate the load using hey and watch the CPU load using the commands shown below, along with the Horizontal Pod Autoscaler and the CPU/memory usage of the pod replicas. The -c 10 option simulates 10 concurrent users, -z=10m runs the test for 10 minutes, and -t 600 sets the timeout for each request in seconds. Note that the -D parameter must be provided before the URL as shown.
hey -z=10m -c 10 -t 600 -m POST -D /tmp/test http://gateway-external-openfaas.apps.test-cluster.priv/function/pi-ppc64le -H "Content-Type: text/plain"
watch "faas-cli describe pi-ppc64le --gateway $OPENFAAS_URL;oc get pods -n openfaas-fn" # Shows the number of replicas and invocations
watch "kubectl top pod -n openfaas-fn" # Usage of pods
#watch "oc adm top pod -n openfaas-fn" # Usage of pods
watch "oc describe hpa/pi-ppc64le -n openfaas-fn" # Get detailed information including any events such as scaling up and down
HPA reacts slowly to changes in traffic, both for scaling up and for scaling down. In some instances, you may wait more than 5 minutes for all your pods to scale back down to default level after the load has stopped.
Sample output from “oc describe hpa/pi-ppc64le -n openfaas-fn”
Name: pi-ppc64le
Namespace: openfaas-fn
Labels: <none>
Annotations: <none>
CreationTimestamp: Mon, 21 Jun 2021 12:31:22 -0400
Reference: Deployment/pi-ppc64le
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): 29% (29m) / 30%
Min replicas: 2
Max replicas: 20
Deployment pods: 6 current / 6 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True ReadyForNewScale recommended size matches current size
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulRescale 3m53s horizontal-pod-autoscaler New size: 4; reason: cpu resource utilization (percentage of request) above target
Normal SuccessfulRescale 3m37s horizontal-pod-autoscaler New size: 5; reason: cpu resource utilization (percentage of request) above target
Normal SuccessfulRescale 3m22s horizontal-pod-autoscaler New size: 6; reason: cpu resource utilization (percentage of request) above target
When deployed, each function creates one or more pods/containers depending on the minimum and maximum scaling parameters requested by the user. We can see the events that cause the replicas to increase in the output above. Functions can also scale to zero and back again through the faas-idler or the REST API; this article does not use the faas-idler.
If the pods remain in ContainerCreating state during scaling up, check the events to find the problem.
oc get events -n openfaas-fn
By default, deployed functions use an imagePullPolicy of Always, which ensures that functions using static image tags (for example, "latest") are refreshed during an update. If you exceed the pull rate limits for Docker Hub, you can tag the image as follows and push it to the OpenShift image registry on the local cluster.
docker tag karve/pi-ppc64le default-route-openshift-image-registry.apps.test-cluster.priv/openfaas-fn/pi-ppc64le
oc whoami -t > oc_token
docker login --tls-verify=false -u kubeadmin default-route-openshift-image-registry.apps.test-cluster.priv -p `cat oc_token`
docker push default-route-openshift-image-registry.apps.test-cluster.priv/openfaas-fn/pi-ppc64le
Then update the function with the following image:
image: image-registry.openshift-image-registry.svc:5000/openfaas-fn/pi-ppc64le:latest
This could be done by making the changes to the above function yaml and doing an apply again or directly with:
oc edit deployment pi-ppc64le -n openfaas-fn
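If you prefer a one-line change over editing the deployment, something like the following should also work; it assumes the container in the function Deployment is named after the function, which is how faas-netes creates it:
# Point the function Deployment at the image in the internal registry
oc set image deployment/pi-ppc64le -n openfaas-fn pi-ppc64le=image-registry.openshift-image-registry.svc:5000/openfaas-fn/pi-ppc64le:latest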
You might need to manually delete the pods stuck in ContainerCreating state.
Alternatively, the behavior for imagePullPolicy is configurable in faas-netes via the image_pull_policy environment variable.
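For example, to pull images only when they are not already present on the node, a sketch assuming the faas-netes container in the gateway deployment used earlier:
oc set env -n openfaas deploy/gateway -c faas-netes image_pull_policy=IfNotPresent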
Even though the AlertManager is disabled, you can still look at the graph in Prometheus. Forward the Prometheus port 9090 and browse to http://localhost:9090
kubectl port-forward -n openfaas svc/prometheus 9090:9090
In Prometheus, graph the following:
rate(gateway_function_invocation_total{code="200"} [20s])
OpenFaaS Dashboard in Grafana
The Prometheus metrics can be used to build a dashboard with the Grafana project. Dashboards are useful for monitoring your system performance and as a diagnostic tool. We can build the Grafana image with the OpenFaaS Dashboard using the Dockerfile from https://github.com/stefanprodan/faas-grafana.
If you try to use an x86 base image to run on ppc64le, you will get the error "standard_init_linux.go:219: exec user process caused: exec format error". So, you only need to change the FROM line to “FROM ibmcom/grafana-ppc64le:5.2.0-f4” and build the image as shown below for ppc64le.
PROXY_URL="//10.3.0.3:3128";export http_proxy="http:$PROXY_URL";export https_proxy="http:$PROXY_URL";export no_proxy=localhost,127.0.0.1,.test-cluster.priv,10.3.158.61
git clone https://github.com/stefanprodan/faas-grafana.git
cd faas-grafana/Grafana
# vi Dockerfile # Change FROM ibmcom/grafana-ppc64le:5.2.0-f4
docker build -t default-route-openshift-image-registry.apps.test-cluster.priv/openfaas/faas-grafana:5.2.0-f4 .
# unset http_proxy;unset https_proxy
docker push default-route-openshift-image-registry.apps.test-cluster.priv/openfaas/faas-grafana:5.2.0-f4 --tls-verify=false
# Deploy a pod for Grafana
oc -n openfaas run --image=image-registry.openshift-image-registry.svc:5000/openfaas/faas-grafana:5.2.0-f4 --port=3000 grafana
oc -n openfaas expose pod grafana --type=NodePort --name=grafana
oc -n openfaas expose svc grafana
oc -n openfaas get routes grafana -o jsonpath='{.spec.host}'
Browse to the Grafana interactive visualization web application at the route printed above, http://grafana-openfaas.apps.test-cluster.priv
The default credentials are admin:admin; you will be prompted to change the password on first login. We can view the OpenFaaS dashboard, which shows the same data that we can get from Prometheus, but in a more user-friendly way.
If the Horizontal Pod Autoscaler (HPA) fails to get the CPU consumption and reports ‘unknown for current cpu usage : "the HPA was unable to compute the replica count"’ in OCP 4 with apiVersion autoscaling/v1, you can use autoscaling/v2beta2. This failure happens if you have not specified the requests and limits in the yaml file used to create the function with either the OpenFaaS stack file or the Function Custom Resource. Let’s delete the previous HPA and create a new one using hpa-pi-ppc64le.yaml.
# delete the previous hpa for cpu resource
oc delete hpa/pi-ppc64le -n openfaas-fn
Create an HPAv2 rule for CPU and Memory
The hpa-pi-ppc64le.yaml below shows the rule with CPU and memory resources. With type: Utilization, an averageUtilization of 30 means 30 percent of the requested CPU; with type: AverageValue, an averageValue of 300m means 300 millicores, that is, 30 percent of one core. Apply the new rule and generate load using hey. You will see the autoscaler scale up the number of function pods.
oc apply -f hpa-pi-ppc64le.yaml
watch "oc describe hpa.v2beta2.autoscaling/hpa-pi-ppc64le -n openfaas-fn"
hey -z=10m -c 20 -t 600 -m POST -D /tmp/test http://gateway-external-openfaas.apps.test-cluster.priv/function/pi-ppc64le -H "Content-Type: text/plain"
hpa-pi-ppc64le.yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-pi-ppc64le
  namespace: openfaas-fn
spec:
  maxReplicas: 20
  minReplicas: 2
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: pi-ppc64le
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        #type: Utilization
        #averageUtilization: 30
        type: AverageValue
        averageValue: 300m
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 100Mi
The output from hey shows the response times for the requests to the pi-ppc64le function. At the start, only two pods were present for this function, so the initial surge of requests was serviced by those pods. Later, new pods get created because of scaling up. Since simultaneous requests are submitted to the same two initial pods with a limit of cpu: "500m", they take a longer time to respond. Only new requests go to the scaled-up pods. Therefore, the scaled-up pods may remain idle until the previous requests on the initial pods are completed and the gateway submits new requests from hey to the new pods.
Summary:
Total: 677.4370 secs
Slowest: 216.6568 secs
Fastest: 21.7998 secs
Average: 68.5332 secs
Requests/sec: 0.1388
Total data: 190068 bytes
Size/request: 2022 bytes
Response time histogram:
21.800 [1] |■
41.286 [39] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
60.771 [19] |■■■■■■■■■■■■■■■■■■■
80.257 [12] |■■■■■■■■■■■■
99.743 [9] |■■■■■■■■■
119.228 [0] |
138.714 [0] |
158.200 [0] |
177.685 [4] |■■■■
197.171 [3] |■■■
216.657 [7] |■■■■■■■
Latency distribution:
10% in 23.7644 secs
25% in 28.1016 secs
50% in 50.6900 secs
75% in 80.4898 secs
90% in 191.6926 secs
95% in 211.1526 secs
0% in 0.0000 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0006 secs, 21.7998 secs, 216.6568 secs
DNS-lookup: 0.0002 secs, 0.0000 secs, 0.0022 secs
req write: 0.0000 secs, 0.0000 secs, 0.0002 secs
resp wait: 68.5316 secs, 21.7993 secs, 216.6563 secs
resp read: 0.0007 secs, 0.0001 secs, 0.0072 secs
Status code distribution:
[200] 94 responses
When a Pod replica needs to be added, the OpenShift scheduler selects a node for the Pod to run on. Each node has a maximum capacity for each of the resource types: the amount of CPU and memory it can provide for Pods. The scheduler ensures that, for each resource type, the sum of the resource requests of the scheduled Containers is less than the capacity of the node. Even though the actual memory or CPU resource usage on nodes may be low, the scheduler still refuses to place a Pod on a node if the capacity check fails. This protects against a resource shortage on a node when resource usage later increases.
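To see how much capacity a node has and how much of it has already been requested by scheduled pods, commands like the following can help (replace <node-name> with one of your worker nodes):
oc adm top nodes                                                  # Current CPU/memory usage per node
oc describe node <node-name> | grep -A 10 "Allocated resources"   # Requests/limits already scheduled on the node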
The HPA describe command may switch between the two forms, with and without “m”, for the Metrics “resource memory on pods” and “resource cpu on pods”. Each replica of pi-ppc64le can consume one full core for each function request that it services, and each replica can service multiple function requests simultaneously. If your nodes have enough cores, you may specify a CPU limit larger than one core (1000m) so that pods can take full advantage of multiple cores and run multiple function requests simultaneously.
When an automatic scale down occurs, your in-flight requests may get killed if they are being serviced by the pods that are terminated. The client needs to check the return code and detect disconnected requests; in both cases, the client needs to resubmit the requests. If you want to disable scale down to prevent in-flight requests from being killed, you can change the behavior by increasing the stabilization window or disabling the scaleDown policy as follows:
spec:
  #behavior:
  #  scaleDown:
  #    stabilizationWindowSeconds: 1800 # Half an hour
  behavior:
    scaleDown:
      selectPolicy: Disabled
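Rather than editing the HPA yaml, the same change can be patched onto the existing HPA; this sketch assumes the hpa-pi-ppc64le created above with the autoscaling/v2beta2 API, which is required for the behavior field:
oc patch hpa hpa-pi-ppc64le -n openfaas-fn --type=merge -p '{"spec":{"behavior":{"scaleDown":{"selectPolicy":"Disabled"}}}}'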
The downscale stabilization window can be set on a per-HPA basis through the behavior.scaleDown.stabilizationWindowSeconds field in the v2beta2 API. Alternatively, the global cooldown delay exposed as the --horizontal-pod-autoscaler-downscale-stabilization flag of the kube-controller-manager component can be set to avoid thrashing. The value for this option is a duration that specifies how long the autoscaler has to wait before another downscale operation can be performed after the current one has completed. The default value is 5 minutes (5m0s).
The same situation happens when a node is low on memory: the OpenShift eviction policy stops pods and marks them as failed. The function pods are then scheduled on a different node, since they are managed by a ReplicaSet bounded by the minReplicas and maxReplicas specified above.
If multiple requests are being served by each pod, the memory required by the pod increases. If the memory limit set in the deployment is small, this can cause the pod to die: the system terminates the container because it tried to use more memory than its limit. You can set the max_inflight environment variable to the maximum number of simultaneous requests that should be served. In this case, when additional requests are submitted to the pod, it returns “Concurrent request limit exceeded” or “curl: (55) Send failure: Broken pipe”. The client submitting the request needs to handle these responses and resubmit the request after a timeout.
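A quick way to confirm whether a function container was terminated for exceeding its memory limit is to look at the last state of its pods; OOMKilled appears as the termination reason in the describe output:
oc get pods -n openfaas-fn
oc describe pod <pod-name> -n openfaas-fn | grep -A 3 "Last State"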
The function execution timeouts that we set in the environment prevent the function from executing too long. If there were too few pods to service the initial surge of requests, you will see a message like the following when the function execution is killed; the pod is then restarted, and the client needs to resubmit the request.
2021/07/10 10:10:10 Function was killed by ExecTimeout: 10m0s
2021/07/10 10:10:10 Took 600.007541 secs
2021/07/10 10:10:10 signal: killed
We will create the function again, this time with max_inflight: "2", which prevents each function pod from servicing more than two requests simultaneously.
# Custom Resource pi-ppc64le-function.yaml
apiVersion: openfaas.com/v1
kind: Function
metadata:
  name: pi-ppc64le
  namespace: openfaas-fn
spec:
  name: pi-ppc64le
  image: karve/pi-ppc64le:latest
  labels:
    com.openfaas.scale.factor: "0"
  environment:
    write_debug: "true"
    read_timeout: "600s"
    write_timeout: "600s"
    exec_timeout: "600s"
    max_inflight: "2"
  limits:
    cpu: "500m"
    memory: "500Mi"
  requests:
    cpu: "100m"
    memory: "60Mi"
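After applying the updated CR, a quick way to observe the limit is to fire more simultaneous requests than the running replicas will accept; depending on how many replicas are up, some of the requests should return the “Concurrent request limit exceeded.” message. This uses the same gateway route as before:
oc apply -f pi-ppc64le-function.yaml
for i in $(seq 1 8); do printf "3000\n" | curl -X POST --data-binary @- http://gateway-external-openfaas.apps.test-cluster.priv/function/pi-ppc64le -H "Content-Type:text/plain" & done; wait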
A sample python client that handles the above errors is shown in pi_invoke_function.py; it simultaneously submits up to 64 requests. It generates numbers in range(1800,2200,3) and submits them to the pi-ppc64le function through the gateway route (the cmd variable builds the same curl invocation shown earlier). If it finds “Concurrent request limit exceeded” or “Killed”, or curl’s returncode!=0, the request is resubmitted after a timeout. It retries up to a retrycount of 10 times, with a different sleep time depending on the error, before giving up. The stdout and stderr outputs from each request are saved to /tmp.
pi_invoke_function.py
from joblib import Parallel, delayed
import time
import os
import subprocess
import sys

def process(index, total, value):
    value = str(value)
    print("Processing", index, "/", total, value, flush=True)
    # Invoke the function through the gateway route, sending the accuracy value on stdin
    # (same curl invocation as shown earlier; adjust the gateway URL for your cluster)
    cmd = ('printf "' + value + '\\n" | curl -s -X POST --data-binary @- '
           'http://gateway-external-openfaas.apps.test-cluster.priv/function/pi-ppc64le '
           '-H "Content-Type:text/plain"')
    retrycount = 0
    while True:
        # create two files to hold the output and errors, respectively
        with open("/tmp/" + value + '.out', 'w+') as fout:
            with open("/tmp/" + value + '.err', 'w+') as ferr:
                out = subprocess.Popen(cmd, shell=True, stdout=fout, stderr=ferr)
                out.wait(1200)  # Wait up to 20 minutes for each request
                fout.seek(0)
                output = fout.read()
                ferr.seek(0)
                errors = ferr.read()
        if out.returncode != 0:
            if retrycount >= 10:
                print("Giving Up returncode", out.returncode, "retrycount", retrycount, value)
                return "*" + value
            retrycount = retrycount + 1
            print("Sleeping 120s because returncode", out.returncode, "retrycount", retrycount, value)
            print("output", output)
            print("errors", errors)
            time.sleep(120.1)
            continue
        if output.find("Concurrent request limit exceeded.") >= 0:
            if retrycount >= 10:
                print("Giving Up Concurrent request limit exceeded. retrycount", retrycount, value)
                return "*" + value
            retrycount = retrycount + 1
            print("Sleeping 150s because Concurrent request limit exceeded. retrycount", retrycount, value)
            time.sleep(150.1)
            continue
        if output.find("Killed") >= 0 or errors.find("Killed") >= 0:
            if retrycount >= 10:
                print("Giving Up Killed. retrycount", retrycount, value)
                return "*" + value
            retrycount = retrycount + 1
            print("Sleeping 60s because Killed. retrycount", retrycount, value)
            time.sleep(60.1)
            continue
        print("Processed", index, "/", total, value, flush=True)
        return value

values = [i for i in range(1800, 2200, 3)]
total = len(values)
results = Parallel(n_jobs=64, prefer="threads")(delayed(process)(index, total, value) for index, value in enumerate(values))
print(results, flush=True)
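To run the client, joblib is the only dependency outside the standard library; adjust the pip/python commands for your environment:
pip3 install joblib            # the only external dependency
python3 pi_invoke_function.py
ls /tmp/*.out /tmp/*.err       # per-request stdout/stderr saved by the client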
You can delete the hpa when you are done as follows:
oc delete hpa hpa-pi-ppc64le -n openfaas-fn
Additional examples are available for running functions on ppc64le in Node.js, in Python, and as Linux command-line utilities. Most sample functions use Dockerfiles with base images available for ppc64le. You should now be able to create your own templates and build and run functions on OpenShift on ppc64le. It is easy to start developing serverless applications.
Conclusion
We looked at building images for functions that run on OpenShift 4.x for ppc64le. We looked at deploying functions in OpenFaaS using a Function Custom Resource. We also looked at using the CPU-based and memory-based OpenShift HPA for autoscaling with long-running functions. Knowing how to monitor resource usage in your functions and how to set timeouts is of vital importance. This will allow you to discover different issues that can affect the health of the applications running in the cluster and handle problems due to CPU starvation and memory overcommit. A pod without limits is free to use all the resources in the node, so you have to configure the limits properly. Monitoring the resources and how they relate to the limits and requests will help you set reasonable values, avoid OOM kills, and allow fair sharing of resources. In Part 3, we will cover asynchronous function execution and function chaining with OpenFaaS on Red Hat OpenShift for ppc64le.
Hope you have enjoyed the article. Share your thoughts in the comments or engage in the conversation with me on Twitter @aakarve. I look forward to hearing about how you use OpenFaaS with Autoscaling, what kind of problems timeouts have caused and if you would like to see something covered in more detail.
References
- Self-paced workshop for OpenFaaS https://github.com/openfaas/workshop/blob/master/README.md
- Manage functions with kubectl https://www.openfaas.com/blog/manage-functions-with-kubectl/
- Metrics HPAv2 with OpenFaaS https://docs.openfaas.com/tutorials/kubernetes-hpa/
- Custom Alert for 429 https://github.com/openfaas/nats-queue-worker/issues/105#issuecomment-787494533
- Serverless Computing on Constrained Edge Devices https://helda.helsinki.fi/bitstream/handle/10138/314280/Tilles_Jan_Pro_gradu_2020.pdf?sequence=3&isAllowed=y
- Create ASCII Text Banners https://www.tecmint.com/create-ascii-text-banners-in-linux-terminal/