Original Message:
Sent: Tue October 01, 2024 04:12 PM
From: Chris Sloan
Subject: Custom resource template for APIC management cluster on CP4I
Hi Abu,
In the DataPower operator, we added the ability to set separate requests and limits in DP Operator 1.6.14 and DP Operator 1.11.0, both of which have already been released. However, a change was also needed in the APIC Operator to support this. That APIC support was added in the operator that ships with 10.0.5.8 and will be included in the operator that ships with 10.0.8.1.
I think the only missing piece for you here is the availability of APIC 10.0.8.1. Once that is released, you will be able to leverage this functionality and set your request and limit independently (as long as limit >= request).
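A minimal sketch of the Kubernetes rule behind "set your request and limit independently": the two may differ, but the limit may not be lower than the request. It assumes plain Kubernetes container semantics only (no APIC or DataPower operator logic) and also shows the standard Kubernetes default that a container given only a limit ends up with a request equal to that limit. The function names are hypothetical.

```python
from decimal import Decimal
from typing import Optional, Tuple


def parse_cpu(quantity: str) -> Decimal:
    """Convert a Kubernetes CPU quantity such as "2", "0.5" or "500m" to cores."""
    q = quantity.strip()
    if q.endswith("m"):  # millicores: 1000m == 1 core
        return Decimal(q[:-1]) / 1000
    return Decimal(q)


def effective_cpu(request: Optional[str],
                  limit: Optional[str]) -> Tuple[Optional[Decimal], Optional[Decimal]]:
    """Return the (request, limit) pair Kubernetes would end up with for one container."""
    req = parse_cpu(request) if request is not None else None
    lim = parse_cpu(limit) if limit is not None else None
    if req is None and lim is not None:
        # Kubernetes copies a lone limit into the request (absent an admission-time default).
        req = lim
    if req is not None and lim is not None and lim < req:
        raise ValueError(f"CPU limit {limit} is lower than request {request}")
    return req, lim


print(effective_cpu("1", "2"))   # independent request and limit
print(effective_cpu(None, "2"))  # only a limit: the request becomes 2 as well
```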
One thing I would call out, and I think you already know and understand this, is that overcommitting CPUs can definitely impact performance. This is maybe less of a concern in a dev or QA environment, but might not be the preferred approach for your production gateways. Setting the request to 1 means the pod could be scheduled on a node that only has 1 CPU "free", even though the limit is 2. There could be situations where we can never actually reach the limit because the node we are on is CPU-starved, even though we are technically allowed to use up to 2.
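The overcommit scenario above can be sketched with a deliberately simplified model: the scheduler places a pod by comparing only its request with the node's unrequested CPU, and at run time the pod can burst above its request only into CPU that is actually idle, never above its limit. Real CPU sharing under contention (CFS shares and quotas) is more nuanced; the function names here are hypothetical.

```python
def can_schedule(request_cpu: float, node_unrequested_cpu: float) -> bool:
    """The scheduler compares only the request with the node's still-unrequested CPU."""
    return request_cpu <= node_unrequested_cpu


def reachable_cpu(request_cpu: float, limit_cpu: float, idle_cpu_on_node: float) -> float:
    """Simplified model: burst above the request only into idle node CPU, capped by the limit."""
    return min(limit_cpu, request_cpu + idle_cpu_on_node)


# Node with exactly 1 CPU unrequested; gateway pod with request 1, limit 2.
print(can_schedule(1.0, 1.0))        # True: the pod is placed on this node
print(reachable_cpu(1.0, 2.0, 0.0))  # 1.0: CPU-starved node, the limit is never reached
print(reachable_cpu(1.0, 2.0, 3.0))  # 2.0: enough idle CPU to burst up to the limit
```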
------------------------------
Chris Sloan
DataPower Development
Original Message:
Sent: Tue October 01, 2024 08:24 AM
From: Abu Davis
Subject: Custom resource template for APIC management cluster on CP4I
@Chris Dudley That would be great if you could check on that. AFAIK, the fix coming later this year will not fix the issue with the CPU Request being forced to the same value as the CPU Limit; it will only restore the ability to set a CPU Limit greater than or equal to 1 vCPU. So if I set the limit to 2, the CPU Request also becomes 2. Interestingly, if you don't set a CPU Limit, it is set to the value of the CPU Request. The PMR has confirmed that this is "working as designed", which seems very odd to me.
Consider this timeline:
- When we were on 10.0.5.7, the profile was n1xc7.m48 and Gateway replicas were 3; we used the template override to set the CPU Limit to 2, which auto-set the CPU Request to 2 vCPU.
- Currently we are on 10.0.8.0 (upgraded from 10.0.5.7), the profile is n1xc7.m48, Gateway replicas are 3, and the CPU Request and Limit are 1 vCPU, as the template override functionality is gone.
- In Q4 2024, we will be on 10.0.8.1, the profile will be n1xc7.m48 and Gateway replicas will be 3; we would then use the template override to set the CPU Limit to 2, which auto-sets the CPU Request to 2 vCPU.
- What we want! :D To be on 10.0.8.x, with profile n1xc7.m48 and 3 Gateway replicas, and use the template override to set the CPU Limit to 2 while the CPU Request stays at 1 vCPU regardless of the CPU Limit (k8s rules still apply, of course, such as the Limit not being lower than the Request :)). This would then be in line with what is described in the IBM Software Compatibility Report: supported Gateway minimum = 1 vCPU.
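The timeline above boils down to simple arithmetic: what the scheduler reserves is driven by the request, and the most the replicas can burst to is driven by the limit. A small Python sketch of the four scenarios, using only the numbers stated in the list (3 replicas, requests and limits in vCPU per gateway pod); the `Scenario` class is purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    label: str
    cpu_request: int  # vCPU per gateway pod
    cpu_limit: int    # vCPU per gateway pod
    replicas: int = 3

    @property
    def reserved_cpu(self) -> int:
        """vCPU the scheduler sets aside across all replicas (driven by the request)."""
        return self.cpu_request * self.replicas

    @property
    def burst_ceiling(self) -> int:
        """Most vCPU the replicas could use together (driven by the limit)."""
        return self.cpu_limit * self.replicas


scenarios = [
    Scenario("10.0.5.7 + template override", cpu_request=2, cpu_limit=2),
    Scenario("10.0.8.0, no override", cpu_request=1, cpu_limit=1),
    Scenario("10.0.8.1 + template override (request follows limit)", cpu_request=2, cpu_limit=2),
    Scenario("Desired: request 1, limit 2", cpu_request=1, cpu_limit=2),
]

for s in scenarios:
    print(f"{s.label}: reserved {s.reserved_cpu} vCPU, ceiling {s.burst_ceiling} vCPU")
```

Only the last scenario keeps the reservation at the 3 vCPU of the 1 vCPU minimum while still allowing the replicas to burst to 6 vCPU.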
------------------------------
Abu Davis
Original Message:
Sent: Tue October 01, 2024 08:04 AM
From: Chris Dudley
Subject: Custom resource template for APIC management cluster on CP4I
Hiya,
No, I am not sure why the gateway has the same values for requests as it does for limits. It's the only pod in APIC that does. I will see if I can find out the answer, though. There definitely is a 1 CPU gateway profile option, but I am not sure whether you can then run that as HA (n3) or not.
------------------------------
Chris Dudley
Original Message:
Sent: Tue October 01, 2024 03:41 AM
From: Abu Davis
Subject: Custom resource template for APIC management cluster on CP4I
Thank you for contributing further on this :)
But I just wanted to clarify that the resource template for adjusting the CPU request in the gateway cluster CR was possible in 10.0.5 and was removed in 10.0.8.0. Following the concerns we raised in a PMR, they have promised us they'll restore this functionality in DP operator v1.6.14, which is due in Q4 2024, so all hope isn't lost, I hope ;)
@Chris Dudley: Do you know why IBM forces the CPU Request to the same value when a CPU Limit (such as 4 vCPU) is set for the gateway CR on CP4I? This not only has license implications for us, but more importantly it reserves 4 vCPUs on the worker node, reducing its capacity. For customers who run on smaller workers, such as 8 vCPU nodes of which 7.5 vCPU is allocatable on OpenShift, 4 vCPU is consumed by the Gateway pod, reducing the overall available worker node capacity. Additionally, with replicas that's 4 x 3 replicas = 12 vCPUs, forcing customers to pay for bigger worker node(s). When we were on 10.0.5, we addressed this problem by setting the CPU Request to 2 vCPUs using the template override. AFAIK, the IBM recommended minimum for the gateway is 1 vCPU (Software Compatibility Report), so I guess that translates to a CPU Request of 1, which is OK by me. So please reconsider: set the CPU Request to (say) 1 vCPU and let the customer adjust the CPU Limit according to how they have sized the workers for the API workload. Thoughts? :)
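The capacity argument above can be checked with a few lines of Python, using the figures from the post (7.5 vCPU allocatable per 8 vCPU worker, 3 gateway replicas). It assumes, for illustration, one gateway replica per node and ignores all other pods on the node; `gateway_footprint` is a hypothetical helper.

```python
import math


def gateway_footprint(node_allocatable_cpu: float, gateway_request_cpu: float, replicas: int):
    """Return (total vCPU reserved by all replicas,
               vCPU left on a node hosting one replica,
               how many gateway pods would fit on an otherwise empty node)."""
    reserved_total = gateway_request_cpu * replicas
    left_on_node = node_allocatable_cpu - gateway_request_cpu
    pods_per_empty_node = math.floor(node_allocatable_cpu / gateway_request_cpu)
    return reserved_total, left_on_node, pods_per_empty_node


print(gateway_footprint(7.5, 4, 3))  # request forced up to the 4 vCPU limit
print(gateway_footprint(7.5, 1, 3))  # request of 1 vCPU, limit raised separately
```

With the request forced to 4, the replicas reserve 12 vCPU and leave only 3.5 vCPU on each hosting node; with a request of 1, they reserve 3 vCPU and leave 6.5.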
------------------------------
Abu Davis
Original Message:
Sent: Tue October 01, 2024 01:32 AM
From: BEC API Management team
Subject: Custom resource template for APIC management cluster on CP4I
Hi again, @Chris Dudley
Now, THIS is a statement I can get behind :)
Template overrides are very powerful, but there is zero verification or validation - you are bypassing all APIC code here and are telling kubernetes directly what you want to happen. If you tell it to do something stupid then it will very happily obey you and unleash chaos.
As spiderman says: with great power comes great responsibility. ;-)
There is some validation of this in the operator (some fields are sadly being ignored, but that is another, and looong, talk, he he). And IBM could remove this feature altogether from the operator's custom resource handling if you/they wanted.
However, I agree with your initial statement about it being "dangerous" with regard to stability, if you do not know what you are doing, he he.
That said, for large clusters where IBM API Connect coexists with hundreds of other applications, the "stability" of IBM API Connect will depend less on, for example, the resource requests of individual containers, and more on how the cluster is utilized, which other application containers are running on the same nodes, and how they behave.
I would just say that we are more than happy that IBM has allowed this template override feature. Without it, we would have a hard time running the platform on our sandbox and test clusters (and we already do, since it is not supported in ALL custom resources, hint hint :)), simply because the request values are too large compared to actual usage. In 10.0.8 it is much better, but for sandbox/test we still lower the request values for quite a few of the containers, simply to be able to fit on the cluster. The limit values are another discussion: depending on the use, we raise them for a few selected components (which actually increases stability on the low resource profiles).
We are hoping that your statement "we do not support ......" means "production", and does not mean that IBM will remove this ability from the rest of the components, as has already been done in the GatewayCluster CR (which makes our daily life quite hard on sandbox/test clusters, he he).
In any case, sorry for hijacking Abu's good thread here; no need to respond :)
------------------------------
BEC API Management team
Original Message:
Sent: Mon September 30, 2024 04:49 AM
From: Chris Dudley
Subject: Custom resource template for APIC management cluster on CP4I
Template overrides are a Kubernetes concept; it's not something we invented.
However, the power and flexibility that comes with them also comes with a fair degree of risk - you are obviously changing things from what we have tested, and so if it all goes wrong then expect to be told by Support to remove whatever overrides you added.
Template overrides can be used to specify custom images (as used with iFixes), set environment variables, change the replica count or change resources.
Do not assume that we will support everything you can do with them though.
e.g. changing the replica count should not be done unless it is mentioned in the documentation or you have been advised that it is acceptable. There are all sorts of considerations with horizontal scaling, and only certain parts of APIC support it (the gateway and analytics).
You edit the CR (CustomResource) for the subsystem you want to customise - e.g. GatewayCluster for the gateway and ManagementCluster for the api manager.
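A minimal Python sketch of what a single template override entry looks like when built programmatically. The `name` / `containers` / `image` / `env` / `resources` shape follows the YAML snippet Chris posts later in this thread; the `ui` component and container names and the memory values are illustrative assumptions, not values from any profile, and `template_entry` is a hypothetical helper. It prints JSON, which Kubernetes tooling accepts just as it accepts YAML.

```python
import json
from typing import Dict, Optional


def template_entry(component: str, container: str,
                   image: Optional[str] = None,
                   env: Optional[Dict[str, str]] = None,
                   requests: Optional[Dict[str, str]] = None,
                   limits: Optional[Dict[str, str]] = None) -> dict:
    """Build one spec.template entry: a named component with one container override."""
    override: dict = {"name": container}
    if image:
        override["image"] = image  # e.g. an iFix image
    if env:
        override["env"] = [{"name": k, "value": v} for k, v in env.items()]
    resources = {}
    if requests:
        resources["requests"] = requests
    if limits:
        resources["limits"] = limits
    if resources:
        override["resources"] = resources
    return {"name": component, "containers": [override]}


cr_fragment = {"spec": {"template": [
    template_entry("ui", "ui", requests={"memory": "1Gi"}, limits={"memory": "2Gi"}),
]}}
print(json.dumps(cr_fragment, indent=2))
```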
All overrides of custom images must be removed when applying a new fixpack or upgrading to a new release, for the simple reason that you don't want to be running the new fixpack level with the back-level iFix image; that definitely will not be supported and would be guaranteed to cause problems.
The same can also apply to resource-level overrides: we sometimes tweak the resources for different pods in fixpacks and releases, but if you're using overrides then your settings will be used, not ours, which can lead to unforeseen problems.
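The pre-upgrade cleanup described above can be sketched as a function that takes the current `spec.template` list and returns a copy without image overrides and, by default, without resource overrides, keeping anything else (such as environment variables). It assumes the entry shape used elsewhere in this thread; `strip_for_upgrade` is a hypothetical helper, not an APIC tool.

```python
from typing import List


def strip_for_upgrade(template: List[dict], drop_resources: bool = True) -> List[dict]:
    """Return a copy of a spec.template list with image overrides (and, by default,
    resource overrides) removed, ready for a fixpack or release upgrade."""
    drop = {"image", "resources"} if drop_resources else {"image"}
    cleaned = []
    for entry in template:
        containers = []
        for container in entry.get("containers", []):
            kept = {k: v for k, v in container.items() if k not in drop}
            if set(kept) - {"name"}:  # keep the container only if more than its name survives
                containers.append(kept)
        rest = {k: v for k, v in entry.items() if k != "containers"}
        if containers:
            cleaned.append({**rest, "containers": containers})
        elif set(rest) - {"name"}:  # the entry still carries other overrides
            cleaned.append(rest)
    return cleaned


current = [{"name": "ui", "containers": [{
    "name": "ui",
    "image": "local.image.registry/path/to/image",
    "resources": {"requests": {"cpu": "1"}},
}]}]
print(strip_for_upgrade(current))                        # nothing left to apply
print(strip_for_upgrade(current, drop_resources=False))  # only the resources override remains
```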
Template overrides are very powerful, but there is zero verification or validation - you are bypassing all APIC code here and are telling kubernetes directly what you want to happen. If you tell it to do something stupid then it will very happily obey you and unleash chaos.
As spiderman says: with great power comes great responsibility. ;-)
------------------------------
Chris Dudley
Original Message:
Sent: Mon September 30, 2024 04:33 AM
From: Abu Davis
Subject: Custom resource template for APIC management cluster on CP4I
@Chris Dudley Thank you for the answer.
I forgot to mention that we recently upgraded our Production APIC from 10.0.5.7 to 10.0.8.0 (CP4I 2022.2.1 > 16.1.0) and had to remove the template: override we were using to set the CPU of the Gateway pods to 2 from the default of 1 provided by the deployment profile. In another PMR, the ability to customise the gateway CPU is considered a feature that will be fixed in Q4 2024 in APIC when DP operator v1.6.14 is released. So, even if we remove the iFix (template override), does the template support customising CPU for BOTH gateway and apim? Additionally, if there is a future iFix we would like to apply, could that also be accommodated in the same template: section?
------------------------------
Abu Davis
Original Message:
Sent: Mon September 30, 2024 04:18 AM
From: Chris Dudley
Subject: Custom resource template for APIC management cluster on CP4I
Don't worry about it.
You can definitely add the additional template override as well as that iFix. It's just a question of getting the YAML correct, which can definitely be a challenge (why can't everything just use JSON? ;-) )
I think it should be something like this. Note that the earlier example from BEC used the wrong container: it used 'apim', which would be the main API Manager backend pod, not the UI.
Note that you will need to be careful when upgrading to 10.0.8.1 (once it's released). You will need to remove the template override with the custom LAiFix image location when upgrading to 10.0.8.1. It might be simplest to remove the resource overrides too, and then reinstate that part after the upgrade if needed.
spec:
  template:
    - name: ui
      containers:
        - name: ui
          image: local.image.registry/path/to/image
          resources:
            requests:
              cpu: @REQUEST_CPU@
              memory: @REQUEST_MEMORY@
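The point that the iFix image and the resource override can share one `template:` entry can be sketched in Python as a merge into the existing list. The helper name `merge_container_override` is hypothetical and the `2Gi` value is a placeholder standing in for `@REQUEST_MEMORY@`. If you apply the result as a JSON merge patch (for example with `oc patch ... --type merge`), note that a merge patch replaces lists wholesale, so the patch must carry the complete `template` list, including the existing iFix entry.

```python
import copy
import json
from typing import List


def merge_container_override(template: List[dict], entry_name: str,
                             container_name: str, **fields) -> List[dict]:
    """Return a copy of spec.template with `fields` merged into one container,
    creating the entry or container if it does not exist yet."""
    merged = copy.deepcopy(template)  # never mutate the caller's list
    entry = next((e for e in merged if e.get("name") == entry_name), None)
    if entry is None:
        entry = {"name": entry_name, "containers": []}
        merged.append(entry)
    containers = entry.setdefault("containers", [])
    container = next((c for c in containers if c.get("name") == container_name), None)
    if container is None:
        container = {"name": container_name}
        containers.append(container)
    container.update(fields)
    return merged


# Existing iFix override (image path as in the snippet) plus a new memory request.
existing = [{"name": "ui", "containers": [
    {"name": "ui", "image": "local.image.registry/path/to/image"}]}]
updated = merge_container_override(
    existing, "ui", "ui", resources={"requests": {"memory": "2Gi"}})  # placeholder value
print(json.dumps({"spec": {"template": updated}}, indent=2))
```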
------------------------------
Chris Dudley
Original Message:
Sent: Mon September 30, 2024 04:01 AM
From: Abu Davis
Subject: Custom resource template for APIC management cluster on CP4I
@Chris Dudley: I am truly sorry if they felt I insulted them. It never, even remotely, crossed my mind to insult anybody there or anywhere else. I have raised more than 200 PMRs in the past few years and I believe I have a good working relationship with them. I do admit that I was a bit pushy with this PMR, as we have had 3 PMRs open for about 2 months now for issues relating to abrupt logouts and slowness, and since the issue affects our Production APIC, we want it fixed as soon as possible. I shall ensure that does not happen in the future. I have sent my apology in the PMR ticket and have closed the PMR now, as the solution is being investigated here.
Now, coming to the solution provided: will that still work, given that we are already using the template: section for the iFix we applied? https://ibm.biz/BdKmzW
------------------------------
Abu Davis
Original Message:
Sent: Mon September 30, 2024 01:47 AM
From: Chris Dudley
Subject: Custom resource template for APIC management cluster on CP4I
Aha I see an example was posted.
This is intentionally not in the docs and won't be added.
We support increasing resources, not decreasing them.
There are multiple profiles to choose from and it's almost always better to switch profiles than play with overrides. The profiles have been tested and balanced, if you increase resources in one pod you likely move the bottleneck somewhere else and it continues being an issue.
We have seen customers get into trouble doing this, that's why it isn't documented. APIC is a complicated solution with 20+ pods, playing with the resources to individual parts should be done carefully.
Reminder: we will not support decreasing the resources from the allotted amount in that profile.
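A hypothetical pre-flight check, sketched in Python, that compares a proposed override with the values a profile would otherwise use and flags any decrease, which is the case described above as unsupported. The profile values in the example are placeholders, not the real n1xc7.m48 numbers, and the quantity parser handles only the common Kubernetes suffixes (m, k, M, G, Ki, Mi, Gi, Ti), not the full quantity grammar.

```python
from decimal import Decimal
from typing import Dict, List

# Multipliers for the common Kubernetes quantity suffixes (not the full grammar).
BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
DECIMAL = {"k": Decimal(10**3), "M": Decimal(10**6), "G": Decimal(10**9), "m": Decimal("0.001")}


def to_number(quantity: str) -> Decimal:
    """Convert "500m", "2", "4Gi" and similar quantities to a plain number."""
    for suffix, factor in BINARY.items():
        if quantity.endswith(suffix):
            return Decimal(quantity[:-2]) * factor
    for suffix, factor in DECIMAL.items():
        if quantity.endswith(suffix):
            return Decimal(quantity[:-1]) * factor
    return Decimal(quantity)


def find_decreases(profile: Dict[str, Dict[str, str]],
                   override: Dict[str, Dict[str, str]]) -> List[str]:
    """List every request/limit in the override that is below the profile value."""
    problems = []
    for kind in ("requests", "limits"):
        for resource, value in override.get(kind, {}).items():
            baseline = profile.get(kind, {}).get(resource)
            if baseline is not None and to_number(value) < to_number(baseline):
                problems.append(f"{kind}.{resource}: {value} is below the profile's {baseline}")
    return problems


profile_ui = {"requests": {"cpu": "1", "memory": "2Gi"}}            # placeholder values
print(find_decreases(profile_ui, {"requests": {"memory": "4Gi"}}))  # increase: no findings
print(find_decreases(profile_ui, {"requests": {"cpu": "500m"}}))    # decrease: flagged
```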
------------------------------
Chris Dudley
Original Message:
Sent: Mon September 30, 2024 01:41 AM
From: Chris Dudley
Subject: Custom resource template for APIC management cluster on CP4I
Hello,
I saw your PMR. If I were you, I'd be careful about insulting the support team when it's their help you need.
It's not just hugely unprofessional but also downright rude and is not acceptable here or anywhere else.
You asked about increasing the resources but said you were limited by licensing. The support engineer thought you meant CPU, not memory, and increasing the CPU through overrides would increase the licensing cost just as changing the profile would. They were trying to avoid you having to pay more, since you said that was a concern.
The CR isn't missing anything at all. It's perfectly possible to do template overrides for any pod within the CR; it's a standard Kubernetes/OpenShift technique and not APIC-specific.
I'll find an example for you and will add that here in due course, in the meantime, no matter how frustrated you are, let's be nice to the support team whose help you need please!
Cheers
Chris
------------------------------
Chris Dudley
Original Message:
Sent: Fri September 27, 2024 02:28 PM
From: Abu Davis
Subject: Custom resource template for APIC management cluster on CP4I
I tried to raise a PMR but I hit a brick wall; the engineer is totally clueless! So I am really hoping someone can answer this!
We are running API Connect cluster v10.0.8.0 on CP4I 16.1.0 on OpenShift 4.14 on Azure.
The documentation is missing an option to configure custom parameters in the APIC CR for the management server pod "apic-mgmt-ui" under the "management:" section. Please provide that so we can change the default memory allocated. Due to license restrictions, we are not allowed to change the deployment profile (which is currently n1xc7.m48).
------------------------------
Abu Davis
------------------------------