Problem Determination guidance for ACE in containers

By AMAR SHAH posted Thu March 16, 2023 08:39 AM

  

This article provides general guidelines for problem determination of IBM App Connect Enterprise (ACE) running in containers. It applies to ACE running in Cloud Pak for Integration (CP4I) or any supported Kubernetes environment.

We take some of the commonly reported symptoms, outline their potential causes, describe what you can look at to determine the cause, and then suggest possible actions you can take to mitigate the situation or to perform advanced diagnostics.

We will discuss the following symptoms:

  • Low message rate/throughput
  • High CPU usage by Integration Server (IS) pods
  • High memory usage by IS pods, or pod restarts due to OOMKilled errors
  • Slow startup of the IS pod/container

Symptom: Low message throughput

  • Potential reason: Inadequate CPU limits allocated to the pod
    How to determine the cause: Check OpenShift and pod metrics for CPU utilization.
    Possible action: If CPU utilization is near the upper limit, consider allocating more CPU cores to the pod.

  • Potential reason: Poorly designed message flows
    How to determine the cause: Collect message flow accounting and statistics (Acc & Stats) and look for potential hotspots in the flow and node statistics (snapshot statistics can be enabled as shown in the sketch after this table).
    Possible action: Refer to the documentation for code design tips: https://www.ibm.com/docs/en/app-connect/12.0?topic=performance-code-design

  • Potential reason: Large number of message flows deployed to a single pod
    How to determine the cause: Review the number of message flows deployed to the pod.
    Possible action: Deploy a small number of message flows per container, typically one application or a set of related applications.

  • Potential reason: Inadequate number of pod replicas
    How to determine the cause: Check whether container CPU usage is getting close to its limit.
    Possible action: Increase the number of replicas of the pod for horizontal scaling.

  • Potential reason: Insufficient JVM heap size tuning for Java-based workloads
    How to determine the cause: Observe the JVM resource statistics to see whether there is excessive garbage collection or heap usage approaching jvmMaxHeapSize.
    Possible action: Increase the JVM maximum heap size if garbage collection is being triggered too often (see the server.conf.yaml sketch after this table).

  • Potential reason: Inadequate number of additional instances on the BAR file or message flow
    How to determine the cause: Collect message flow accounting and statistics and check how often the maximum number of instances is reached.
    Possible action: Tune the additional instances in combination with the other parameters described above.
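Several of the checks in this table depend on message flow accounting and statistics, JVM resource statistics, and the JVM heap settings, all of which are controlled through the integration server's server.conf.yaml. Below is a minimal sketch of a server.conf.yaml override, assuming the standard ACE 12 property names; the values are illustrative only, and in CP4I the file is typically supplied to the integration server as a Configuration object of type serverconf.

# Illustrative server.conf.yaml override (example values, not recommendations)
ResourceManagers:
  JVM:
    jvmMinHeapSize: 268435456        # initial JVM heap in bytes (256 MB)
    jvmMaxHeapSize: 1073741824       # maximum JVM heap in bytes (1 GB)

Statistics:
  Resource:
    reportingOn: true                # publish JVM and other resource statistics
  Snapshot:
    publicationOn: 'active'          # enable message flow accounting and statistics
    nodeDataLevel: 'advanced'        # include per-node statistics for hotspot analysis
    outputFormat: 'json'
    threadDataLevel: 'none'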
   

 

--------------

Symptom: High CPU usage

  • Potential reason: Large number of message flows deployed to a single pod
    How to determine the cause: Check OpenShift and pod metrics for CPU utilization.
    Possible action: Allocate more CPU to the ACE containers. IntegrationServer CR values to tune (see the sketch after this table):
      spec.pod.containers.runtime.resources.limits.cpu
      spec.pod.containers.runtime.resources.requests.cpu

  • Potential reason: Poorly designed message flows
    How to determine the cause: Collect Acc & Stats (flow and node statistics) and check CPU metrics such as average CPU time and total CPU time.
    Possible actions:
      a) Ensure the message flows conform to coding best practices: https://www.ibm.com/docs/en/app-connect/12.0?topic=performance-code-design
      b) Identify hotspot flows and nodes from the Acc & Stats data and investigate them in more depth, checking coding practices, message sizes, and so on.
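To make the CR values above concrete, here is a hedged sketch of the relevant fragment of an IntegrationServer custom resource, assuming the appconnect.ibm.com/v1beta1 API version used by the App Connect Operator; the name and CPU figures are placeholders and should be sized from your own pod metrics.

# Illustrative IntegrationServer CR fragment (placeholder name and values)
apiVersion: appconnect.ibm.com/v1beta1
kind: IntegrationServer
metadata:
  name: my-integration-server        # hypothetical name
spec:
  pod:
    containers:
      runtime:
        resources:
          requests:
            cpu: 300m                # CPU reserved for scheduling
          limits:
            cpu: '2'                 # hard CPU ceiling for the runtime container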

----------

Symptom: High memory usage or OOMKilled pod restarts

  • Potential reason: Large number of message flows deployed to a single pod
    How to determine the cause: Review the number of BAR files and message flows deployed to the pod.
    Possible action: Allocate more memory to the ACE containers. IntegrationServer CR values to tune (see the sketch after this table):
      spec.pod.containers.runtime.resources.limits.memory
      spec.pod.containers.runtime.resources.requests.memory

  • Potential reason: Poorly designed message flows
    How to determine the cause: Review the overall size of the messages and the complexity of the message flows.
    Possible action: Ensure the message flows conform to coding best practices: https://www.ibm.com/docs/en/app-connect/12.0?topic=performance-code-design

  • Potential reason: Native memory leak
    How to determine the cause: Memory usage of the pod continues to grow over time for the same workload.
    Possible action: Try to isolate the problem to a specific message flow or application. If an on-premises environment is available, try to reproduce the problem there and collect the memory leak mustgather, or contact IBM Support.

  • Potential reason: Java memory leak
    How to determine the cause: Check for Java out-of-memory (OOM) errors and javacores.
    Possible action: Increase the Java maximum heap size by 25-50% and check whether the Java OOM errors stop. The heap size might need to be tuned over multiple iterations.
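The memory fields follow the same pattern as the CPU fields shown earlier; the sketch below covers only the memory portion of the IntegrationServer CR, again with placeholder values. Keep any jvmMaxHeapSize you set (see the server.conf.yaml sketch above) comfortably below the container memory limit, because the Java heap is only part of the container's total memory footprint.

# Illustrative memory settings in the IntegrationServer CR (placeholder values)
spec:
  pod:
    containers:
      runtime:
        resources:
          requests:
            memory: 512Mi            # memory reserved for scheduling
          limits:
            memory: 1Gi              # exceeding this limit triggers an OOMKilled restart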

------------

Symptom: Slow startup of the IS pod/container

  • Potential reason: Inadequate CPU limits allocated to the pod
    How to determine the cause: The pod restarts because of liveness/readiness probe failures.
    Possible action: Increase the CPU requests/limits for the container.

  • Potential reason: Large number of message flows or BAR files deployed to a single pod
    How to determine the cause: Check the number of BAR files deployed to the integration server and the number of message flows in each BAR file.
    Possible action: Consider splitting the flows across multiple integration servers.

  • Potential reason: Unoptimized server (that is, ibmint optimize server has not been run, typically in a custom-built image)
    How to determine the cause: Review the pod console log.
    Possible action: The ACE certified container (ACEcc) images run ibmint optimize server by default, but if you build your own image, make sure you run ibmint optimize server just before the integration server starts.
