Dear Jozef
Using DB2 SMP when there is a substantial amount of CPU Queuing wait would not deliver much performance benefit. And if, after a few more nights of batch runs, the run time does not improve back to what it was before the upgrade, then the cause may not be missing autonomic indexes.
You should also look at the Wait by Generic Job or Task chart and focus on the jobs of the batch process, comparing the proportion of Dispatched CPU Time vs. wait time to check whether wait time is excessive and, if so, which specific wait is dominant for those jobs. Do not look at the Wait Overview alone, as the overview can obscure subtle details from our interpretation.
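To make the proportion check concrete, here is a small sketch of the arithmetic with made-up numbers (the job names and values are invented for illustration; real figures come from the PDI chart export, not from this code):

```python
# Illustration only: Dispatched CPU vs. wait proportions per generic job,
# shaped like data from a "Waits by Generic Job or Task" chart.
# All job names and second counts below are hypothetical.

jobs = {
    # generic job name: seconds accumulated over the batch window, by bucket
    "BATCH*":  {"dispatched_cpu": 5400, "cpu_queue": 2900, "disk_page_fault": 300},
    "QSQSRVR": {"dispatched_cpu": 800,  "cpu_queue": 150,  "disk_page_fault": 40},
}

for name, buckets in jobs.items():
    total = sum(buckets.values())
    cpu = buckets["dispatched_cpu"]
    wait = total - cpu
    # which wait bucket (excluding dispatched CPU) is dominant?
    dominant = max((b for b in buckets if b != "dispatched_cpu"), key=buckets.get)
    print(f"{name}: {cpu / total:.0%} dispatched CPU, "
          f"{wait / total:.0%} waiting (dominant wait: {dominant})")
```

If the wait share is large for the batch jobs specifically, the dominant wait bucket tells you where to dig next.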
You did not mention disk response time at all, but this is also an important performance factor. Please compare disk response time during the batch run before vs. after the upgrade to see whether it degrades substantially during the batch window. If so, compare the Physical Disk IO Overview before and after the upgrade to see whether it increases substantially. If confirmed, compare the disk IO workload of the batch jobs before and after the upgrade to confirm that substantial increase. If so, compare DB IO Per Second before and after the upgrade (read case 5 of my article). If it increases substantially, that would finally mean there is now more workload for the batch process to handle, which explains the longer run time.
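The chain of comparisons above is really the same check applied to successive metrics. A tiny sketch with made-up numbers (the 25% threshold is my arbitrary choice for the sketch; "substantial" is always a judgment call against your own baseline):

```python
# Illustration of the before-vs-after comparison chain, with hypothetical
# values averaged over the batch window.  THRESHOLD is arbitrary.

THRESHOLD = 1.25  # flag anything that grew by 25% or more

metrics = {
    # metric name: (before upgrade, after upgrade)
    "disk response time (ms)": (1.8, 4.1),
    "physical disk IO / sec":  (9000, 14500),
    "DB IO per second":        (25000, 41000),
}

for name, (before, after) in metrics.items():
    ratio = after / before
    flag = "SUBSTANTIAL increase" if ratio >= THRESHOLD else "ok"
    print(f"{name}: {before} -> {after}  (x{ratio:.2f})  {flag}")
```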
If a batch workload increase is confirmed, you can improve the run time by improving disk response time, and by checking for and rectifying non-optimized object authority assignments for the user profiles that run the batch jobs, which can negatively affect batch run time (read case 1 of my article). Creating useful indexes according to the System-wide Index Advisor can also help with the SQL/Query portion of the batch workload.
Another thing to do is to review the job logs of the batch jobs. Choose a few of the longest-running jobs and check whether any message that looks out of place appears repetitively.
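If a job log has been saved as plain text, that "repetitive message" check can even be automated. A sketch (the message IDs and log lines here are invented for illustration, not taken from your system):

```python
# Sketch: count message IDs in a job log saved as text and surface
# the ones that repeat.  The log content below is hypothetical.
from collections import Counter
import re

joblog_text = """\
CPF9898  Some application message.
SQL7919  Data conversion required on join field.
SQL7919  Data conversion required on join field.
SQL7919  Data conversion required on join field.
CPC2191  Object deleted.
"""

# IBM i message IDs are 3 letters followed by 4 alphanumerics (e.g. CPF9898)
ids = re.findall(r"^([A-Z]{3}[A-Z0-9]{4})\b", joblog_text, flags=re.MULTILINE)
for msg_id, count in Counter(ids).most_common():
    if count > 1:
        print(f"{msg_id} appears {count} times -- worth a closer look")
```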
One last point that comes to my mind. Does your customer have an in-house application development team that keeps delivering program modifications/enhancements? If so, check whether the AD team happened to deploy new code at the same time as this upgrade. If so, the new code may increase the workload by intention; or, if there is no such intention, the new code may contain bad coding that unnecessarily increases CPU consumption.
In my past experience with customers' in-house AD, too many programmers do not know good coding from bad in terms of performance implications, at least in ASEAN. But this may not be easy to prove, as we may not be able to identify the offending code. There were very few cases in which I could, by sheer luck, identify bad code that caused a performance issue, but this was sadly rare and I was reluctant to do it on my own.
Let me know the progress if you still cannot solve the issue.
------------------------------
Right action is better than knowledge; but in order to do what is right, we must know what is right.
-- Charlemagne
Satid Singkorapoom
------------------------------
Original Message:
Sent: Wed September 14, 2022 10:45 AM
From: Jozef Thijs
Subject: Batch performance degradation in OS 740 ?
Satid,
Thanks already for this feedback; I will certainly verify every point you mention.
In fact, on OS 730 the night batch processing completed well within the 10 pm to 6 am window, although the server was limited in CPU resources (less than 2 cores), and in the iDoctor time-signature overview there was clearly a CPU queuing issue (only during the night) ... but the night processing still completed before 6 am.
After upgrading to OS 740, still running with the same CPU resources and memory pools, the CPU queuing is still clearly visible, and over an even longer period, as the batch processing now completes around 10 am in the morning.
Currently, more CPU resources are assigned (increased to 2.7 and set to uncapped, up to a maximum of 3 processors). So at this moment my batch process is running in an acceptable window again (completed before 6 am) ... but I can't explain this high CPU requirement to the customer.
So, at this moment, with 2.7 uncapped, I see that during the batch window the 3 processors are fully used. And looking at the iDoctor waits/time-signature overview, the red 'dispatched CPU' is well represented: about 90-95% is dispatched CPU, and the other (very small) waits are related to 'disk page faults' and 'disk non fault reads'. So, at this moment, I don't see any direct indication!
Yesterday I also re-enabled parallel processing (it had been disabled through QAQQINI, set to *NONE) ... but during the past night no real improvement was seen, as CPU utilization (even with 3 processors available) is very high.
Any further recommendation is welcome.
Thanks,
Jos (Jozef) Thijs
Kyndryl Belgium
------------------------------
Jozef Thijs
Original Message:
Sent: Tue September 13, 2022 08:05 PM
From: Satid Singkorapoom
Subject: Batch performance degradation in OS 740 ?
Dear Jozef
In my long experience helping many IBM i customers with the kind of issue you are facing, I can say there are many possible explanations for batch run-time degradation. Unfortunately, these can be identified only when the customer keeps a sample of the performance profile of the entire batch process by capturing a number of key PDI performance charts before the problem, to compare with the same charts captured when the issue occurs. For example, you should at least have the following charts: disk read/write response time, workload amount (DB IO Per Second), Wait Overview and Wait by Generic Job or Task, Physical Disk IO Overview and Physical Disk IO by Generic Job or Task, and (Memory) Page Fault by Generic Job or Task.
If there is no such record of past PDI charts (capture them for a few days known to have peak/high workload), the best we can do is identify the current cause(s) of the problem by looking at these PDI charts for the current day at issue and identifying the current performance bottleneck. I hope you read all five of my articles on which PDI charts are useful and how to analyze them, here: https://www.itjungle.com/author/satid-singkorapoom/
If you need my help here, we can start with you capturing the PDI charts for Wait Overview and Wait by Generic Job or Task during the batch run and posting them here, and I will try to analyze them for you. We will likely need to see more PDI charts after I see the Wait charts.
------------------------------
Satid Singkorapoom
Original Message:
Sent: Mon September 12, 2022 10:47 AM
From: Jozef Thijs
Subject: Batch performance degradation in OS 740 ?
All,
A question about performance after upgrading from OS 730 to OS 740. Since this upgrade, the night batch process takes much more time ... and I'm still looking at the system level for the cause of this issue. Yes, I have to say there was, already on 730, a CPU resource issue: we ran with 2 processor cores, which were fully used during the night (from 10 pm to 5 am). But now, running on 740, I have had to enable a third processor (via uncapped processing mode) to get the same batch run times (and now this third processor is also fully used).
About the application ... it is an older ERP application (a mix of native programming and SQL), where a lot of indexes are advised and where the number of full DB open operations is very high (in some programs/jobs). So updates can be done here.
But I'm looking for a stronger explanation for this additional CPU utilization in relation to the OS upgrade.
Any information, ideas, recommendations ?
I am already in contact with IBM support, but no cause has been found or identified yet ...
Thanks,
Jozef Thijs
Kyndryl Belgium
------------------------------
Jozef Thijs
------------------------------