Dear Ludwin
Do you use PEX only in your test environment and not in production? Starting PEX adds run-time overhead to the selected running jobs, and this can explain why you do not see much CPU % Busy for the job in the test environment. Can you run your program in the test environment with PEX turned off and see whether you notice a higher run-time CPU % Busy?
You earlier mentioned using SQL in your program.
- Do most or all of the SQL statements you use contain WHERE, ORDER BY, or GROUP BY clauses, and do they also use join operations? Indexes that contain the columns used by ALL of these operations improve the run-time performance (and reduce the CPU consumption) of such SQL statements, especially when the WHERE clause contains substantial calculation (as opposed to a simple comparison). I'm aware that the logical files you said you created did not reduce CPU % Busy by much, but the question is whether you created a sufficient set of indexes for those files. The best way to answer this is to use a DB2 for i Plan Cache snapshot to check the access plans and the index advisor for the jobs that run these SQL statements. If you do not know how to use a Plan Cache snapshot, I have some articles that may give you an idea of how to use it.
BTW, tables smaller than 100 MB generally do not need indexes, as the performance benefit of indexes on them is mostly insignificant. So, focus on tables larger than 100 MB.
- Do the data files in the test environment (those accessed by your SQL statements) contain the same or almost the same number of rows as those in the production environment? SQL without a sufficient number of indexes scans tables a lot, and files with few rows consume less CPU because fewer rows are scanned.
- Do you enable DB2 SMP in your production environment? DB2 SMP runs multiple SQL access tasks (as opposed to a single task), and together they consume more CPU power per job for the same statement accessing the same set of data. To check this, first see whether the software product named DB2 Symmetric Multiprocessing is installed. If so, check whether the system value QQRYDEGREE is set to *OPTIMIZE or *MAX. If the answer is yes to both, this explains the high CPU % in your production environment (I'm assuming your test environment does not enable DB2 SMP).
While the SQL is running, you can run WRKACTJOB and press F11 until you see the Thread column. If your job (I'm assuming it is a single-threaded job) shows a value greater than 1 in the Thread column, SMP is active for your job. You can also see SMP in action in Visual Explain for your SQL job.
- Another possible, but rare, explanation (outside the SQL area) for why your test environment does not consume as much CPU power (other than PEX overhead) is that the test LPAR is configured with many more Virtual Processors relative to its allocated CPU cores than your production LPAR is (I'm assuming your production and test machines are the same Power server model). You can check the Virtual Processor assignment with the WRKSYSACT command.
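As a rough illustration of the indexing point in the list above, here is a minimal sketch. It is not DB2 for i: it uses SQLite through Python's sqlite3 module purely as a stand-in, and the table `orders`, its columns, and the index name are hypothetical. It only shows the general idea that the same WHERE query is answered by a full table scan when no suitable index exists (so the work grows with the number of rows) and by an index search once one is created. On IBM i, the equivalent evidence would come from Visual Explain or the Plan Cache snapshot.

```python
import sqlite3


def plan_for(conn, sql):
    """Return the 'detail' text of each step in SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The fourth column of EXPLAIN QUERY PLAN output is the readable detail.
    return [row[3] for row in rows]


# Hypothetical table, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(50_000)],
)

query = "SELECT SUM(amount) FROM orders WHERE customer = 42"

# No index on the WHERE column yet: the optimizer has to scan every row.
plan_without_index = plan_for(conn, query)

# With an index on the WHERE column, the optimizer can search the index.
conn.execute("CREATE INDEX orders_customer_ix ON orders (customer)")
plan_with_index = plan_for(conn, query)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

The exact plan wording differs between SQLite versions, so only the scan-versus-search distinction matters here. A test table holding far fewer rows than production would make the scan case look cheap, which is why the row counts need to be comparable.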
------------------------------
Right action is better than knowledge; but in order to do what is right, we must know what is right.
-- Charlemagne
Satid Singkorapoom
------------------------------
Original Message:
Sent: Mon October 24, 2022 06:55 PM
From: Ludwin Pérez
Subject: Performance Issues with QCCA programs
Hi Satid,
Do you use the same encryption keys for both test and production LPARs?
Not exactly the same, but the keys are created the same way on the test and production machines. The behavior is the same on both: high CPU usage by the CSNBKRD service program. When the call to CSNBKRD is removed or commented out, CPU usage falls dramatically, although the program QC6SRV still uses relatively high CPU.
Does your program manipulate the same amount of data in your test environment as it does in production environment?
I would say yes. The test I performed on the test machine is based on a test transaction. In production I didn't have to test anything; I just retrieved the performance data to show me which programs were using the most CPU, just as I did on the test machine. But for all practical purposes, it was the same kind of transaction.
I used the STRPEX command to retrieve the performance data on both machines (*STATS).
Next please also ensure your 4767 card is active.
Yes, it's active on both machines, test and production.
As for the other questions...
I tested while the users weren't doing their work, just to ensure that the only transactions in the system were the ones I was testing.
For the graphs, I have to request support because I don't have access to certain options. I'm not the IBM i administrator, just a programmer :(. I hope to have that support next week.
As long as these programs do not execute the exact same operations, I do not see it is sensible to make such a conclusion.
I didn't understand. The programs I mentioned basically perform the same operation, deleting records from the DES key file (QAC6KEYST), over and over. Of course, between these deletes there are other operations that use the cryptographic card.
Thanks again for your reply
------------------------------
Ludwin Pérez
Original Message:
Sent: Wed October 19, 2022 08:07 PM
From: Satid Singkorapoom
Subject: Performance Issues with QCCA programs
Dear Ludwin
>>>> with the test I perform I don't see any performance problem individually with these operations. <<<<
Do you use the same encryption keys for both test and production LPARs? If you use a much shorter key on the test LPAR, this may at least partially explain the performance difference.
Does your program manipulate the same amount of data in your test environment as it does in production environment? If much less data is manipulated in your test environment, that may also partially explain the performance difference.
Next please also ensure your 4767 card is active. If not, that explains the high CPU you mentioned.

If your 4767 card is active, do you perform the test in a separate LPAR from the production workload LPAR where you experience the performance issue? Are you generally the only workload in the test LPAR when you do the test? Is the test LPAR on the same machine as the production LPAR, and does it use similar disk HW? If yes to all, are you aware of the general workload in your production LPAR when you experience the performance issue? Since encryption is a CPU-bound workload, I suspect that your production LPAR may carry too much workload or have certain wait component(s) that interfere with CPU performance (e.g. CPU queuing and such).
I suggest you start by comparing the PDI graphs named Wait Overview and Wait for Generic Job or Task for both LPARs and noting the differences in the wait components you see. If the majority of the data you use DES with is on disk, you should also compare the general disk response time. See my articles 3 and 5 to get an idea of what these PDI graphs look like and how to analyze them: https://www.itjungle.com/author/satid-singkorapoom/
>>>> there are other programs that seem to have high CPU usage like QSQRUN3, QSQROUTE, which reinforces my idea that it's not a hardware issue <<<<
As long as these programs do not execute the exact same operations, I do not see it is sensible to make such a conclusion. In my experience with production environments, a HW performance issue is best diagnosed by analyzing IBM i performance data, and PDI graphs are the best tool for that, which is why I made the suggestion above.
------------------------------
Satid Singkorapoom
Original Message:
Sent: Wed October 19, 2022 02:44 PM
From: Ludwin Pérez
Subject: Performance Issues with QCCA programs
Thanks for your reply.
Answering the questions:
I wonder if you could provide more contextual information on what operation(s) your program is used for.
Well, the application generally works with functions such as Decipher, Encipher, Key Import, CVV/CVV2/iCVV Generate, Clear PIN Generate (PVV/Offset), etc. With those operations I don't see any performance problem, at least individually, though in general they may require high CPU usage, especially from the QC6SRV program I mentioned.
The problem is especially noticeable when records are deleted from the key store QAC6KEYST using the CSNBKRD command/program from the QCCA library.
For example, the program QC6SRV still has high CPU usage even when the CSNBKRD call is removed or commented out (although that test still gave me a 60%-80% reduction in CPU usage!!). When CSNBKRD is called, besides the QC6SRV program, there are other programs that seem to have high CPU usage like QSQRUN3, QSQROUTE, which reinforces my idea that it's not a hardware issue or an overloaded cryptographic card, but a software issue with the SQL used internally by the QCCA library programs (something like bad optimization, or closing all cursors at the end of the program/module, for example). Our application doesn't use SQL in any process, so only the QCCA programs remain.
What kind of encryption are you using?
The application works only with DES. It never uses PKA, AES, or RSA.
Does it involve encrypting a lot of or only a few data in one operation?
It depends, but again, with the test I perform I don't see any performance problem individually with these operations.
Do you use it in a coprocessor or accelerator mode?
Sorry, I'm not sure what these modes are. I don't remember reading anything about them before.
Have you ensured 4767 has its latest firmware level?
It has version 5.6.9, so I think so.
------------------------------
Ludwin Pérez
Original Message:
Sent: Tue October 18, 2022 09:16 PM
From: Satid Singkorapoom
Subject: Performance Issues with QCCA programs
Dear Ludwin
I wonder if you could provide more contextual information on what operation(s) your program is used for. What kind of encryption are you using? Does it involve encrypting a lot of or only a few data in one operation? Do you use it in a coprocessor or accelerator mode? Have you ensured 4767 has its latest firmware level?
Do you have this redbook: https://redbooks.ibm.com/abstracts/sg247399.html ? I searched it for "performance" and found a fair amount of information on this. For example, it says that using asymmetric keys (PKA) is slow and describes a case for using them judiciously. It also says that using a large encryption key size is slow and discusses some considerations. Although this redbook was published in 2008, the performance part may still be valid to some extent.
A few articles that touch on IBM i encryption performance:
https://info.townsendsecurity.com/encryption-performance-on-the-ibm-i
https://www.precisely.com/blog/data-security/encryption-on-the-ibm-i-pitfalls
https://www.ibm.com/security/cryptocards/pciecc2/performance
------------------------------
Satid Singkorapoom
Original Message:
Sent: Tue October 18, 2022 12:03 PM
From: Ludwin Pérez
Subject: Performance Issues with QCCA programs
Hello everyone,
I'm posting this question here because I think it's a software issue, not a hardware one.
I'm having performance issues with certain programs from the QCCA library, especially with the CSNBKRD program. It seems the problem is not in the program itself but in the program QC6SRV and/or in the internal SQL code used to delete records (and to access records in general) from the key file (the DES keystore in my case). My application runs on a Power 9 and uses a 4767 Cryptographic Coprocessor.
The precise measures I took to improve the performance were:
- Creating a logical file over the DES keystore file (QAC6KEYST), which was a suggestion from the debug annotations. It had little impact (about a 5% reduction in CPU use).
- Running OVRDBF at the beginning of the jobs that run the transactions, changing some parameters of the DES keystore file (QAC6KEYST) to avoid the deletion of the ODPs (Open Data Paths) that appeared in the debug annotations. Each time the program was called, it created an ODP and then deleted it; the same happens with cursors. This reduced CPU use by 10-15%. For cursors, I don't know whether there is a way to close them at the end of the job rather than at the end of the program/module.
- Correcting some programs that didn't end with Return and didn't use activation groups properly. This change actually didn't improve performance at all; I thought it would reduce CPU use hugely, but no.
- Creating a program that deletes the record directly from the DES keystore file (QAC6KEYST). It had a huge impact (about a 50% CPU reduction), but I'm not too sure this is appropriate, because I don't know whether CSNBKRD performs other vital operations internally. Even with this change, the program QC6SRV still used a lot of CPU.
Do you have experience with something similar? Any idea what I can do to solve this? Maybe I'm overlooking some considerations...
Thanks in advance for your help.
Regards
------------------------------
Ludwin Pérez
------------------------------