IBM i Global

  • 1.  Performance Issues with QCCA programs

    Posted Tue October 18, 2022 02:02 PM
    Hello everyone,

    I'm posting this question here because I think it's a software issue rather than a hardware one.

    I'm having performance issues with certain programs from the QCCA library, especially with the CSNBKRD program. It seems the problem is not in that program itself but in the program QC6SRV and/or the SQL code used internally to delete records (and to access records in general) from the key file (a DES keystore in my case). My application runs on a POWER9 and uses a 4767 Cryptographic Coprocessor.

    The precise measures I took to improve performance were:
    - Creating a logical file over the DES keystore file (QAC6KEYST), which was a suggestion from the debug messages. It had little impact (about a 5% CPU reduction).
    - Executing OVRDBF at the beginning of the jobs that run the transactions, changing some parameters of the DES keystore file (QAC6KEYST) to avoid the hard close of the ODPs (Open Data Paths) that showed up in the debug messages. Each time the program was called, it created an ODP and then deleted it; the same happens with cursors. This gave a 10-15% CPU reduction (a rough sketch of the override is shown after this list). With cursors, I don't know if there is a way to close them at the end of the job rather than at the end of the program/module.
    - Correcting some programs that didn't end with a RETURN and didn't use activation groups properly. This change didn't actually improve performance at all; I thought the CPU reduction from this fix was going to be huge, but no.
    - Creating a program that directly deletes the record from the DES keystore file (QAC6KEYST). It had a huge impact (about a 50% CPU reduction), but I'm not too sure this is appropriate because I don't know whether CSNBKRD performs other vital operations internally. Even with this change, the program QC6SRV still used a lot of CPU.
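
    For reference, the override I used was roughly of this form; the exact parameters may differ on other systems, and the SHARE and scope values shown here are only illustrative of the kind of change I mean:

      /* Keep a shared, job-scoped open of the keystore file so the ODP   */
      /* is not hard-closed between program calls (illustrative values).  */
      OVRDBF     FILE(QAC6KEYST) SHARE(*YES) OVRSCOPE(*JOB) OPNSCOPE(*JOB)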

    Do you have experience with something similar? Any idea what I can do to solve this? Maybe I'm missing some considerations...

    Thanks in advance for your help.
    Regards

    ------------------------------
    Ludwin Pérez
    ------------------------------


  • 2.  RE: Performance Issues with QCCA programs

    IBM Champion
    Posted Tue October 18, 2022 09:17 PM
    Edited by Satid Singkorapoom Tue October 18, 2022 09:59 PM
    Dear Ludwin

    I wonder if you could provide more contextual information on what operation(s) your program performs. What kind of encryption are you using?  Does it involve encrypting a large amount of data or only a little in one operation?   Do you use the card in coprocessor or accelerator mode?  Have you ensured the 4767 has its latest firmware level? 

    Do you have this redbook: https://redbooks.ibm.com/abstracts/sg247399.html ?  I searched it for "performance" and found a good amount of information on this. For example, it says using an asymmetric key (PKA) is slow, and a case of using it judiciously is mentioned. It also says using a large encryption key size is slow, and some considerations are discussed. Although this redbook was published in 2008, the performance part may still be valid to some extent.

    A few articles that touch on IBM i encryption performance:
    https://info.townsendsecurity.com/encryption-performance-on-the-ibm-i
    https://www.precisely.com/blog/data-security/encryption-on-the-ibm-i-pitfalls
    https://www.ibm.com/security/cryptocards/pciecc2/performance

    ------------------------------
    Right action is better than knowledge; but in order to do what is right, we must know what is right.
    -- Charlemagne

    Satid Singkorapoom
    ------------------------------



  • 3.  RE: Performance Issues with QCCA programs

    Posted Wed October 19, 2022 02:44 PM
    Thanks for your reply.

    Answering the questions:
    I wonder if you could provide more contextual information on what operation(s) your program performs.
    Well, the application generally works with functions like Decipher, Encipher, Key Import, CVV/CVV2/iCVV Generate, Clear PIN Generate (PVV/Offset), etc. With those operations I don't see any performance problem, at least individually; maybe overall they require high CPU usage, especially from the QC6SRV program I mentioned.

    The problem is most notable when records are deleted from the keystore QAC6KEYST using the CSNBKRD command/program from the QCCA library.
    For example, the QC6SRV program still shows high CPU usage even when the CSNBKRD call is removed/commented out (but that test alone gives me something like a 60%-80% CPU usage reduction!). With CSNBKRD being called, besides QC6SRV, other programs such as QSQRUN3 and QSQROUTE also seem to have high CPU usage, which reinforces my idea that this is not a hardware issue or the cryptographic hardware being overloaded, but a software issue with the SQL used internally by the QCCA library programs (something like poor optimization, or closing all cursors at the end of the program/module, for example). Our application doesn't use SQL in any process, so only the QCCA programs remain. 


    What kind of encryption are you using?
    The application works only with DES. It never uses PKA, AES, or RSA.

    Does it involve encrypting a large amount of data or only a little in one operation?
    It depends, but again, with the tests I performed I don't see any performance problem with these operations individually.

    Do you use it in a coprocessor or accelerator mode?
    Sorry, I'm not sure what these modes are. I don't remember reading about them before.

    Have you ensured 4767 has its latest firmware level?
    It has version 5.6.9, so I think so.

    ------------------------------
    Ludwin Pérez
    ------------------------------



  • 4.  RE: Performance Issues with QCCA programs

    IBM Champion
    Posted Wed October 19, 2022 08:08 PM
    Edited by Satid Singkorapoom Thu October 20, 2022 03:53 AM
    Dear Ludwin

    >>>>  with the tests I performed I don't see any performance problem with these operations individually <<<<

    Do you use the same encryption keys for both test and production LPARs?  If you use a much shorter key on the test LPAR, this may at least partially explain the performance difference.

    Does your program manipulate the same amount of data in your test environment as it does in the production environment?  If much less data is manipulated in your test environment, that may also partially explain the performance difference. 

    Next, please also ensure your 4767 card is active. If it is not, that would explain the high CPU you mentioned.

    [Screenshot: Cryptographic Coprocessor configuration GUI]
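
    You can also check the card's device status from a command line; a minimal sketch (the CRP* naming is the usual convention, but the device description on your system may be named differently):

      /* List cryptographic device descriptions and their status          */
      /* (the generic name CRP* is an assumption; yours may differ).      */
      WRKCFGSTS  CFGTYPE(*DEV) CFGD(CRP*)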

    If your 4767 card is active, do you perform the test in an LPAR separate from the production LPAR where you experience the performance issue?  Are you generally the only workload in the test LPAR when you do the test?  Is the test LPAR in the same machine as the production LPAR, and does it use similar disk HW?   If yes to all, are you aware of the general workload in your production LPAR when you experience the performance issue?  Since encryption is a CPU-bound workload, I suspect that your production LPAR may carry too much workload or have certain wait component(s) that interfere with CPU performance (e.g. CPU queuing and such). 

    I suggest you start by comparing the PDI graphs named Wait Overview and Wait for Generic Job or Task for both LPARs and note the differences in the wait components you see. If the majority of the data you use DES with is on disk, you should also compare the general disk response time as well.    See my articles 3 and 5 to get an idea of what these PDI graphs look like and how to analyze them: https://www.itjungle.com/author/satid-singkorapoom/


    >>>>  other programs such as QSQRUN3 and QSQROUTE also seem to have high CPU usage, which reinforces my idea that this is not a hardware issue <<<<

    As long as these programs do not execute the exact same operations, I do not see how it is sensible to draw such a conclusion. In my experience with production environments, a HW performance issue is best diagnosed by analyzing IBM i performance data, and PDI graphs are the best tool for that, which is why I made the suggestion above.

    ------------------------------
    Right action is better than knowledge; but in order to do what is right, we must know what is right.
    -- Charlemagne

    Satid Singkorapoom
    ------------------------------



  • 5.  RE: Performance Issues with QCCA programs

    Posted Mon October 24, 2022 06:55 PM
    Hi Satid, 

    Do you use the same encryption keys for both test and production LPARs?
    Not exactly the same, but the keys are created the same way on the test and production machines. The behavior is the same on both: high CPU usage by the CSNBKRD service program. When the call to CSNBKRD is removed/commented out, the CPU usage falls dramatically, although the QC6SRV program still uses relatively high CPU. 

    Does your program manipulate the same amount of data in your test environment as it does in the production environment?
    I would say yes. The test I perform on the test machine is based on a test transaction. In production I didn't have to run a dedicated test, just retrieve the performance data showing which programs were using the most CPU, the same thing I did on the test machine. For general purposes, it was the same kind of transaction.

    I used the STRPEX command to collect the performance data on both machines (*STATS).
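
    For reference, the collection was roughly of this form (the definition, session, and library names here are just placeholders):

      /* PEX statistics collection; the real definition limited JOB()     */
      /* to the specific job under test.                                   */
      ADDPEXDFN  DFN(CRYPTSTAT) TYPE(*STATS) JOB(*ALL)
      STRPEX     SSNID(CRYPTRUN) DFN(CRYPTSTAT)
      /* ... run the test transaction ... */
      ENDPEX     SSNID(CRYPTRUN) DTALIB(MYLIB)
      PRTPEXRPT  MBR(CRYPTRUN) LIB(MYLIB) TYPE(*STATS)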

    Next please also ensure your 4767 card is active.
    Yes, it's active on both machines, test and production. 

    As for the other questions...
    I tested while the users weren't doing their work, just to ensure that the only transactions in the system were the ones I was testing. 
    For the graphs I have to request support because I don't have access to certain options; I'm not the IBM i administrator, just a programmer :(. I hope to have the support next week. 

    As long as these programs do not execute the exact same operations, I do not see how it is sensible to draw such a conclusion.
    I didn't understand that. The programs I mentioned basically perform the same operation of deleting records from the DES key file (QAC6KEYST) over and over. Of course, between these deletes there are other operations that use the cryptographic card. 

    Thanks again for your reply


    ------------------------------
    Ludwin Pérez
    ------------------------------



  • 6.  RE: Performance Issues with QCCA programs

    IBM Champion
    Posted Tue October 25, 2022 04:14 AM
    Edited by Satid Singkorapoom Tue October 25, 2022 04:36 AM
    Dear Ludwin

    Do you use PEX only in your test environment and not in production?   Starting PEX incurs run-time performance overhead on the selected running jobs, and this could explain why you do not see much CPU % busy for the job in the test environment.  Can you run your program in the test environment with PEX turned off and see if you notice a higher run-time CPU % busy?

    You earlier mentioned using SQL in your program.

    - Do most or all of the SQL statements that you use contain WHERE, ORDER BY, or GROUP BY clauses, or use join operations?  The existence of indexes containing the columns used by ALL these operations improves run-time performance (and reduces CPU consumption) of such SQL statements, especially for a WHERE clause that contains substantial calculation (as opposed to a simple comparison).  I'm aware of the logical file you said you created, but it did not reduce CPU % busy by much; the question is whether you created a proper set of indexes for those files or not. One good way to answer this is to use a DB2 for i Plan Cache snapshot to check the access plans and index advisor for the jobs that run this SQL.  If you do not know how to use a Plan Cache snapshot, I have some articles that may give you an idea of how to use it.

    BTW, tables that are smaller than 100 MB in size do not need indexes, as the performance benefit of indexes there is mostly not significant. So focus on tables larger than 100 MB.

    - Do the data files in the test environment (accessed by the SQL statements) contain the same or almost the same number of data rows as those in the production environment?    SQL without a sufficient number of indexes scans tables a lot, and files with a small number of rows consume less CPU because fewer rows are scanned.

    - Do you enable DB2 SMP in your production environment or not?  DB2 SMP runs multiple SQL access tasks (as opposed to just a single task), and they collectively consume more CPU power per job for the same statement accessing the same set of data.   To check this, first check whether the SW product named DB2 Symmetric Multiprocessing is installed or not. If so, check whether the system value QQRYDEGREE is set to *OPTIMIZE or *MAX. If yes to both checks, this explains the high CPU % in your production environment (I'm assuming your test environment does not enable DB2 SMP; see the sketch after the next paragraph).

    While the SQL is running, you can run WRKACTJOB and press F11 until you see the Threads column. If your job (I'm assuming it is a single-threaded job) displays a value greater than 1 in the Threads column, that means SMP is active for your job.   You can also see SMP in action in the Visual Explain of your SQL.
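
    For a quick check from a command line, a minimal sketch (the CHGQRYA override affects only the job in which it is run):

      /* Display the parallel-degree system value; *NONE means the        */
      /* query optimizer does not use SMP by default.                     */
      DSPSYSVAL  SYSVAL(QQRYDEGREE)
      /* Temporarily rule SMP out for the current job before re-testing.  */
      CHGQRYA    DEGREE(*NONE)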

    - Another possible, but rare, explanation (outside the SQL area) for why your test environment does not consume as much CPU power (other than PEX overhead) could be that the LPAR is configured with many more virtual processors than its allocated CPU cores, as compared to your production LPAR (I'm assuming your production and test machines are the same Power server model).  You can check the virtual processor assignment with the WRKSYSACT command. 


    ------------------------------
    Right action is better than knowledge; but in order to do what is right, we must know what is right.
    -- Charlemagne

    Satid Singkorapoom
    ------------------------------



  • 7.  RE: Performance Issues with QCCA programs

    Posted Tue October 25, 2022 11:29 AM
    Edited by Ludwin Pérez Tue October 25, 2022 12:33 PM
    Hi Satid, 

    Do you use PEX only in your test but not in your production environment?
    I did it on both. And yes, I compared how much CPU time the job took to finish with PEX and without it, and I noticed that PEX does incur run-time overhead, but not too much. The PEX definition was limited in both cases to just one job, so the overall impact was small. I decided to use PEX because of an observation that one kind of transaction used significantly more CPU than the others, and one of the differences was the use of the CSNBKRD command, so to check whether that was the reason I needed a tool that showed me the programs and the resources they consumed. 

    You earlier mentioned using SQL in your program.
    No, the application doesn't use SQL at all. What I mentioned was that while testing I noticed that the QCCA library programs use SQL internally, and specifically that CSNBKRD, or the programs it calls, uses SQL to delete the records from the DES key file (QAC6KEYST). 

    BTW, tables that are smaller than 100MB in size do not need indexes as the performance benefit of indexes is mostly not significant. So, focus on tables larger than 100MB.
    Yes, I mentioned before that I created the index recommended by the debug messages (STRDBG), but it had little impact. 

    Do the data files in the test environment (accessed by the SQL statements) contain the same or almost the same number of data rows as those in the production environment? 
    Well, not my SQL statements, but in the end the SQL statements used by CSNBKRD, or by the programs it calls, should access the DES key file (QAC6KEYST) (I have no idea what those SQL statements do because I can't see the code). That file is not huge on either machine; both have around 5,000 records. 
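
    In case it's useful, the record count is easy to compare on both machines with something like this (the library name is a placeholder):

      /* Show member details, including the current number of records,    */
      /* for the DES keystore file (library name is illustrative).        */
      DSPFD      FILE(MYLIB/QAC6KEYST) TYPE(*MBR)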

    Do you enable DB2 SMP in your production environment or not?
    Let me ask the admin about that; I hope to have the info soon...

    One thing to clarify...
    On both machines we see the same behavior: CSNBKRD has high CPU usage. It's not something that happens on test and doesn't happen in production, or vice versa. Both machines are practically the same in terms of hardware. 



    ------------------------------
    Ludwin Pérez
    ------------------------------



  • 8.  RE: Performance Issues with QCCA programs

    IBM Champion
    Posted Wed October 26, 2022 05:23 AM
    Edited by Satid Singkorapoom Wed October 26, 2022 09:00 AM
    Dear Ludwin

    While waiting for your system admin to respond, you should also use the WRKACTJOB command to check whether the job running your program runs as a multithreaded job or not.  If it does, that can partly explain the high CPU %. Your program may be single-threaded, but the QCCA programs that are called may be designed to run multithreaded (apart from whether DB2 SMP is active or not).  

    As for your original comment that you "created a program that directly deletes the record from the DES keystore file (QAC6KEYST). It had a huge impact (about a 50% CPU reduction), but I'm not too sure this is appropriate", I found the following statements in section 5.2.4 of the i5/OS Data Encryption redbook (the one I mentioned in my first reply):

    [QUOTE]
    CCA keystore files are DB2 files. You can create a keystore file through the Web-based configuration utility, or your application program can create one using the Key_Storage_Initialization verb.

    You can create as many keystore files of either type as you wish. A CCA keystore file is used to store both data-encrypting keys and key-encrypting keys.

    Note: Two prototype files, QAC6KEYST (for the DES keystore) and QAC6PKEYST (for PKA keystore), are shipped in the QCCA library. Do not delete, initialize, or alter these files in any way.
    [UNQUOTE]

    It appears that your direct-delete program violates this guidance, but I have no idea whether this is the cause of the high CPU you focus on, because you reported the contrary.  Still, because of the statement above, I think you should try using the Key_Storage_Initialization verb to create and use your own keystore file and see whether it turns out better.   I suggest you also study this redbook for more information that may be useful to you.

    As for your other comment, the suspicion that the issue has something to do with "a software issue with the SQL used internally by the QCCA library programs (something like poor optimization, or closing all cursors at the end of the program/module, for example)", I can say (from my long experience with DB2 for i) that you may have a misunderstanding.  The SQL engine of DB2 for i has its own way of handling Open Data Paths by design.  When you (or the called system SQL-based programs) run an SQL statement for the first time, the ODP is deleted at the end of its execution, and therefore the cursor is closed. Only from the second run of the same statement by the same job does DB2 for i start to keep the ODP (by the developers' design); all closes from then on are "pseudo closes", which means the ODP is maintained and subsequent runs are faster to a varying degree. 

    If you have a real need for the ODP (and therefore the cursor) to be maintained right from the first execution of any SQL statement, the DB2 developers provide a simple way to do this: just create an empty data area named QSQPSCLS1 in a library that appears in the job's library list.  (I'm not sure the OVRDBF that you used is the proper way to go.) I have detailed articles from 2005 and 2011 that explain this, if you are interested in knowing more about ODPs and SQL in DB2 for i.    
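
    A minimal sketch of that data area creation (the library name is a placeholder; it just has to be in the job's library list, and the data area's type and length do not matter):

      /* Create the QSQPSCLS1 data area so SQL ODPs are pseudo-closed     */
      /* (kept open) from the first execution onward.                     */
      CRTDTAARA  DTAARA(MYLIB/QSQPSCLS1) TYPE(*CHAR) LEN(1) +
                   TEXT('Keep SQL ODPs open from first execution')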

    Please also check whether you can use the 4767 in accelerator mode or not.  To use this mode, there is a condition that the private keys created via Digital Certificate Manager were not created using the 4767. If this is the case for you, you can simply vary on the 4767 device description and it will run in accelerator mode. This mode does not provide secure key storage, but it processes cryptographic operations at a much higher rate than the other mode.  This information is from here:  https://www.ibm.com/docs/en/i/7.2?topic=coprocessors-features
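
    Varying the device on is a one-line command; a sketch (the device name CRP01 is only the common convention and may differ on your system):

      /* Vary on the cryptographic coprocessor device description.        */
      VRYCFG     CFGOBJ(CRP01) CFGTYPE(*DEV) STATUS(*ON)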

    Lastly, as of the POWER8 CPU, there is a functional unit in the CPU chip that helps speed up data encryption, but it supports AES. Since DES was declared obsolete about 15 years ago (which may justify your suspicion that your problem is in the SW part), I would say that, when all is said and done, you may not be able to fully address the issue simply because DES has had no enhancement whatsoever since its obsolescence.  Switching to AES (or a more modern method) is logically the more sensible way forward if the current problem is a big deal for you. Driving a heavy bulldozer (DES or TDES) on a superhighway (POWER9) does not make it go significantly faster; it is more sensible to change to a more suitable vehicle. 


    ------------------------------
    Satid Singkorapoom
    ------------------------------