Power

 Inside IBM i, WebSphere Application Server 8.5 goes to 100% CPU and gets IBM i stuck

Urtzi Larrieta Alvarez IBM Champion posted Fri March 07, 2025 07:32 AM

Hello.

First of all, sorry if this question has already been discussed in another post. We have an IBM i LPAR running IBM i 7.4 with WAS 8.5 ND deployed, hosting several EAR applications and HATS. For unknown reasons, WAS 8.5 goes to 100% CPU consumption, and the only way to recover is to end the WAS process and start it again.

This affects other workloads, such as the nightly backup, which does not finish within its designated window because WAS is taking all the resources.

We suspect that someone logged in to HATS may have triggered an SQL statement in an infinite loop or something similar, but neither the logs nor the SystemOut of the running WAS process record any problem; someone simply does something that puts WAS in this state. The job log contains no errors, only SQL statement executions such as this one:

SQL7908    Completion              00   06/03/25  18:00:15,286721  QSQROUTS     QSYS        *STMT    QSQCLI      QSYS        *STMT
                                     From module . . . . . . . . :   QSQSRVRC
                                     From procedure  . . . . . . :   SQSERVER
                                     Statement . . . . . . . . . :   9236
                                     To module . . . . . . . . . :   SQLCON
                                     To procedure  . . . . . . . :   SQLConnect
                                     Statement . . . . . . . . . :   15299
                                     Thread . . . . :   000000EB
                                     Message . . . . :   Job 903373/QUSER/QSQSRVR was used for SQL server
                                       mode processing.
                                     Cause . . . . . :   An SQL (Structured Query Language) statement was run
                                       while running in SQL server mode. The SQL statements for this connection
                                       or thread will be processed in job 903373/QUSER/QSQSRVR. Technical
                                       description . . . . . . . . : SQL server mode was requested by setting the
                                       SQL server mode job attribute, or by setting the server mode environment
                                       attribute through the SQL call level interface. When running in this mode,
                                       SQL statements are processed by a separate job that runs under the user
                                       profile specified for the connection. The thread identifier is 267 and the
                                       connection is to relational database C703CBE0. If the relational database
                                       name is *N, this means that all connections for the thread will use the
                                       same job.
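The SQL7908 message says the statements are processed in a separate QSQSRVR job rather than in the WAS job itself. As a hedged starting point, a small script can list every QSQSRVR job referenced in a saved job log, so those jobs' CPU can be checked alongside WAS85SVR. This is a sketch under assumptions: the job log has been saved as plain text (for example from a spooled file), job names are not split across wrapped lines, and the regular expression is only a simplified match for the number/user/name format.

```python
import re
from collections import Counter

# IBM i qualified job name: 6-digit job number / user profile / job name.
# User and job names are at most 10 characters; this character class is a
# simplification of the real naming rules.
QUALIFIED_JOB = re.compile(r"\b(\d{6})/([A-Z0-9$#@_]{1,10})/([A-Z0-9$#@_]{1,10})\b")


def sql_server_jobs(joblog_text: str) -> Counter:
    """Count how often each QSQSRVR job is referenced in job log text."""
    counts = Counter()
    for number, user, name in QUALIFIED_JOB.findall(joblog_text):
        # Keep only the SQL server mode jobs named in messages like SQL7908.
        if name == "QSQSRVR":
            counts[f"{number}/{user}/{name}"] += 1
    return counts
```

The resulting job names can then be looked up in PDI or the active job list to see whether the CPU is being charged to the WAS JVM or to the SQL server jobs.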

We are completely lost and don't know how to tackle this issue to prevent it from happening again. Also, since 20 applications are deployed in WAS, we can't determine which application is driving this consumption. Once WAS reaches 100%, we don't know what to investigate, or how to cancel the process of the user who put WAS in that state.

If you have had similar issues with WAS, please tell me how you reached a solution, or how to monitor WAS to find out who or what is causing the CPU spikes.

Best Regards,

Urtzi Larrieta

Marius le Roux IBM Champion

Hi Urtzi Larrieta, 

I've had some experience with performance remediation on IBM i, and I recommend starting with the Performance Data Investigator (PDI) graphs to troubleshoot which types of jobs are causing the issue.

If you're seeing SQL-related spikes, pay special attention to QZDAOINIT and QZRSRVS jobs, as they can often be the culprits.
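To see at a glance whether a pool such as QZDAOINIT or QZRSRVS stands out, CPU can be summed per job name rather than per individual job. A minimal sketch, assuming you already have a list of (qualified job name, CPU milliseconds) pairs exported from PDI or a similar job list; the input format is hypothetical.

```python
from collections import defaultdict


def cpu_by_job_name(rows):
    """Sum CPU per job name (e.g. QZDAOINIT) across all jobs with that name.

    rows: iterable of (qualified_job_name, cpu_ms) tuples in a hypothetical
    export format such as ("100001/QUSER/QZDAOINIT", 300).
    """
    totals = defaultdict(int)
    for qualified, cpu_ms in rows:
        # A qualified job name is number/user/name; group on the name part.
        name = qualified.rsplit("/", 1)[-1]
        totals[name] += cpu_ms
    # Highest consumers first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```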

A helpful primer on using PDI can be found in these resources:

📌 Intro to the Performance Data Investigator (Dawn May, 2018):
🔗 Download PDF

📌 Intro to the Performance Data Investigator (Fall COMMON 2014):
🔗 Download PDF

For additional insights, Satid Singkorapoom has some excellent articles explaining these concepts in depth:
🔗 Satid’s Articles on IT Jungle

If you need further assistance diagnosing the issue, Dawn May is one of the best experts in this space, and you can reach out to her via the Seiden Group:
🔗 Seiden Group Website

Hope this helps!




Urtzi Larrieta Alvarez IBM Champion

Hello!

First of all, thank you, Marius le Roux, for your reply. I have indeed used the PDI graphs, since we have Performance Tools and collection agents running.

The problem is that all the graphs show the WAS85SVR process taking 100% of the CPU. If we drill into the Threads graph, the JVM is taking all the CPU, but we don't have the information to determine what inside WebSphere Application Server is driving the CPU to 100%.

We have 20 applications (EARs) running inside WAS, but the PDI graphs don't look inside WAS to show which of those applications is causing the issue, and the WAS administrative console does not show any performance data.

PDI points to WAS85SVR, which is obvious, but does not go any deeper.

So, the question is: is there any utility or facility I can use to investigate in depth which applications (EARs) are consuming CPU inside WebSphere Application Server V8.5 on IBM i?

Thank you in advance.
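One generic way to look inside a busy JVM is to compare per-thread CPU at two points in time: the threads whose CPU grows fastest between the samples are the ones to look up in a Java thread dump, where the stack trace shows which application's classes they are executing. The sketch below only does the ranking. It assumes you have cumulative per-thread CPU figures keyed by thread name from some source (how you collect them is left open), and the thread names in the example are illustrative.

```python
def hottest_threads(before, after, top=5):
    """Rank threads by CPU consumed between two snapshots.

    before/after: dict mapping thread name -> cumulative CPU milliseconds.
    Threads that only appear in `after` are treated as starting from 0;
    threads that disappeared before the second snapshot are ignored.
    """
    deltas = {name: cpu - before.get(name, 0) for name, cpu in after.items()}
    # Largest CPU growth first.
    ranked = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top]
```

Sampling a few times during an incident and seeing the same thread at the top each time is a stronger signal than a single pair of snapshots.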

Marius le Roux IBM Champion

Perhaps we can approach this from another angle.

How about analyzing the SQL Plan Cache for the user or identifying long-running SQL statements?

I've found these articles helpful in tracking performance issues:
🔗 Understanding Database Performance Using the Performance Data Investigator – Part 1
🔗 Understanding Database Performance Using the Performance Data Investigator – Part 2

You can trace problematic queries starting from the Job Name provided in the interface.
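When ranking plan cache entries, sorting by cumulative time (number of runs times average elapsed time) rather than by average time helps catch the "SQL in a loop" case suspected in the original post: a cheap statement executed hundreds of thousands of times can outweigh a single slow query. A sketch with hypothetical field names, assuming the entries have already been exported from a plan cache snapshot:

```python
from dataclasses import dataclass


@dataclass
class PlanCacheEntry:
    # Hypothetical fields, e.g. copied from a plan cache snapshot export.
    statement: str
    runs: int
    avg_elapsed_ms: float

    @property
    def total_elapsed_ms(self) -> float:
        # Cumulative cost of the statement across all executions.
        return self.runs * self.avg_elapsed_ms


def top_by_total_time(entries, top=10):
    """Rank statements by cumulative elapsed time (runs x average)."""
    return sorted(entries, key=lambda e: e.total_elapsed_ms, reverse=True)[:top]
```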

Also, do you have a license for iDoctor? If so, you can start a Job Watcher (JW) collection with JVM analysis and use the reports to determine why the JVM process is consuming so much CPU. (If you have never used it on the system before, you can get a trial.)

It would be interesting to see what insights you get from this.

Marius le Roux IBM Champion

(Disclaimer: I don’t have any WAS experience, but I do have strong research skills and can dig deep into IBM resources and the broader internet for similar issues.)

If you’re feeling adventurous, here are some things you can try:

🔹 I found a guide that includes PEX trace commands, which might help isolate Java lines for deeper investigation:
➡️ IBM i Cookbook – Operating Systems

(The "adventurous" part here is trying different snapshots and analyzing JVM dumps to pinpoint what could be the issue.)

🔹 Another potentially useful document (though I’m not certain if it only applies to the Express version):
➡️ WAS 8.5 Express – Application Troubleshooting

In this document, the following appears to be the option for enabling JDBC tracing at the statement level, which could help uncover SQL-related issues:
TRACE_STATEMENT_CALLS = 2

*You might first want to research where the trace output is written and what impact it might have on production; if there are many SQL statements, the tracing itself could cause performance issues.


🔹 Lastly, you can try the MustGather procedure. This typically provides a wealth of diagnostic data, which you can analyze yourself or submit to IBM support (they’re usually quite efficient depending on your SLA).
➡️ IBM MustGather Guide

Hope this helps!

Urtzi Larrieta Alvarez IBM Champion

Thank you for the useful information. I'll take a look at those links; I hope one of the hints in your documentation helps me resolve the problem.

Best Regards!

Marius le Roux IBM Champion

Hi Urtzi,

Have you made any progress with your investigation yet?

A colleague of mine suggested that you could run a database monitor (DBMON) on the user to track long-running SQL, if any stands out; looking at the plan cache could also help.

Database Monitor

That might also provide a focus point: whether the SQL statement is found in only one place, so the developers can investigate it further, together with additional metadata about the culprit SQL statement (look out for SQL reason codes).

Perhaps worth a try, if you can consider it.
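Once database monitor output is available, grouping the slow statements for one user by statement text shows whether the culprit comes from a single place and which reason codes accompany it. A minimal sketch: the row keys (user, statement, elapsed_ms, reason_code) are hypothetical stand-ins for whatever columns your monitor export provides, and the threshold is up to you.

```python
from collections import defaultdict


def long_running_by_statement(rows, user, min_elapsed_ms):
    """Group slow statements for one user by statement text.

    rows: iterable of dicts with hypothetical keys 'user', 'statement',
    'elapsed_ms' and 'reason_code', e.g. exported from a database monitor.
    Returns {statement: {"count": n, "reason_codes": set_of_codes}}.
    """
    summary = defaultdict(lambda: {"count": 0, "reason_codes": set()})
    for row in rows:
        # Skip other users and statements below the elapsed-time threshold.
        if row["user"] != user or row["elapsed_ms"] < min_elapsed_ms:
            continue
        entry = summary[row["statement"]]
        entry["count"] += 1
        entry["reason_codes"].add(row["reason_code"])
    return dict(summary)
```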

Urtzi Larrieta Alvarez IBM Champion

Thank you very much, I didn't know that tool. I'll try it. We also set up the WebSphere Performance Monitor, but, as always, once we had the tools in place, the issue didn't appear again :)

So, we are waiting for the issue to happen again so we can go straight to the monitoring tools and find the source of the problem.
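Because the problem is intermittent, one option is a small watchdog that decides when to start the monitoring tools automatically, instead of waiting for someone to notice. The sketch below contains only the decision logic (CPU at or above a threshold for N consecutive samples); how CPU is sampled and which collection gets started (Job Watcher, DBMON, a thread dump) are assumptions left open here.

```python
def sustained_high_cpu(samples, threshold=90.0, consecutive=5):
    """Return the index at which CPU has stayed >= threshold for
    `consecutive` samples in a row, or None if that never happens."""
    run = 0
    for i, pct in enumerate(samples):
        # Extend the current run of high samples, or reset it.
        run = run + 1 if pct >= threshold else 0
        if run >= consecutive:
            return i
    return None
```

Requiring several consecutive samples avoids triggering a heavy collection on a short, harmless spike.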

Thank you very much for your support.

Best Regards,

Urtzi Larrieta

Marius le Roux IBM Champion

Glad that it's not a recurring issue, though those "intermittent" ones are annoying and always pop up when least expected (they could be load-related, for example around month-end or month-start).

You're welcome. If possible, please post your findings here once you have found and resolved the issue.

Regards.