Hi,
We have been using IBM BAMOE (formerly Red Hat PAM) for more than 2 years. We have encountered a lot of issues, but so far we have always managed to resolve them through community posts or the documentation available online.
Recently our production environment stopped responding. We had to restart it, but it kept going down. Either CPU usage climbed very high and the server crashed, or it eventually ran out of memory (OOM). We noticed that the EJB queue was stuck, and we were seeing a lot of timer errors and warnings. We engaged IBM support, but they were unable to help. The only way to get jBPM running again was to clean up both the JBOSS_EJB_TIMER and TimerMappingInfo tables. We have faced this issue twice and had to do the cleanup both times.
After 10 days of constant debugging, an environment upgrade to 8.0.6 and even a DB downgrade (SQL Server 2022 to SQL Server 2019), we found that the issue could be caused by this bug:
https://issues.redhat.com/browse/JBPM-10242
Fortunately, IBM at least gave us a patch. After applying it in a controlled environment, we saw some positive results: the environment is now stable, CPU usage stays below 50%, and there are no more OOM crashes.
However, our problem is not fully resolved. We are now seeing some side effects and have a few open questions:
1. The EJB queue is constantly busy: tasks keep being completed and new ones keep being created.
2. The JBOSS_EJB_TIMER table keeps growing with new timers, although the TimerMappingInfo table is not growing in proportion.
3. We are seeing a lot of deadlocks caused by many concurrent INSERTs and UPDATEs on the JBOSS_EJB_TIMER table.
4. We are not sure whether we need to set both of these properties to true:
org.jbpm.ejb.timer.disable.linear.search
org.jbpm.ejb.timer.disable.linear.remove
5. How can we find and remove stale EJB timers so that this constant processing stops?
We would appreciate any help on this.
------------------------------
Aslam Ahmed
------------------------------