The “hanging System Tasks” symptom is usually caused by a job (in either the User Tasks or the System Tasks) that blocks, i.e. runs into an endless loop, a deadlock, etc. Each time this job is started, it permanently takes one Thread away from the System Tasks Thread pool.
The default size of this pool is 5, I believe. So after such a blocking job has been started 5 times, the System Tasks pool is exhausted and no System Tasks are executed anymore (including the “Session Sweeper”, so expired Sessions are no longer removed).
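The pool exhaustion described above can be reproduced with a plain Java `ExecutorService`. This is a minimal sketch, not Integration Server code: the class and method names are hypothetical, the pool size of 5 only mirrors the (unconfirmed) default mentioned above, a `CountDownLatch` that is never counted down stands in for the endless loop or deadlock, and a one-second timeout is used to decide that the sweeper is starved.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PoolExhaustionDemo {
    // Hypothetical model: returns true if a "session sweeper" task cannot run
    // within one second because blocking jobs occupy every pool thread.
    static boolean sweeperStarved(int poolSize, int blockingJobs) throws InterruptedException {
        ExecutorService systemTasks = Executors.newFixedThreadPool(poolSize);
        CountDownLatch neverReleased = new CountDownLatch(1);
        try {
            // Each start of the blocking job occupies one pool thread indefinitely.
            for (int i = 0; i < blockingJobs; i++) {
                systemTasks.submit(() -> {
                    neverReleased.await(); // stands in for an endless loop or deadlock
                    return null;
                });
            }
            // The sweeper is queued; it only runs if a pool thread is still free.
            Future<String> sweeper = systemTasks.submit(() -> "expired sessions removed");
            try {
                sweeper.get(1, TimeUnit.SECONDS);
                return false;
            } catch (TimeoutException e) {
                return true;
            } catch (ExecutionException e) {
                throw new IllegalStateException(e);
            }
        } finally {
            neverReleased.countDown(); // release the blocked threads so the JVM can exit
            systemTasks.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("4 blocking jobs, sweeper starved? " + sweeperStarved(5, 4));
        System.out.println("5 blocking jobs, sweeper starved? " + sweeperStarved(5, 5));
    }
}
```

With four blocking jobs the fifth thread is still free, so the sweeper runs and `main` prints `false`; with five blocking jobs the sweeper sits in the queue and `main` prints `true`.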
Some possible causes of such a blocking job:
- A scheduled Service that uses the WmDB Services to access a database, when there is a database problem. Make sure to add enough error handling to the Service, and in case of an Exception call pub.db:close with “$dbCloseConnection=true”. (This prevents subsequent executions of the Service from getting a stale DB connection.)
- The same applies to the pub.client.ftp Services. A recent webMethods patch addresses problems with stale FTP connections.
- If you use Reverse Invoke, the scheduled task “wm.server.security.revInvoke:keepAliveConnections” can cause a deadlock in the Scheduler as follows. Assume that the Thread in which an RI connection was started notices that the connection is broken. It now tries to remove that connection from the RI connection pool; to do so, it first obtains the lock for the pool and then tries to obtain the lock for the connection. Now imagine that at exactly the same time “keepAliveConnections” runs. It has acquired the lock for this connection and pings it. The ping fails, so it tries to remove the connection from the pool, and for this it needs the lock for the pool… The result: the Thread in which the connection was opened holds the lock for the pool and waits for the lock for the connection, while the Scheduler Thread running “keepAliveConnections” holds the lock for the connection and waits for the lock for the pool! Every minute a new “keepAliveConnections” run starts and runs right into this “traffic jam”, so very quickly the Scheduler is dead.
I think that in large RI scenarios this problem is one of the most common reasons for the “hanging System Tasks” problem. I have brought this to the attention of webMethods, so a fix should be available soon.
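The Reverse Invoke deadlock described above is a classic lock-ordering problem, and it can be sketched in Java with two `ReentrantLock`s. This is a hypothetical illustration, not webMethods code: `poolLock` and `connLock` stand in for the RI connection-pool lock and one connection's lock, a `CyclicBarrier` forces the unlucky interleaving, and `tryLock` with a 500 ms timeout replaces a plain blocking `lock()` so the demo reports the deadlock instead of hanging forever. The second case applies a common general remedy, acquiring both locks in the same order on every code path; whether the webMethods fix works this way is an assumption.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderingDemo {
    // Builds a task that removes a broken connection: it takes `first`, then `second`.
    // Returns false if `second` cannot be obtained within 500 ms (a deadlock in real code).
    static Callable<Boolean> remover(ReentrantLock first, ReentrantLock second, CyclicBarrier sync) {
        return () -> {
            first.lock();
            try {
                if (sync != null) {
                    sync.await(); // wait until both threads hold their first lock
                }
                if (!second.tryLock(500, TimeUnit.MILLISECONDS)) {
                    return false; // with a plain lock() this thread would wait forever
                }
                try {
                    return true; // here the connection would be removed from the pool
                } finally {
                    second.unlock();
                }
            } finally {
                first.unlock();
            }
        };
    }

    // Returns true only if both the RI connection thread and keepAlive finish their removal.
    static boolean bothFinish(boolean consistentOrder) throws Exception {
        ReentrantLock poolLock = new ReentrantLock();
        ReentrantLock connLock = new ReentrantLock();
        ExecutorService threads = Executors.newFixedThreadPool(2);
        try {
            Future<Boolean> riThread;
            Future<Boolean> keepAlive;
            if (consistentOrder) {
                // Fix: both paths take the pool lock first, then the connection lock.
                riThread = threads.submit(remover(poolLock, connLock, null));
                keepAlive = threads.submit(remover(poolLock, connLock, null));
            } else {
                CyclicBarrier sync = new CyclicBarrier(2);
                riThread = threads.submit(remover(poolLock, connLock, sync));  // pool, then connection
                keepAlive = threads.submit(remover(connLock, poolLock, sync)); // connection, then pool
            }
            return riThread.get() && keepAlive.get();
        } finally {
            threads.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("opposite lock order, both finish? " + bothFinish(false));
        System.out.println("same lock order, both finish? " + bothFinish(true));
    }
}
```

With opposite lock orders at least one thread always gives up, so `bothFinish(false)` returns `false`; with a consistent order the second thread simply waits for the first to release the pool lock, and `bothFinish(true)` returns `true`.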