Hi experts, we have a 2-node ODM DS installation running on Liberty. It supports a heavy workload with an average rate higher than 2,500/sec continuously, 24x7, with peaks over 3,000/sec.
We're investigating why the HTDS response time increases sharply during short periods, causing clients to time out when calling the services. The decision service execution time shown in the RES console statistics does not increase, but the response time reported by the Liberty accessLog for the endpoint increases a lot, which produces the timeouts on the client side. Note that accessLog measures not only the execution time but the whole round trip, including request/response serialization and time spent waiting for resources to become available.
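To correlate the slow periods with load, a per-second summary of the access log can help. The following is only a minimal sketch: it assumes the accessLog `logFormat` has been extended so that each line ends with the elapsed time in microseconds (I understand the `%D` token does this, but please verify for your Liberty version) and that the timestamp appears in square brackets with one-second resolution. The function name `summarize` is hypothetical.

```python
import re
from collections import defaultdict

# Assumed line shape: ... [timestamp] "request" status bytes elapsed_micros
# Only the bracketed timestamp and the trailing integer are used.
LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\].*\s(?P<micros>\d+)\s*$')


def summarize(lines):
    """Group requests by timestamp (one bucket per second) and report
    the request count, an approximate p99 and the maximum, in milliseconds."""
    buckets = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            elapsed_ms = int(match.group("micros")) / 1000.0
            buckets[match.group("ts")].append(elapsed_ms)

    summary = {}
    for ts, values in buckets.items():
        values.sort()
        # Nearest-rank style index, clamped to the last element.
        p99_index = min(len(values) - 1, int(0.99 * len(values)))
        summary[ts] = {
            "count": len(values),
            "p99_ms": values[p99_index],
            "max_ms": values[-1],
        }
    return summary
```

Calling `summarize(open(path))` on the access log file and printing the buckets whose `max_ms` exceeds the client timeout should show whether the spikes coincide with bursts in `count` or appear at normal load.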
The pool size on each ODM node is set to 250 with a 1500 ms timeout. There are around 60 different rulesets, so I don't think HTDS resources are the problem. When the problem appears, the RES dump shows that the XU pool is not exhausted. CPU and memory usage on the nodes is not noticeably affected.
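As a rough sanity check on the pool sizing, Little's Law (average concurrency = arrival rate x average time in the system) gives an idea of how much headroom a pool of 250 leaves. This is only a sketch: it assumes the load is split evenly across the two nodes, and the 20 ms execution time is a made-up illustrative value, not a measurement from our system.

```python
# Little's Law: average concurrency = arrival rate * average time in system.

def required_concurrency(rate_per_sec: float, service_time_sec: float) -> float:
    """Average number of requests in flight at a given rate and service time."""
    return rate_per_sec * service_time_sec


def max_sustainable_service_time(pool_size: int, rate_per_sec: float) -> float:
    """Longest average service time a pool can absorb before it saturates."""
    return pool_size / rate_per_sec


NODES = 2                             # assumption: even split across nodes
POOL_SIZE = 250                       # pool size configured per node
avg_rate_per_node = 2500 / NODES      # average load per node
peak_rate_per_node = 3000 / NODES     # peak load per node
HYPOTHETICAL_EXEC_TIME_SEC = 0.020    # illustrative value only

if __name__ == "__main__":
    print("max avg service time at average load (s):",
          max_sustainable_service_time(POOL_SIZE, avg_rate_per_node))
    print("max avg service time at peak load (s):",
          max_sustainable_service_time(POOL_SIZE, peak_rate_per_node))
    print("connections in use at peak with the hypothetical exec time:",
          required_concurrency(peak_rate_per_node, HYPOTHETICAL_EXEC_TIME_SEC))
```

If the real execution time is well below the sustainable service time, the pool itself has headroom, which would be consistent with the dump showing that the XU pool is not exhausted and would point to waiting elsewhere in the request path.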
Do you know of any ODM or Liberty configuration we can try to solve the problem? I'm thinking of something like a limit on the number of running threads or on the number of sockets available for incoming connections.
Based on your experience, do you have any suggestions about the root cause of this performance problem?
------------------------------
Eduardo Izquierdo Lázaro
Automation Architect
DECIDE
Madrid
609893677
------------------------------