Hi David,
Thanks a lot for your response. We’ll check out the SP.
I think we can actually exclude the double storage file as a possible cause for our problems: we are in the lucky situation of having a broker installed on two different servers, and on one of the two everything works fine! We first thought that the number of QSs differed between the two servers, but we eventually found the missing one in a different filesystem.
So, what is different between the two servers? In principle the configuration is identical: same versions, same ADLs, same parameters. Concerning the QS there are two differences: (1) on the slow server the .log file is 10 times the size of the same log on the fast server (approx. 1 GB versus 100 MB); (2) the slow server has two QS of 1 GB and 500 MB, whereas the fast one has two of 1 GB.
Of course, there is no way to infer that the slow server logs more just because it has a bigger log file.
But one possible explanation for all the time() system calls we saw when running truss on awbroker (it was doing around 80k system calls/second, where the fast server doing the same thing did around 3k sc/s) would be that the broker is tracing its internal activity and writing a time stamp for each log line. Then again, there may be other explanations for calling time() related to the internal workings of the broker. Can’t tell.
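To put a number on how much of the system call load is really time(), something like the following rough Python sketch could tally the calls in a saved truss capture. It assumes the usual truss line shape, `name(args) = result`, optionally prefixed by a pid or pid/lwp and a colon; the function names are my own, not part of any tool. Where available, truss's own -c summary option should give similar per-call counts without any scripting.

```python
import re
from collections import Counter

# Syscall name at the start of a truss-style line, e.g.
#   "time()            = 1190000000"
#   "12345/1:  read(5, 0x0804, 1024)  = 42"   (optional pid or pid/lwp prefix)
# The exact layout is an assumption; adjust the pattern to the real capture.
_SYSCALL = re.compile(r"^\s*(?:\d+(?:/\d+)?:\s+)?([A-Za-z_]\w*)\(")

def count_syscalls(lines):
    """Count how often each system call name appears in the given lines."""
    counts = Counter()
    for line in lines:
        match = _SYSCALL.match(line)
        if match:  # lines that are not syscalls (signals, blanks) are skipped
            counts[match.group(1)] += 1
    return counts

def share(counts, name):
    """Fraction of all counted calls that were calls to `name`."""
    total = sum(counts.values())
    return counts[name] / total if total else 0.0
```

With the capture saved to a file, `count_syscalls(open("truss.out"))` followed by `share(counts, "time")` gives the fraction of calls that were time(), which can then be compared between the slow and the fast server.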
What is clear is that the path between the “source” adapter (an IO adapter reading an XML file) and the QS/broker is OK: the QS fills up at the same rate on the two servers. The difference is between the QS/broker and the “target” adapter, where the network is different in the two cases. For the fast broker server, the “target” is in the same rack, separated by a simple switch. In the slow case, the broker and the “target” adapter are interconnected via a VLAN between two locations separated by approx. 1 km. Then again, the connection between the two locations is fiber at 1 GB, so any differences in throughput and latency should be minimal.
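One way to check the "latency should be minimal" assumption from each broker server would be to time a few TCP connects to the machine hosting the “target” adapter. This is only a sketch: the host and port are placeholders for any TCP service that is reachable on that machine, and connect time is a crude stand-in for round-trip latency, not a measure of throughput.

```python
import socket
import statistics
import time

def connect_times(host, port, samples=20, timeout=2.0):
    """Time `samples` TCP connects to host:port; returns seconds per connect."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        # The connection is closed again as soon as it has been established.
        with socket.create_connection((host, port), timeout=timeout):
            times.append(time.perf_counter() - start)
    return times

def summarize(times):
    """Min, median and max of the measured times, in milliseconds."""
    return {
        "min": min(times) * 1000.0,
        "median": statistics.median(times) * 1000.0,
        "max": max(times) * 1000.0,
    }
```

Running `summarize(connect_times(host, port))` on both broker servers against the same target would show whether the VLAN path adds anything noticeable compared with the in-rack switch.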
When running this batch on the two servers, the observed difference at the system level is that awbroker consumes a lot more CPU (in user mode) on the slow server: 45 % versus 3 %. That seems normal given the 80k system calls/second that the slow server makes.
I wish I could see a logical explanation for calling time() something like 70k times/second. In particular, how can it be related to the difference in network topology…?
Just to be clear, in the two cases the same “target” adapter on the same server is used.
Hopefully we’ll be able to do some controlled experiments in the next few days.
Best regards,
Frank Olsen