Hello Andreas, yes, we suspect the problems usually happen when there are more concurrent connection requests. The problems currently occur on 14.10.FC9.
Do you have any hints about what we could collect on the db server at that time? There are no errors in online.log.
...
Original Message:
Sent: Mon October 16, 2023 11:16 AM
From: Andreas Legner
Subject: Error -25582, what's a reason of?
Hi Milan,
a client that is configured to connect to a server through a host name (rather than an IP address) will surely also have to resolve this name to an IP address first, which, depending on OS configuration, might well be a DNS lookup. This step, of course, precedes the actual connection attempt, whereas -27001 would indicate that the connection attempt is already under way.
Client-side -27001 would be the typical error when the TCP-level connection got established and a later communication step, e.g. during the initial handshake/authorization, doesn't receive feedback from the server in time.
I'd have a look at how the server's doing when such error occurs.
------------------------------
Andreas Legner
Original Message:
Sent: Mon October 16, 2023 10:11 AM
From: Milan Rafaj
Subject: Error -25582, what's a reason of?
Hello Andreas, could you please also describe how a new connection request is processed on the client side (4gl or java connection)? Could the same DNS-related problem occur on the server itself (the clients run on the same machine as the db server and connect via local loopback)? Our customer gets -27001 errors which have no corresponding error in online.log. Similarly, requests to back up logical logs, called via the alarm program, also sometimes generate -27001 in bar_act.log, as onbar is a client program to the db server as well. In this case bar_debug.logs report 'cannot fork process', and we are not sure whether the problem is in forking or has the same cause as for the other client programs.
Thanks a lot
------------------------------
Milan
Original Message:
Sent: Mon October 16, 2023 09:56 AM
From: Andreas Legner
Subject: Error -25582, what's a reason of?
A (remote) client connecting to Informix sends its hostname to the server which the server then needs to verify against the client's IP address, derived from the tcp connection. This verification is one of the listener's tasks, gets delegated to one of the MSC VPs where it's done through a gethostbyaddr() call (that the OS then, typically, translates into a DNS lookup).
So it's mainly this type of "reverse lookup" calls to DNS that's being executed by the Informix server, as part of verifying the client really is who it claims it is.
Not every new connection, though, goes through such a DNS lookup, esp. if Informix NS_CACHE is configured and the host-IP info also gets cached in Informix shared memory, which is why DNS slowness sometimes causes periodic problems with a more or less obvious time pattern: each time the configured NS_CACHE expiration has elapsed, a new DNS lookup is required.
Sometimes it is simply a missing DNS configuration for "reverse lookup" that's causing this type of slowness or malfunction (I think a DNS entry needs "reverse lookup" checked).
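To check whether reverse lookups are slow or failing, something like the following could be run on the database server host. It is only a minimal sketch: it goes through the OS resolver from Python rather than from inside Informix, so it approximates what the MSC VP would see, and the function name time_reverse_lookup is made up for this example.

```python
import socket
import time


def time_reverse_lookup(ip_address):
    """Return (hostname_or_None, elapsed_seconds) for one reverse lookup."""
    started = time.monotonic()
    try:
        # Same kind of reverse lookup the OS performs for gethostbyaddr()
        hostname = socket.gethostbyaddr(ip_address)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError
        hostname = None
    return hostname, time.monotonic() - started


if __name__ == "__main__":
    # Replace with the IP addresses of the affected client machines
    for ip in ("127.0.0.1",):
        name, elapsed = time_reverse_lookup(ip)
        print(f"{ip} -> {name} ({elapsed:.3f}s)")
```

A lookup that returns None, or one that takes seconds rather than milliseconds, would point at the DNS side rather than at Informix.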
Should such a gethostbyaddr() call take a while, it would of course block the (MSC) VP that's executing it, with a backlog effect on later new connections. If such a blockage, or the cumulative blockage of multiple such lookups, takes too long, then clients trying to connect might even give up, as detailed above.
And, while multiple MSC VPs might help you with generally more expensive authentication methods, like some forms of PAM authentication, they will hardly mitigate the effect of severe DNS slowness.
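The backlog effect can be illustrated with a toy model. This is a simplification and not how Informix is implemented: it assumes one worker handling lookups strictly one after another, and the function name and the numbers are invented for the example.

```python
def simulate_backlog(arrivals, lookup_seconds, client_timeout):
    """Serve connection requests one at a time on a single worker.

    arrivals        -- arrival time of each request, ascending, in seconds
    lookup_seconds  -- how long each request blocks the worker
    client_timeout  -- how long a client waits before giving up
    Returns one (arrival, finished, gave_up) tuple per request.
    """
    results = []
    worker_free_at = 0
    for arrival, cost in zip(arrivals, lookup_seconds):
        start = max(arrival, worker_free_at)   # wait until the worker is free
        finished = start + cost
        worker_free_at = finished
        # The worker still does the work even if the client has already left
        gave_up = (finished - arrival) > client_timeout
        results.append((arrival, finished, gave_up))
    return results


if __name__ == "__main__":
    # One 8-second lookup followed by three normal 1-second requests
    for row in simulate_backlog([0, 1, 2, 9], [8, 1, 1, 1], client_timeout=5):
        print(row)
```

With these numbers the single slow lookup makes the first three clients give up, even though two of them needed only one second of work themselves; only the request arriving at second 9 gets through in time.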
HTH,
Andreas
------------------------------
Andreas Legner
Original Message:
Sent: Mon October 16, 2023 04:31 AM
From: Halina Kuharava
Subject: Error -25582, what's a reason of?
Hello Andreas,
I am trying to understand at what stage DNS is required. Is it the server that accesses DNS? Could you please explain this in more detail?
Best Regards / Mit freundlichen Gruessen Halina
------------------------------
Halina Kuharava
Original Message:
Sent: Tue October 10, 2023 03:46 AM
From: Andreas Legner
Subject: Error -25582, what's a reason of?
Most commonly, it's DNS slowness/failure that's behind those -25582 errors (I recently also saw PAM slowness having the same effect).
-25582 typically means a given tcp connection is no longer functional (gone) since it was last used by one of the two parties.
-25582 occurring in a listener thread means:
- a client tried to start a new connection
- the listener thread undertakes all that's required to service the connection attempt, incl. at least parts of authentication
- ... this sometimes takes a bit ...
- the client's timeout expires, so it gives up, terminating the connection from its end
- the listener finally is done and tries to respond to the client, finding out the connection is no longer usable -> -25582
The problem originates from that "takes a bit" middle step which can be multiple things, and its duration can have various direct or indirect reasons.
Most common one: the listener needs the client's IP address reverse-resolved to a host name, so it can match it with the name passed by the client, and this - gethostbyaddr() - sometimes is slow or not working at all.
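The sequence of events listed above can be reproduced with plain sockets on the loopback interface. The sketch below is not Informix code: time.sleep() stands in for the slow authentication or reverse lookup step, the function name connect_attempt and the "OK" reply are made up, and the server side merely detects that the peer has closed the connection where Informix would report -25582.

```python
import socket
import threading
import time


def connect_attempt(server_delay, client_timeout):
    """Run one connect and return (client_outcome, server_outcome)."""
    outcome = {}
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))          # any free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def serve():
        conn, _ = listener.accept()
        with conn:
            time.sleep(server_delay)         # the "takes a bit" step
            conn.setblocking(False)
            try:
                # An empty read means the client already closed its end
                peer_closed = conn.recv(1, socket.MSG_PEEK) == b""
            except BlockingIOError:
                peer_closed = False          # nothing to read, still open
            if peer_closed:
                outcome["server"] = "connection broken"
            else:
                conn.setblocking(True)
                conn.sendall(b"OK")
                outcome["server"] = "replied"

    worker = threading.Thread(target=serve)
    worker.start()
    client = socket.create_connection(("127.0.0.1", port))
    client.settimeout(client_timeout)
    try:
        reply = client.recv(2)
        outcome["client"] = "connected" if reply == b"OK" else "failed"
    except socket.timeout:
        outcome["client"] = "timed out"      # client gives up
    finally:
        client.close()
    worker.join()
    listener.close()
    return outcome["client"], outcome["server"]


if __name__ == "__main__":
    print(connect_attempt(server_delay=0.05, client_timeout=1.0))
    print(connect_attempt(server_delay=0.5, client_timeout=0.1))
```

When the server delay exceeds the client timeout, the client reports a timeout while the server only later notices that the connection is gone, which is the same split between a client-side and a server-side error described in this thread.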
------------------------------
Andreas Legner
Original Message:
Sent: Tue August 29, 2023 10:40 AM
From: Dennis Melnikov
Subject: Error -25582, what's a reason of?
Hi,
IDS 11.70.FC5XE
MSGPATH was being filled with error -25582,
listener-thread: err = -25582: oserr = 0: errstr = : Network connection is broken.
As the cherry on the cake, IDS almost stopped accepting connections from app servers.
The only solution we found is restarting IDS.
What is the reason for the problem, and does it have a less disruptive workaround?
------------------------------
Sincerely,
Dennis
------------------------------