Informix


Error -25582, what's a reason of?

  • 1.  Error -25582, what's a reason of?

    Posted Tue August 29, 2023 10:41 AM

    Hi,

    IDS 11.70.FC5XE

    The message log (MSGPATH) was filling up with error -25582:

    listener-thread: err = -25582: oserr = 0: errstr = : Network connection is broken.

    To top it off, IDS almost stopped accepting connections from the app servers.

    The only solution we found is restarting IDS.

    What causes this problem, and is there a less disruptive workaround?



    ------------------------------
    Sincerely,
    Dennis
    ------------------------------


  • 2.  RE: Error -25582, what's a reason of?

    IBM Champion
    Posted Sat September 02, 2023 05:17 PM

    Hi,

    As it says, the network connection is broken. Do you have separate VPs for networking?

    If so, strace/truss them to see which OS networking calls are failing.

    Otherwise, strace/truss the MSC VP and then, unfortunately, a CPU VP.

    Regards,

    David.



    ------------------------------
    David Williams
    ------------------------------



  • 3.  RE: Error -25582, what's a reason of?

    Posted Mon September 18, 2023 11:19 AM

    Hi,
    Below is a truss of a soc VP,
    OS: AIX 7.1

    # truss -cf -p 2688390
    signals ------------
    SIGALRM           14
    total:            14

    syscall               seconds   calls  errors
    kill                      .00      14
    times                     .00      14
    incinterval               .00      14
    ksetcontext_sigreturn     .00      14
    shutdown                  .00     151    144
    _erecvmsg                 .00     147
    _erecv                    .18   72867
    _esendmsg                 .00    1867
    close                     .00     144
    pollset_query             .00      19
    pollset_ctl               .00      96     15
    _pollset_poll          437.91   60303      4
    __semop                   .08   38097
    semctl                    .00     377
    _getppid                  .00     147
    _getpid                   .00     147
                             ----     ---    ---
    sys totals:               .00  174418    163
    usr time:                 .00
    elapsed:                  .00



    ------------------------------
    Sincerely,
    Dennis
    ------------------------------



  • 4.  RE: Error -25582, what's a reason of?

    Posted Mon September 18, 2023 11:44 AM

    Below is a truss taken when no errors were occurring:

    # truss -cf -p 5046524
    Pstatus: process is not stopped
    signals ------------
    SIGALRM           15
    total:            15

    syscall               seconds   calls  errors
    kill                      .00      15
    times                     .00      15
    incinterval               .00      15
    ksetcontext_sigreturn     .00      15
    shutdown                  .00      27     17
    _erecvmsg                 .00      18
    _erecv                    .15   54075
    _esend                    .00      18
    close                     .00      17
    pollset_query             .00       5
    pollset_ctl               .00      62     18
    _pollset_poll             .94   54784     11
    kioctl                    .00      18
    __semop                   .07   24779
    semctl                    .00     245
    kfcntl                    .00      36
                             ----     ---    ---
    sys totals:               .00  134144     46
    usr time:                 .00
    elapsed:                  .00



    ------------------------------
    Sincerely,
    Dennis
    ------------------------------



  • 5.  RE: Error -25582, what's a reason of?

    Posted Mon October 09, 2023 12:53 PM

    Hi Dennis, is your problem already solved? If yes, then please explain how. I have the same issue.

    ----------------------------------------------------
    Best Regards Halina



    ------------------------------
    Halina Kuharava
    ------------------------------



  • 6.  RE: Error -25582, what's a reason of?

    Posted Mon October 16, 2023 05:38 AM

    Hi Halina,

    What Informix version are you running?



    ------------------------------
    Sincerely,
    Dennis
    ------------------------------



  • 7.  RE: Error -25582, what's a reason of?

    Posted Wed October 18, 2023 05:43 AM

    Hi Dennis, Informix 11.70.FC5XE, AIX 7.1



    ------------------------------
    Halina Kuharava
    ------------------------------



  • 8.  RE: Error -25582, what's a reason of?

    IBM Champion
    Posted Wed October 18, 2023 07:41 AM

    Halina:

    Informix v11.70 is very old and has been out-of-support for a long time. You should work towards upgrading to v14.10, especially as v15 is due to be released 2024Q1.

    Anyway, it is possible that the connectivity issues that you are seeing have been fixed or at least mitigated by the past 10 years of updates and bug fixes!



    ------------------------------
    Art S. Kagel, President and Principal Consultant
    ASK Database Management Corp.
    www.askdbmgt.com
    ------------------------------



  • 9.  RE: Error -25582, what's a reason of?

    Posted Wed October 18, 2023 09:18 AM

    Art:

    BTW, we haven't managed to upgrade to 14.10. To be precise, we had to downgrade because a lot of queries had become rather slow.



    ------------------------------
    Sincerely,
    Dennis
    ------------------------------



  • 10.  RE: Error -25582, what's a reason of?

    IBM Champion
    Posted Wed October 18, 2023 09:20 AM

    I can help with that.  B^)



    ------------------------------
    Art S. Kagel, President and Principal Consultant
    ASK Database Management Corp.
    www.askdbmgt.com
    ------------------------------



  • 11.  RE: Error -25582, what's a reason of?

    Posted Wed October 18, 2023 09:12 AM

    Exactly the same as ours.
    Do you have an opportunity to update to FC9? IBM says it addresses one of the bugs that cause -25582 (https://www.ibm.com/support/pages/fix-list-informix-server-1170xc9).
    Have you managed to identify any prerequisites, or a workload pattern, that make the errors occur?



    ------------------------------
    Sincerely,
    Dennis
    ------------------------------



  • 12.  RE: Error -25582, what's a reason of?

    IBM Champion
    Posted Mon October 09, 2023 02:09 PM

    Hi,

    The counts are not enough; the full truss output is needed, so you can look for errors returned from system calls.

    As it says, the network connection is broken.

    This is either an OS/network issue, or the client application exited without closing the connection.

    Regards,
    David.



    ------------------------------
    David Williams
    ------------------------------



  • 13.  RE: Error -25582, what's a reason of?

    IBM Champion
    Posted Tue October 10, 2023 03:47 AM

    Most commonly, it's DNS slowness/failure that's behind those -25582 errors (I recently also saw PAM slowness having the same effect).

    -25582 typically means a given tcp connection is no longer functional (gone) since it was last used by one of the two parties.
    -25582 occurring in a listener thread means:

    • a client tried to start a new connection
    • the listener thread undertakes all that's required to service the connection attempt, incl. at least parts of authentication
    • ... this sometimes takes a bit ...
    • the client's timeout expires, so it gives up, terminating the connection from its end
    • the listener is finally done and tries to respond to the client, only to find out the connection is no longer usable -> -25582

    The problem originates from that "takes a bit" middle step, which can be multiple things, and its duration can have various direct or indirect reasons.
    Most common one: the listener needs the client's IP address reverse-resolved to a host name, so it can match it with the name passed by the client, and this call, gethostbyaddr(), is sometimes slow or not working at all.
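
To check whether reverse resolution is slow from the database server host, the lookup can be timed directly. The following is a minimal sketch, not an Informix tool: `time_reverse_lookup` is a hypothetical helper name, and the `resolver` parameter exists only so the function can be exercised without touching real DNS. It relies on Python's standard `socket.gethostbyaddr`, which goes through the same OS resolver configuration as the C call of that name.

```python
import socket
import time


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname or None, elapsed seconds) for one reverse lookup."""
    start = time.monotonic()
    try:
        # gethostbyaddr returns (hostname, aliaslist, ipaddrlist)
        hostname = resolver(ip)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError
        hostname = None
    return hostname, time.monotonic() - start
```

Calling it for each application server's address (for example `time_reverse_lookup("192.0.2.10")`, where the address is a placeholder) and watching for multi-second results or `None` gives a quick indication of whether the reverse zone is missing or slow.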



    ------------------------------
    Andreas Legner
    ------------------------------



  • 14.  RE: Error -25582, what's a reason of?

    IBM Champion
    Posted Tue October 10, 2023 09:08 AM

    Hi Andreas,

    Should a "WARNING: Detected slow or failing DNS service response 1 time(s)." type of message appear in the online.log if DNS is slow?
    I think this has to happen a few times before that message is logged, though.

    It would be better if there were different messages for failures while establishing a connection and failures on already established connections.

    Also, if the client times out, could it send a message to the server saying so before closing the connection? The server would then receive that message, and the online.log could record that a client timed out the connection. It would be better to be able to see the difference between 'too slow' and 'a system call returned an actual error'!

    My other question is what other system calls apart from  gethostbyaddr() are used during establishing a connection?
    It would be good to know the whole flow for Linux at least!



    FYI, I have seen that running multiple MSC VPs, so that the system calls they make can be done in parallel, helps with connections as well.

    Regards,
    David.



    ------------------------------
    David Williams
    ------------------------------



  • 15.  RE: Error -25582, what's a reason of?

    Posted Tue October 31, 2023 10:25 AM

    Hi David,

    11.70 starts a single MSC VP by design, and doesn't allow adding more. Or is that an undocumented feature?



    ------------------------------
    Sincerely,
    Dennis
    ------------------------------



  • 16.  RE: Error -25582, what's a reason of?

    Posted Mon October 16, 2023 04:31 AM

    Hello Andreas,

    I am trying to understand at what stage DNS is required. Is it the server that accesses DNS? Could you please explain this in more detail?

    Best Regards / Mit freundlichen Gruessen Halina



    ------------------------------
    Halina Kuharava
    ------------------------------



  • 17.  RE: Error -25582, what's a reason of?

    IBM Champion
    Posted Mon October 16, 2023 09:57 AM

    A (remote) client connecting to Informix sends its hostname to the server, which the server then needs to verify against the client's IP address, derived from the tcp connection. This verification is one of the listener's tasks; it gets delegated to one of the MSC VPs, where it's done through a gethostbyaddr() call (which the OS then, typically, translates into a DNS lookup).

    So it's mainly this type of "reverse lookup" call to DNS that's being executed by the Informix server, as part of verifying the client really is who it claims to be.
    Not every new connection, though, goes through such a DNS lookup, esp. if the Informix NS_CACHE is configured and the host-IP info also gets cached in Informix shared memory. This is why DNS slowness sometimes causes periodic problems with a more or less obvious time pattern: each time the configured NS_CACHE expiration elapses, a new DNS lookup is required.
    Sometimes it is simply a missing DNS configuration for "reverse lookup" that's causing this type of slowness or malfunction (I think a DNS entry needs "reverse lookup" checked).

    Should such a gethostbyaddr() call take a while, it would of course block the (MSC) VP that's executing it, with a backlog effect on later new connections. If such a blockage, or the cumulative blockage of multiple such lookups, takes too long, then clients trying to connect might even give up, as detailed above.
    And, while multiple MSC VPs might help you with generally more expensive authentication methods, like some forms of PAM authentication, they will hardly help against severe DNS slowness.
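
The backlog effect described above can be illustrated with a toy model. This is a sketch under stated assumptions, not a model of Informix internals: a single worker (standing in for one MSC VP) serves lookups strictly in arrival order, every lookup takes the same time, and a client gives up once it has waited longer than a fixed timeout. The function name `count_timeouts` and all numbers are hypothetical.

```python
def count_timeouts(arrivals, lookup_seconds, client_timeout):
    """Simulate one worker serving lookups in arrival order.

    arrivals: sorted arrival times (in seconds) of connection attempts.
    Returns how many attempts complete more than client_timeout seconds
    after they arrived, i.e. after the client would have given up.
    """
    worker_free_at = 0.0
    timed_out = 0
    for arrived in arrivals:
        # A lookup cannot start before the previous one has finished.
        started = max(arrived, worker_free_at)
        finished = started + lookup_seconds
        worker_free_at = finished
        if finished - arrived > client_timeout:
            timed_out += 1
    return timed_out
```

With one connection attempt per second for ten seconds, 3-second lookups and a 5-second client timeout, `count_timeouts(list(range(10)), 3.0, 5.0)` reports 8 of the 10 attempts finishing too late, while 0.1-second lookups produce none. The point is only that a lookup time well below the client timeout can still cause timeouts once requests queue up behind each other.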

    HTH,

     Andreas



    ------------------------------
    Andreas Legner
    ------------------------------



  • 18.  RE: Error -25582, what's a reason of?

    Posted Mon October 16, 2023 10:12 AM

    Hello Andreas, could you please also describe how a new connection request is processed on the client side (4GL or Java connection)? Could the same DNS-related problem occur when the client runs on the same machine as the db server and connects via local loopback? Our customer gets -27001 errors which have no corresponding error in online.log. Similarly, requests to back up logical logs, called via the alarm program, also sometimes generate -27001 in bar_act.log, as onbar is a client program of the db server as well. In this case bar_debug.log reports 'cannot fork process', and we are not sure whether the problem is in forking or has the same cause as with the other client programs.

    Thanks a lot



    ------------------------------
    Milan
    ------------------------------



  • 19.  RE: Error -25582, what's a reason of?

    IBM Champion
    Posted Mon October 16, 2023 10:56 AM

    Rather than use local loopback connections for local clients, try using either a shared memory connection (onipcshm) or a stream pipe connection (onstrpip). These are both faster than a loopback connection and do not involve any DNS lookups.

    Art



    ------------------------------
    Art S. Kagel, President and Principal Consultant
    ASK Database Management Corp.
    www.askdbmgt.com
    ------------------------------



  • 20.  RE: Error -25582, what's a reason of?

    Posted Mon October 16, 2023 11:02 AM

    Hello Art, thank you for the hint; for 4GL clients it could be worth testing. But I am not sure whether ipcshm or stream pipes are supported by the JDBC driver.



    ------------------------------
    Milan
    ------------------------------



  • 21.  RE: Error -25582, what's a reason of?

    IBM Champion
    Posted Mon October 16, 2023 11:14 AM

    As far as I know, the stream pipe is likely supported because it goes through the network libraries; it just bypasses the NIC card, which loopback does not do. You'd have to try the shared memory type to find out, though. I have not tried either with a Java client.



    ------------------------------
    Art S. Kagel, President and Principal Consultant
    ASK Database Management Corp.
    www.askdbmgt.com
    ------------------------------



  • 22.  RE: Error -25582, what's a reason of?

    IBM Champion
    Posted Mon October 16, 2023 11:17 AM

    Hi Milan,

    a client, if configured to connect to a server through host name (rather than IP address) surely will also first have to resolve this name to an IP address which, depending on OS configuration, might well be a DNS lookup. This step, of course, will precede the actual connection attempt while -27001, during connection attempt, would indicate the connection attempt is already being undertaken.

    Client side -27001 would be the typical error when the tcp level connection got established and a later communication step, e.g. during initial handshake/authorization, doesn't receive feedback from server in time.

    I'd have a look at how the server's doing when such error occurs.



    ------------------------------
    Andreas Legner
    ------------------------------



  • 23.  RE: Error -25582, what's a reason of?

    Posted Tue October 17, 2023 03:32 AM

    Hello Andreas, yes, we suspect the problems usually happen when there are many concurrent connection requests. Currently the problems occur on 14.10.FC9.

    Do you have any hints on what we can collect on the db server at that time? There are no errors in online.log.

    onstat -g ath/cpu

    onstat -g stk

    onstat -g nta

    ...



    ------------------------------
    Milan
    ------------------------------



  • 24.  RE: Error -25582, what's a reason of?

    Posted Tue October 17, 2023 04:35 PM
    Edited by Dennis Melnikov Tue October 17, 2023 04:36 PM

    We discovered a workload pattern that very likely results in an error storm.
    We have a master IDS server and an RSS replica server.
    Each has a Connection Manager running in REDIRECT mode.
    The Connection Managers are used by a bunch of application servers.
    During business hours, the replica serves as a report server, for load balancing.
    To perform the role assignment, we reconfigure the CMs in the morning.
    If that reconfiguration aborts, the master server gets additional sessions running reports.
    Then, at the end of business hours, at about 5:45 PM, an error storm breaks out.
    While it storms, we observe contention on the session mutex:
    ```
    onstat -g wmx

    Mutexes with waiters:
    mid      addr             name               holder   lkcnt  waiter   waittime
    17       70000004006c038  session            52       0      51       0
                                                                 25836845 0
                                                                 25836828 0
    ```
    Threads 51 and 52 are TCP listeners:
    ```
    onstat -g ath

    Threads:
     tid     tcb              rstcb            prty status                vp-class       name
     51       7000017a8f58d00  0                2    running                 1cpu*        soctcplst
     52       7000017a8f6dd28  0                2    mutex wait session      8cpu*        soctcplst
    ```
    We captured `truss` for 15 seconds of the TCP listener VPs while in regular operation (baseline) and in trouble.

    Baseline 1cpu

    ```
    signals ------------
    SIGALRM            5
    SIGTIO          1045
    total:          1050

    syscall               seconds   calls  errors
    kill                      .00       5
    sigprocmask               .00   21878
    times                     .00       5
    _nsleep                   .00     241    122
    incinterval               .00       5
    ksetcontext_sigreturn     .00    1050
    setsockopt                .00    1168
    shutdown                  .00     314    313
    _erecv                    .00     411    120
    _esendmsg                 .00     292
    _esend                    .02   22395
    accept1                   .00     542    250
    close                     .00     313
    kwritev                   .00      35
    _poll                     .00      50
    kioctl                    .00     292
    uname                     .00      48
    __semop                   .00    9517
    semctl                    .00      95
    kfcntl                    .00     584
    kaio_rdwr64               .00    1151
                             ----     ---    ---
    sys totals:               .00   60391    805
    usr time:                 .00
    elapsed:                  .00
    ```

    Trouble 1cpu

    ```
    signals ------------
    SIGALRM            5
    SIGTIO            14
    total:            19

    syscall               seconds   calls  errors
    kill                      .00       5
    sigprocmask               .00    3429
    times                     .00       5
    incinterval               .00       5
    ksetcontext_sigreturn     .00      19
    setsockopt                .00    3956
    shutdown                  .00     835    835
    _erecv                    .00     989
    _esendmsg                 .00     989
    _esend                    .00    4544
    accept1                   .00     989
    close                     .00     835
    _poll                     .00     918
    kioctl                    .00     989
    uname                     .00      45
    __semop                   .00    2092
    semctl                    .00      21
    kfcntl                    .00    1978
    kaio_rdwr64               .00      14
                             ----     ---    ---
    sys totals:               .00   22657    835
    usr time:                 .00
    elapsed:                  .00
    ```
    Baseline 8cpu
    ```
    signals ------------
    SIGALRM            6
    SIGTIO          2726
    total:          2732

    syscall               seconds   calls  errors
    kill                      .00       6
    sigprocmask               .02   38115
    times                     .00       6
    _nsleep                   .00      16     16
    incinterval               .00       6
    ksetcontext_sigreturn     .00    2732
    shutdown                  .00      36     36
    _erecvmsg                 .00      14
    _esend                    .00   16520
    close                     .00      36
    kwritev                   .00       1
    kioctl                    .00      14
    __semop                   .00     223
    semctl                    .00       3
    kfcntl                    .00      28
    kaio_rdwr64               .00    2734
                             ----     ---    ---
    sys totals:               .00   60490     52
    usr time:                 .00
    elapsed:                  .00
    ```
    Trouble 8cpu

    ```
    signals ------------
    SIGALRM            4
    SIGTIO          1813
    total:          1817

    syscall               seconds   calls  errors
    kill                      .00       4
    getuidx                   .00       5
    sigprocmask               .00   17371
    times                     .00       4
    _nsleep                   .00     127    127
    incinterval               .00       4
    ksetcontext_sigreturn     .00    1817
    shutdown                  .00      39     39
    _erecvmsg                 .00      13
    _esend                    .02   12040
    close                     .00      38
    kwritev                   .00      22
    kwrite                    .00      15
    kread                     .00       5
    lseek                     .00       5
    _poll                     .00    1637
    kioctl                    .00      13
    kopen                     .00       5
    umask                     .00      10
    uname                     .00       7
    __semop                   .00    8228      1
    semctl                    .00      82
    kfcntl                    .00      31
    kaio_rdwr64               .00    1961
                             ----     ---    ---
    sys totals:               .00   43483    167
    usr time:                 .00
    elapsed:                  .00
    ```
    Full syscall traces were captured too, but they are too long to post here.
    As you can see, the key difference between baseline and trouble is that on the 1cpu VP, errors remain on `shutdown` syscalls only. The errors are `ENOTCONN`, as usual.
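
One way to compare such summaries without reading them side by side is to parse the `truss -c` tables and list the syscalls whose error counts differ. A minimal sketch, assuming the column layout shown in this thread (syscall name, seconds, calls, optional errors); `parse_truss_summary` and `error_deltas` are hypothetical helper names, not part of truss.

```python
import re

# One summary row: syscall name, seconds, calls, optional error count.
# Signal rows, headers and the "sys totals:" line do not match this pattern.
ROW = re.compile(r"^\s*(\w+)\s+(\d*\.\d+)\s+(\d+)(?:\s+(\d+))?\s*$")


def parse_truss_summary(text):
    """Map syscall name -> (calls, errors) from `truss -c` summary text."""
    counts = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            name, _seconds, calls, errors = match.groups()
            counts[name] = (int(calls), int(errors or 0))
    return counts


def error_deltas(baseline, trouble):
    """Return [(syscall, baseline_errors, trouble_errors)] where they differ."""
    deltas = []
    for name in sorted(set(baseline) | set(trouble)):
        before = baseline.get(name, (0, 0))[1]
        after = trouble.get(name, (0, 0))[1]
        if before != after:
            deltas.append((name, before, after))
    return deltas
```

Fed with the two 1cpu summaries above, this would list `_nsleep`, `_erecv` and `accept1` dropping to zero errors while `shutdown` errors rise from 313 to 835. A syscall missing from one capture is treated as having zero errors there.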



    ------------------------------
    Sincerely,
    Dennis
    ------------------------------



  • 25.  RE: Error -25582, what's a reason of?

    Posted Thu October 19, 2023 09:18 AM

    Is there a way to reduce contention on the session mutex?

    For example, we can mitigate contention on the dbs_partn mutex of a temp dbspace by creating more temp dbspaces.



    ------------------------------
    Sincerely,
    Dennis
    ------------------------------



  • 26.  RE: Error -25582, what's a reason of?

    IBM Champion
    Posted Thu October 19, 2023 09:39 AM

    Frequent queries involving the sysmaster:sysscblst SMI table (or the syssessions view, which depends on it, and a few others) can be a significant contributor to such session mutex contention, so limiting those might help.

    Also: there have been numerous optimizations around contention on this particular mutex, so being on a recent version might help as well ;-)



    ------------------------------
    Andreas Legner
    ------------------------------



  • 27.  RE: Error -25582, what's a reason of?

    IBM Champion
    Posted Tue October 31, 2023 12:51 PM

    Hi,

    https://www.ibm.com/support/pages/apar/IT34183

    IT34183: POOR PERFORMANCE OF QUERIES ON SYSSCBLST PSEUDO-TABLE MAY CAUSE OTHER THREADS TO WAIT ON SESSION MUTEX

    Fixed in 12.10.xC15 or 14.10.xC5.

    https://www.ibm.com/support/pages/apar/IT35039

    IT35039: EXCEPTION CAUGHT DURING QUERY OF SYSSCBLST TABLE MAY LEAVE MUTEX LOCKED WITH POTENTIAL FOR DEADLOCK

    Fixed in 12.10.xC15 or 14.10.xC6.

    Regards,
    David.



    ------------------------------
    David Williams
    ------------------------------



  • 28.  RE: Error -25582, what's a reason of?

    Posted Thu November 16, 2023 08:11 AM
    Edited by Dennis Melnikov Fri November 17, 2023 02:10 AM

    Let me revive this discussion.

    We are currently operating in this mode:
    At 17:30-18:00, a storm of -25582 errors starts, after which we restart IDS.
    At 23:30-0:00 we have a service window during which user sessions are killed. IDS is up and running during this time.
    Normal operation then resumes and IDS runs until the next storm begins.
    Storms of -25582 don't happen over the weekend.
    Yesterday we restarted IDS at 5 PM, before a storm broke out, and there was no storm!
    It seems very likely that IDS is accumulating some kind of corruption (buffers, caches, queues, etc.) which causes it to stop processing network connections correctly. The only fix we currently know of is to restart IDS.



    ------------------------------
    Sincerely,
    Dennis
    ------------------------------



  • 29.  RE: Error -25582, what's a reason of?

    IBM Champion
    Posted Thu November 16, 2023 08:18 AM

    Dennis:

    This may be one of several very old bugs that present similar behavior. I see from your previous post that you are running v11.70.XC5. You need to upgrade at least to v12.10.<latest> or, better, go to v14.10.<latest> (the current release is 14.10.FC10W1), as once v15.xxxx is released (2024Q1), v12.10 will start its 12-month march to losing support (unless you pay through the nose for Extended Support).

    Art



    ------------------------------
    Art S. Kagel, President and Principal Consultant
    ASK Database Management Corp.
    www.askdbmgt.com
    ------------------------------



  • 30.  RE: Error -25582, what's a reason of?

    Posted Fri November 17, 2023 02:13 AM

    Art,

    Thank you.

    So that bug has no workaround other than an upgrade?



    ------------------------------
    Sincerely,
    Dennis
    ------------------------------



  • 31.  RE: Error -25582, what's a reason of?

    IBM Champion
    Posted Fri November 17, 2023 06:17 AM

    Dennis:

    OK, so -25582 could be an actual physical network error. However, that's unlikely, so most likely it is one of those old bugs. I do not remember there being any workarounds.

    Art



    ------------------------------
    Art S. Kagel, President and Principal Consultant
    ASK Database Management Corp.
    www.askdbmgt.com
    ------------------------------



  • 32.  RE: Error -25582, what's a reason of?

    Posted Mon November 20, 2023 06:43 AM

    Hi,

    I have the same problem. We have two servers, primary (host1) and HDR secondary (host2). When host1 is primary there are no network errors, but after switching roles, when host2 is primary, network errors -25582 and mutex waits are generated:

    700000183b8bb68  S--PX-- 217828   slvidala 84       70000009f7108b0  0    0     671      1
    700000183b90908  S--PX-- 217935   jacortin 105      70000009f70dca0  0    0     1920     1
    700000183b94e08  S--PX-- 218334   sanramir 119      70000009f7108b0  0    0     568      1
    7000001cdc35868  S--P--- 219844   dcas0412 148      70000009f70dca0  0    5     3104     0
    70000017056a028  S--P--- 220563   s1000    -        70000009f70d4c0  0    6     432      3
    70000017058d0c8  S--P--- 219119   arubiobe 1        70000009f70dca0  0    5     1363     0
    70000017059a868  S--P--- 220588   s1000    -        70000009f70d4c0  0    6     560      0
    70000017059b108  S--P--- 220266   yshernan 70       70000009f7108b0  0    5     108      2
    70000017059b9a8  S--P--- 220701   s1000    -        70000009f68bf40  0    2     8        0
    7000001705abc68  S--P--- 220662   s1000    -        70000009f70d220  0    6     24       1
    7000001705ac508  S--PX-- 220670   s1000    -        70000009f70d4c0  0    0     292      0
    70000017735e2a8  S--P--- 217115   mspabona 72       70000009f70dca0  0    5     1249     0
    700000177361668  S--P--- 220493   chrbravo 90       70000009f70dca0  0    5     62       0
    700000177368f28  S--P--- 218512   dapa9880 126      70000009f70db50  0    4     800      1
    70000017736ba48  S--PX-- 212260   s1000    -        70000009f68bf40  0    122   3698     1160

    00:06:28  listener-thread: err = -25582: oserr = 0: errstr = from G900603SVGN4 to server genesix_pri_3 : Network connection is broken.
    00:07:32  listener-thread: err = -25582: oserr = 0: errstr = from gn4-app1 to server genesix_pri_1 : Network connection is broken.
    00:07:32  listener-thread: err = -25582: oserr = 0: errstr = from gn4-app1 to server genesix_pri_1 : Network connection is broken.
    00:07:32  listener-thread: err = -25582: oserr = 0: errstr = from gn6-app1 to server genesix_pri_1 : Network connection is broken.
    00:18:06  listener-thread: err = -25582: oserr = 0: errstr = : Network connection is broken.
    03:00:33  listener-thread: err = -25582: oserr = 0: errstr = : Network connection is broken.
    03:00:57  listener-thread: err = -25582: oserr = 0: errstr = : Network connection is broken.
    03:01:23  listener-thread: err = -25582: oserr = 0: errstr = from G900603SVGN6 to server genesix_pri_3 : Network connection is broken.
    03:01:23  listener-thread: err = -25582: oserr = 0: errstr = from G900603SVGN4 to server genesix_pri_3 : Network connection is broken.
    03:01:23  listener-thread: err = -25582: oserr = 0: errstr = from G900603SVGN4 to server genesix_pri_3 : Network connection is broken.
    03:01:23  listener-thread: err = -25582: oserr = 0: errstr = : Network connection is broken.
    03:01:23  listener-thread: err = -25582: oserr = 0: errstr = : Network connection is broken.
    03:01:23  listener-thread: err = -25582: oserr = 0: errstr = from G900603SVGN4 to server genesix_pri_3 : Network connection is broken.

    How can I determine whether there is an error in the physical network?

    Regards,
    Arnulfo.



    ------------------------------
    Arnulfo Martinez Ruiz
    ------------------------------



  • 33.  RE: Error -25582, what's a reason of?

    IBM Champion
    Posted Mon November 20, 2023 12:18 PM

    listener-thread errors -25582, in my experience, almost never are due to physical network problems, esp. if "oserr = 0".

    What they typically mean (and the error reporting should be updated accordingly) is the listener thread wasn't able to do its work, at least not in time for the newly connecting client.

    Reasons can be among the following:

    • the listener thread not being served fast enough by the MSC VP, typically for DNS lookups, but I have also seen e.g. AD queries blocking other MSC VP work and therefore also listeners; note that the reason for such slowness then lies outside of Informix (check DNS).
    • some other Informix thread misbehaving and not yielding the CPU VP the listener needs to run on (less frequent, but it has happened)

    There'd be a warning for DNS slowness every once in a while, as David mentioned above, but only since v12.10.xC10.

    Check DNS reverse lookup on the affected host for G900603SVGN4, gn4-app1 and any other application/client machines.
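
To get the list of client machines worth checking, the -25582 lines in online.log can be tallied by client name. A minimal sketch, assuming the message format shown in the log excerpt earlier in this thread; `count_by_client` is a hypothetical helper name, and lines with an empty `errstr` are grouped under a placeholder label.

```python
import re
from collections import Counter

# Matches the listener-thread message; the "from <client> to server <name>"
# part is optional because some lines carry an empty errstr.
LINE = re.compile(
    r"listener-thread: err = -25582: oserr = (-?\d+): errstr = "
    r"(?:from (\S+) to server (\S+) )?:"
)


def count_by_client(log_text):
    """Count -25582 listener-thread errors per client host name."""
    clients = Counter()
    for line in log_text.splitlines():
        match = LINE.search(line)
        if match:
            clients[match.group(2) or "(unknown)"] += 1
    return clients
```

Running it over the message log and sorting with `count_by_client(text).most_common()` shows which clients appear most often; a large share of "(unknown)" entries would mean the connection was dropped before the client had identified itself.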



    ------------------------------
    Andreas Legner
    ------------------------------