Informix processing "freezes"

  • 1.  Informix processing "freezes"

    Posted 13 days ago

    Hi all, I wanted to ask for some advice about some Informix "freezes" we've experienced recently. On two occasions Informix has frozen for several minutes, which froze the business application as well. The situation resolved itself after a few mins, but has understandably caused frustration for our users. These issues occurred during normal trading hours, at busier times.

    On the latest occasion, investigation indicates that a database checkpoint took a long time to complete (Total Time = 745) with very low Avg/Sec for the Physical and Logical Logs (70 and 62, respectively). We also see very little other DB activity during this period, consistent with the DB freeze (log rolls take almost twice as long as other log rolls around that time), and CPU activity drops during the 'freeze' (perhaps indicating that no non-DB activity was maxing out the CPU).

    Our assumption is that Informix halted other DB activity to complete the checkpoint, or perhaps to perform a rollback?

    We're running a bunch of onstat commands to get a baseline, e.g. -x, -k, -p, -g ses, -u, but it's hard to see anything that stands out.

    There has to be an underlying cause, but any suggestions as to where to dig deeper / review?  

    Many thanks. Mark

    Mark Clayton

  • 2.  RE: Informix processing "freezes"

    IBM Champion
    Posted 13 days ago
    Edited by Henri Cujass 13 days ago

    Hi Mark,

    Did you check what's going on with the checkpoints?
    I suggest checking the onstat status line (it may show "Blocked:CKPT") and monitoring the dirty-page write-back during checkpoints with "onstat -R | grep dirty".
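    Since the freezes only last a few minutes, it can help to capture that output on a timer so there is a record to review afterwards. The following is a minimal sketch, not a tested monitoring setup: the function names, the default interval and count, and the log path are all placeholders chosen for illustration.

```shell
# Run a command and prefix each output line with the time the command started.
stamp_output() {
    ts=$(date '+%Y-%m-%d %H:%M:%S')
    "$@" 2>&1 | while IFS= read -r line; do
        printf '%s %s\n' "$ts" "$line"
    done
}

# Append the "dirty" summary line of `onstat -R` to a log file,
# COUNT times, INTERVAL seconds apart.
# Defaults and the log path are placeholders (assumptions).
sample_dirty_pages() {
    interval=${1:-10}
    count=${2:-6}
    logfile=${3:-/tmp/onstat_dirty.log}
    i=0
    while [ "$i" -lt "$count" ]; do
        stamp_output onstat -R | grep dirty >> "$logfile"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

    For example, `sample_dirty_pages 5 120` would take a sample every 5 seconds for roughly 10 minutes. The same `stamp_output` wrapper can be put in front of any other onstat command you want to keep a history of.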


    Henri Cujass
    leolo IT, CTO
    IBM Champion 2021 2022 2023
    IIUG Board Member

  • 3.  RE: Informix processing "freezes"

    Posted 12 days ago

    Hi Henri, thanks for the prompt reply.  Will check out that 'dirty page' suggestion.  Appreciate the advice.  Cheers. Mark

    Mark Clayton

  • 4.  RE: Informix processing "freezes"

    Posted 13 days ago

    From my experience this can occur when a really huge rollback is in progress (or several at the same time).
    This can result in a stuck engine, which still responds to onstat but is permanently blocked in very long checkpoints.

    You can monitor this with onstat -x (which gives you an "estimation" of the remaining rollback time).
    Look for lines with very large values in the locks column and a begin log position that lies many logical logs behind the current one.
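    On a busy system the transaction list is long, so a small filter can pull out the suspicious lines. This is only a sketch: it assumes that data lines in the onstat -x output start with a hex address and that the locks count is the fourth whitespace-separated field. Check this against the column header on your version before relying on it. The function name and the default threshold are made up for illustration.

```shell
# Print `onstat -x` transaction lines whose lock count exceeds a threshold.
# ASSUMPTION: field 1 is a hex address and field 4 is the locks column;
# header and summary lines do not match both conditions and are skipped.
big_lock_transactions() {
    threshold=${1:-1000}
    awk -v min="$threshold" \
        '$1 ~ /^[0-9a-fA-F]+$/ && $4 ~ /^[0-9]+$/ && $4 + 0 > min'
}
```

    Usage would be something like `onstat -x | big_lock_transactions 5000`; the lines it prints still need to be compared by eye against the current logical log number.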

    The ugly thing here is that a rollback of a very long transaction (we recently encountered one with
    a lot of sblobs involved) can take even longer than it took to produce the data.

    Our situation occurred because there were >5 parallel rollbacks in progress.
    (User did not get a response and retried multiple times ;))
    There were about 100 logs to roll back, which took ages.

    We decided to kill the engine (onmode did not work any more) and restart (because any other activity
    was mostly blocked anyway).
    This resulted in a rollback of the transactions very quickly, because at startup time,
    a number of parallel cleaners are running which speeds up the rollback.
    In our situation, the rollback time was initially displayed as 2h, and was resolved by the
    engine bounce in 4min.

    Running onmode -z against the long transactions does not help, because they are typically already in rollback,
    which needs to complete.



  • 5.  RE: Informix processing "freezes"

    Posted 12 days ago

    Hi Marcus, thanks for the reply and suggestions. We have wondered about rollbacks too. On a previous occasion it needed a restart but the last one cleared after about 7 mins.  We'll dig a bit more about potential rollbacks.  We're running onstat -x, but good to have the additional guidance on the 'estimation'.  Appreciate the reply and suggestions.  Cheers, Mark

    Mark Clayton

  • 6.  RE: Informix processing "freezes"

    IBM Champion
    Posted 12 days ago

    Was this 745-second checkpoint a blocking checkpoint?

    You'd see this immediately from an asterisk next to the Trigger column in 'onstat -g ckp', but only if you captured that output no more than 20 checkpoints after the incident.

    To look further into the past, there's sysadmin:mon_checkpoint, which should have (at least) all the checkpoints since the last restart.
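    A minimal way to dump that table from the shell might look like the sketch below. It deliberately uses SELECT * so that no column names have to be assumed; the function name is made up for illustration.

```shell
# Emit a query over the checkpoint history table mentioned above.
# SELECT * is used on purpose so no column names are assumed.
checkpoint_history_sql() {
    printf '%s\n' 'SELECT * FROM mon_checkpoint;'
}
```

    In an Informix environment this could be run as `checkpoint_history_sql | dbaccess sysadmin -` and redirected to a file, then searched for the checkpoint around the time of the freeze.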

    If it was 'Blocking', then what was the trigger/caller?  And then, of course, slow disk i/o combined with volume of dirty pages, logical and physical log buffers to flush would have been the primary reason for the duration and, since blocking, the freeze.

    If it was not blocking, then it still could've been some session in "critical section" for a very long time, blocking the checkpoint from even starting ... and everyone else from entering into new "critical sections", i.e. from doing any modifying/transactional work.  The culprit would then be that first session now buried in the past.

    Without further details, we can only speculate ...


    Andreas Legner