Hi Marcus, thanks for the reply and suggestions. We have wondered about rollbacks too. On a previous occasion the engine needed a restart, but the latest freeze cleared after about 7 minutes. We'll dig a bit more into potential rollbacks. We're already running onstat -x, but it's good to have the additional guidance on the 'estimation'. Cheers, Mark
Original Message:
Sent: Tue November 21, 2023 02:39 AM
From: Marcus Haarmann
Subject: Informix processing "freezes"
Hi,
From my experience, this can occur when a really huge rollback is in progress (or several at the same time).
This can result in a stuck engine, which still responds to onstat but is permanently blocked in very long checkpoints.
You can monitor this in onstat -x (which will give you an "estimation" of the remaining rollback time).
Look for lines with very large values in the locks column that began many logical logs in the past.
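The check described above can be scripted. The following is only a minimal Python sketch, not a tested tool. It assumes that each data row of onstat -x starts with the columns address, flags, userthread, locks, begin_logpos and current logpos, that log positions look like "logid:0xoffset", and that an 'R' in the third flag position marks a rollback in progress. The thresholds min_locks and min_logs are hypothetical values to tune for your system. Verify all of this against the output of your own version first.

```python
import subprocess
from typing import List, NamedTuple, Optional


class Txn(NamedTuple):
    address: str
    flags: str
    locks: int
    begin_log: Optional[int]    # log id where the transaction began, if any
    current_log: Optional[int]  # log id of the transaction's latest log record

    @property
    def logs_spanned(self) -> int:
        # Number of logical logs between the first and latest record.
        if self.begin_log is None or self.current_log is None:
            return 0
        return self.current_log - self.begin_log

    @property
    def rolling_back(self) -> bool:
        # Assumption: third flag character 'R' means "rolling back".
        return len(self.flags) >= 3 and self.flags[2] == "R"


def _log_id(logpos: str) -> Optional[int]:
    # Assumed format "logid:0xoffset"; "-" means no log position yet.
    head = logpos.split(":", 1)[0]
    return int(head) if head.isdigit() else None


def parse_onstat_x(text: str) -> List[Txn]:
    """Parse transaction rows; header, banner and summary lines are skipped."""
    txns = []
    for line in text.splitlines():
        f = line.split()
        # Assumed row layout: address flags userthread locks begin current ...
        if len(f) < 6 or not f[3].isdigit():
            continue
        txns.append(Txn(f[0], f[1], int(f[3]), _log_id(f[4]), _log_id(f[5])))
    return txns


def suspicious(txns: List[Txn], min_locks: int = 10000, min_logs: int = 10) -> List[Txn]:
    """Return transactions with many locks or many logs, worst first."""
    hits = [t for t in txns if t.locks >= min_locks or t.logs_spanned >= min_logs]
    return sorted(hits, key=lambda t: (t.logs_spanned, t.locks), reverse=True)


def run_onstat_x() -> str:
    # Requires the Informix environment (INFORMIXDIR, INFORMIXSERVER, PATH).
    return subprocess.run(
        ["onstat", "-x"], capture_output=True, text=True, check=True
    ).stdout
```

Calling suspicious(parse_onstat_x(run_onstat_x())) on the database host would list the candidates. Note that logs_spanned here is measured up to the transaction's own latest log record, which is only an approximation of "how many logs ago it started"; comparing begin_log with the current log shown by onstat -l would be closer to what is described above.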
The ugly thing here is that a rollback of a very long transaction (we encountered one with
a lot of sblobs involved recently) can take even longer than it took to produce the data.
Our situation occurred because there were >5 parallel rollbacks in progress.
(User did not get a response and retried multiple times ;))
There were about 100 logs to roll back, which took ages.
We decided to kill the engine (onmode did not work any more) and restart (because any other activity
was mostly blocked anyway).
After the restart, the transactions were rolled back very quickly, because at startup time
a number of parallel cleaners are running, which speeds up the rollback.
In our situation, the rollback time was initially displayed as 2h, but after the
engine bounce the rollback was finished in 4min.
Running onmode -z against the long transactions does not help, because they are typically already in rollback,
which needs to complete.
Best,
Original Message:
Sent: 11/20/2023 11:05:00 PM
From: Mark Clayton
Subject: Informix processing "freezes"
Hi all, I wanted to ask for some advice about some Informix "freezes" we've experienced recently. On two occasions Informix has frozen for several minutes, resulting in the business application also freezing. The situation has resolved itself after a few minutes, but has understandably caused frustration for our users. These issues have occurred during normal trading hours, at busier times.
On the latest occasion, investigations indicate that a database checkpoint took a long time to complete (Total Time = 745) with very low Avg/Sec for Physical and Logical Logs (70 and 62, respectively). We also see very little other DB activity during this period, consistent with the DB freeze (log rolls take almost twice as long as other log rolls around that time), and CPU activity drops during the 'freeze' (perhaps indicating that no non-DB activity was maxing out the CPU).
Our assumption is that Informix has halted other DB activity to complete the checkpoint, or perhaps to perform a rollback?
We're running a bunch of onstat commands to get a baseline, e.g. -x, -k, -p, -g ses, -u, but it's hard to see anything that stands out.
There has to be an underlying cause, but any suggestions as to where to dig deeper / review?
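For the baseline collection mentioned above, a timestamped capture loop makes it easier to line up onstat output with the time of a freeze. This is a minimal Python sketch under stated assumptions: the onstat options are the ones listed above, with -g ckp added here for checkpoint history; the function names, the output path, the interval and the number of rounds are all hypothetical; and it has to run as a user with the Informix environment set.

```python
import subprocess
import time
from datetime import datetime
from typing import Callable, Dict, List, Sequence

# Options named in the thread, plus -g ckp (an addition) for checkpoint history.
ONSTAT_OPTIONS: List[List[str]] = [
    ["-x"], ["-k"], ["-p"], ["-g", "ses"], ["-u"], ["-g", "ckp"],
]


def run_onstat(args: Sequence[str]) -> str:
    # A timeout keeps the collector from hanging if the engine is stuck.
    result = subprocess.run(
        ["onstat", *args], capture_output=True, text=True, timeout=60
    )
    return result.stdout


def take_snapshot(runner: Callable[[Sequence[str]], str] = run_onstat) -> Dict[str, str]:
    """Run every configured onstat option once and return the raw outputs."""
    return {" ".join(args): runner(args) for args in ONSTAT_OPTIONS}


def format_snapshot(snapshot: Dict[str, str], stamp: str) -> str:
    """Prefix each output with a timestamped separator line."""
    parts = [
        f"===== {stamp} onstat {option} =====\n{output}\n"
        for option, output in snapshot.items()
    ]
    return "".join(parts)


def collect(path: str, interval_s: float = 30, rounds: int = 10,
            runner: Callable[[Sequence[str]], str] = run_onstat) -> None:
    """Append `rounds` snapshots to `path`, waiting `interval_s` between them."""
    with open(path, "a") as out:
        for i in range(rounds):
            stamp = datetime.now().isoformat(timespec="seconds")
            out.write(format_snapshot(take_snapshot(runner), stamp))
            out.flush()  # keep data on disk even if the host is restarted
            if i + 1 < rounds:
                time.sleep(interval_s)
```

For example, collect("/tmp/onstat_baseline.log", interval_s=30, rounds=120) would cover one hour. Be aware that onstat -k prints one line per lock, so its output can become very large exactly when a big transaction is holding many locks.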
Many thanks. Mark
------------------------------
Mark Clayton
------------------------------