OK, so v12.10.XX does not have the PFSC, so you should disable MAX_FILL_DATA_PAGE and see if that relieves the issue.
Art S. Kagel, President and Principal Consultant
ASK Database Management Corp.
Original Message:
Sent: Wed May 28, 2025 04:03 PM
From: Jacob Salomon
Subject: What triggers a blocking checkpoint?
Art and others who asked about our current release: IDS 12.10.FC9
Original Message:
Sent: 5/28/2025 1:37:00 PM
From: Art Kagel
Subject: RE: What triggers a blocking checkpoint?
Jacob:
On the new data, it looks like the long checkpoints are being caused by high transaction rates (hence the messages about the physical log and the "normal" page flush counts) that are causing the checkpoint to have to wait for many existing critical sections of client code to complete releasing that latch. The MAX_FILL_DATA_PAGES can be one cause of long critical sections, but there can be others, less easily worked around. As Andreas suggested, address this problem first as I suggested in my last post depending your version and config. If that doesn't solve it, then you can take the next steps.
Art
------------------------------
Art S. Kagel, President and Principal Consultant
ASK Database Management Corp.
www.askdbmgt.com
Original Message:
Sent: Wed May 28, 2025 01:25 PM
From: Marcus Haarmann
Subject: What triggers a blocking checkpoint?
Hi Jacob,
when looking at the numbers, you can see that the amount of pages dumped during the long
checkpoint was not significantly higher than the other ones.
I would have expected a huge increase in the number of dirty pages handled during the checkpoint.
Are you sure there is no I/O bottleneck? Caused by another process ?
In case you want to flush pages between the checkpoint interval, you need to configure the flushers,
in order to start flushing at x% of the configured LRU queues (which should be below the number of pages,
which are usually flushed).
Check with onstat -R for currently configured values.
In that case, server will start flushing between the checkpoints until the low percentage is reached.
Best,
Original Message:
Sent: 5/28/2025 12:41:00 PM
From: Jacob Salomon
Subject: RE: What triggers a blocking checkpoint?
Hats off (but not the kipa :-) to Art, and both Andreas's.
The overwhelming proportion of checkpoints last under 1/2 second, hough some go over 5 seconds. Nobody complains about those. But this morning had a 2-1/2 minute blockage. OUCH!
First Art: I indeed set up that 15-minute cron job. This morning I checked the online log and found, at 6:46 AM, the following:
05/28/25 06:43:25 Logical Log 2967450 - Backup Started
05/28/25 06:43:30 Logical Log 2967450 - Backup Completed
05/28/25 06:43:43 Physical Logging while in Critical Section: Number of pages logged in critcal section: 40 Remaining Phyical Log: 862792
05/28/25 06:43:46 Physical Logging while in Critical Section: Number of pages logged in critcal section: 80 Remaining Phyical Log: 862752
.... (total of 49 such lines
)
05/28/25 06:46:05 Physical Logging while in Critical Section: Number of pages logged in critcal section: 1920 Remaining Phyical Log: 860893
05/28/25 06:46:08 Physical Logging while in Critical Section: Number of pages logged in critcal section: 1960 Remaining Phyical Log: 860852
05/28/25 06:46:18 Checkpoint Completed: duration was 157 seconds.
05/28/25 06:46:18 Wed May 28 - loguniq 2967451, logpos 0x3a163a8, timestamp: 0x304bd7d6 Interval: 2342399
05/28/25 06:46:18 Maximum server connections 415
05/28/25 06:46:18 Checkpoint Statistics - Avg. Txn Block Time 149.940, # Txns blocked 81, Plog used 186726, Llog used 124998
And yes, there was a 2-1/2 minute freeze-up starting at 6:43. So now, let's look at the 15-minute interval for the onstat -g ckp. (We don't need to see all 20 entries)
AUTO_CKPTS=On RTO_SERVER_RESTART=Off
Critical Sections Physical Log Logical Log
Clock Total Flush Block # Ckpt Wait Long # Dirty Dskflu Total Avg Total Avg
Interval Time Trigger LSN Time Time Time Waits Time Time Time Buffers /Sec Pages /Sec Pages /Sec
2342398 06:39:39 CKPTINTVL 2967448:0xc686018 4.6 4.6 0.0 3 0.0 0.0 0.0 150708 33112 244770 995 178586 725
2342399 06:46:17 CKPTINTVL 2967451:0x3a163a8 157.9 7.6 0.0 81 149.9 82.4 150.0 131521 17328 186726 473 124998 317
2342400 06:50:19 CKPTINTVL 2967454:0x2ce3280 2.5 2.5 0.0 2 0.0 0.0 0.0 141278 56862 306279 1234 150773 607
This is really disappointing! I had expected that middle one above to be caused by an LTXEHWM or a physical log over 75% full. Nope: It was the ordinary checkpoint triggered by the 4-minute setting.
So, as Thomas Edison was famous for saying: We have not failed; we have a method that does not work. And actually, it did tell us something: A direction where NOT to bother looking.
Now for Andreas L's thoughtful thesis. Bottom line:
$ onstat -c MAX_FILL_DATA_PAGES
1
Exactly as Andreas supposed. There is an oncheck command (-T?) to report on the fullness of pages within a TBLspace but in a database this size that's kinda impractical.
But I will recommend changing that onconfig parameter. Certainly worth a try!
Thanks much to all. I'll let y'all know if this solved it.
------------------------------
Jacob Salomon
Original Message:
Sent: Tue May 27, 2025 04:39 PM
From: Art Kagel
Subject: What triggers a blocking checkpoint?
Jacob:
Need more details like the reported trigger in the onstat -g ckp report or the supporting sysmaster table. Since only a short list of checkpoints are kept online, the best thing to do would be to set up a cron job to run every 15 mins from 6:00 until 10:00 daily and post or send me the output file(s).
Art
------------------------------
Art S. Kagel, President and Principal Consultant
ASK Database Management Corp.
www.askdbmgt.com
Original Message:
Sent: Tue May 27, 2025 02:29 PM
From: Jacob Salomon
Subject: What triggers a blocking checkpoint?
Hi Y'all family.
In the system under my purvey, checkpoints occur very often, almost always duration is on the order of 1/10 of a second. However (Bet'cha I telegraphed that coming :-), a couple of times a day, there is a blocking checkpoint that temporarily blocks all applications for the duration. Most of the time it happens in early morning but often enough it occurs at other times. I did a grep on the current online.log:
$ egrep -e 'Block Time [1-9]' /usr/informix/online.log|awk '$10 >= 20.0 {print}'
05/20/25 06:31:23 Checkpoint Statistics - Avg. Txn Block Time 9.120, # Txns blocked 16, Plog used 64451, Llog used 160854
05/20/25 09:03:15 Checkpoint Statistics - Avg. Txn Block Time 9.575, # Txns blocked 190, Plog used 193460, Llog used 111805
05/21/25 09:02:59 Checkpoint Statistics - Avg. Txn Block Time 7.341, # Txns blocked 94, Plog used 157671, Llog used 112949
05/22/25 06:48:09 Checkpoint Statistics - Avg. Txn Block Time 32.120, # Txns blocked 114, Plog used 263959, Llog used 126449
05/23/25 06:31:34 Checkpoint Statistics - Avg. Txn Block Time 8.778, # Txns blocked 21, Plog used 129296, Llog used 193806
05/23/25 09:02:47 Checkpoint Statistics - Avg. Txn Block Time 6.448, # Txns blocked 87, Plog used 123006, Llog used 87426
05/25/25 06:00:10 Checkpoint Statistics - Avg. Txn Block Time 3.855, # Txns blocked 47, Plog used 39997, Llog used 55223
05/25/25 06:36:17 Checkpoint Statistics - Avg. Txn Block Time 30.251, # Txns blocked 55, Plog used 79072, Llog used 149772
05/26/25 06:36:12 Checkpoint Statistics - Avg. Txn Block Time 60.311, # Txns blocked 85, Plog used 238803, Llog used 213004
05/27/25 00:27:57 Checkpoint Statistics - Avg. Txn Block Time 22.939, # Txns blocked 38, Plog used 16940, Llog used 27395
05/27/25 06:44:15 Checkpoint Statistics - Avg. Txn Block Time 48.694, # Txns blocked 68, Plog used 44170, Llog used 140095
With the awk command I tried to see only those with duration over 20 seconds. Why awk is ignoring my filter is out of the scope of this question.) I have see durations as long as 150 seconds. Interestingly enough, each of these that I chased down is preceded by a bunch of lines like this, from this morning:
05/27/25 06:43:26 Physical Logging while in Critical Section: Number of pages logged in critcal section: 40 Remaining Phyical Log: 1003977
05/27/25 06:43:29 Physical Logging while in Critical Section: Number of pages logged in critcal section: 80 Remaining Phyical Log: 1003936
05/27/25 06:43:33 Physical Logging while in Critical Section: Number of pages logged in critcal section: 120 Remaining Phyical Log: 1003896
05/27/25 06:43:36 Physical Logging while in Critical Section: Number of pages logged in critcal section: 160 Remaining Phyical Log: 1003855
05/27/25 06:43:39 Physical Logging while in Critical Section: Number of pages logged in critcal section: 200 Remaining Phyical Log: 1003815
05/27/25 06:43:42 Physical Logging while in Critical Section: Number of pages logged in critcal section: 240 Remaining Phyical Log: 1003774
05/27/25 06:43:45 Physical Logging while in Critical Section: Number of pages logged in critcal section: 280 Remaining Phyical Log: 1003734
05/27/25 06:43:48 Physical Logging while in Critical Section: Number of pages logged in critcal section: 320 Remaining Phyical Log: 1003694
05/27/25 06:43:51 Physical Logging while in Critical Section: Number of pages logged in critcal section: 360 Remaining Phyical Log: 1003653
05/27/25 06:43:54 Physical Logging while in Critical Section: Number of pages logged in critcal section: 400 Remaining Phyical Log: 1003613
05/27/25 06:43:57 Physical Logging while in Critical Section: Number of pages logged in critcal section: 440 Remaining Phyical Log: 1003572
05/27/25 06:44:00 Physical Logging while in Critical Section: Number of pages logged in critcal section: 480 Remaining Phyical Log: 1003532
05/27/25 06:44:04 Physical Logging while in Critical Section: Number of pages logged in critcal section: 520 Remaining Phyical Log: 1003491
05/27/25 06:44:07 Physical Logging while in Critical Section: Number of pages logged in critcal section: 560 Remaining Phyical Log: 1003451
05/27/25 06:44:10 Physical Logging while in Critical Section: Number of pages logged in critcal section: 600 Remaining Phyical Log: 1003411
05/27/25 06:44:15 Checkpoint Completed: duration was 51 seconds.
05/27/25 06:44:15 Tue May 27 - loguniq 2967023, logpos 0x41b2444, timestamp: 0xea43c2a9 Interval: 2341876
05/27/25 06:44:15 Maximum server connections 407
05/27/25 06:44:15 Checkpoint Statistics - Avg. Txn Block Time 48.694, # Txns blocked 68, Plog used 44170, Llog used 140095
Referring back to this post about Physical Logging message, I understand that John Lengyel has rephrased the message. Still, it can't be coincidence that these messages appear during a blocking checkpoint. And I think in this case it is not useless as JL suggests. Scary? Yeah, kinda.
So back to the bottom line question in my subject line: What causes a blocking checkpoint? Its a pretty good bet that nobody is doing TRUNCATE operation and I don't see any sign of a long-transaction rollback in the log. If we know what could be casueing it we may be able to re-engineer the application code to stop doing that.
Thanks for ideas here.
------------------------------
-- Jacob Salomon
------------------------------