Topic Thread

Block during checkpoint

  • 1.  Block during checkpoint

    Posted 6 days ago
    Hi 

    I need some advice on what could be wrong with my configuration.
    The users reported slow or hanging processes.
    Is it normal for transactions to be blocked during a checkpoint?

    Looking at the online log, I see a strange situation that I have never encountered at other sites.

    09:07:49 Maximum server connections 805
    09:07:49 Checkpoint Statistics - Avg. Txn Block Time 0.464, # Txns blocked 2, Plog used 3189, Llog used 65988

    09:10:56 Logical Log 4803 Complete, timestamp: 0x33ae5f33.
    09:10:57 Logical Log 4803 - Backup Started
    09:11:16 Logical Log 4803 - Backup Completed
    09:12:53 Checkpoint Completed: duration was 4 seconds.
    09:12:53 Thu May 21 - loguniq 4804, logpos 0xd31d0b4, timestamp: 0x33b1cbf7 Interval: 66869

    09:12:53 Maximum server connections 805
    09:12:53 Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 3, Plog used 4935, Llog used 149330

    09:18:45 Checkpoint Completed: duration was 51 seconds.
    09:18:45 Thu May 21 - loguniq 4804, logpos 0x28723018, timestamp: 0x33ba9426 Interval: 66870

    09:18:45 Maximum server connections 805
    09:18:45 Checkpoint Statistics - Avg. Txn Block Time 7.717, # Txns blocked 17, Plog used 4777, Llog used 111889

    As you can see, the number of blocked transactions is high and can increase.
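
    To follow these numbers over a whole day rather than reading them by eye, the "Checkpoint Statistics" lines can be pulled out of online.log with a small script. This is a minimal sketch that assumes the line format is exactly as in the excerpt above; the function names and the 0.5 second threshold are illustrative choices, not Informix settings.

```python
import re

# Pattern for the "Checkpoint Statistics" lines as they appear in the excerpt
# above. Other Informix versions may word these lines differently.
STATS_RE = re.compile(
    r"^\s*(?P<time>\d{2}:\d{2}:\d{2})\s+Checkpoint Statistics - "
    r"Avg\. Txn Block Time (?P<avg_block>\d+\.\d+), "
    r"# Txns blocked (?P<blocked>\d+), "
    r"Plog used (?P<plog>\d+), "
    r"Llog used (?P<llog>\d+)"
)


def parse_checkpoint_stats(lines):
    """Return one dict per Checkpoint Statistics line; other lines are skipped."""
    rows = []
    for line in lines:
        m = STATS_RE.match(line)
        if m:
            rows.append({
                "time": m.group("time"),
                "avg_block_s": float(m.group("avg_block")),
                "txns_blocked": int(m.group("blocked")),
                "plog_used": int(m.group("plog")),
                "llog_used": int(m.group("llog")),
            })
    return rows


def flag_blocking(rows, max_avg_block_s=0.5):
    """Keep checkpoints whose average block time exceeds the (assumed) threshold."""
    return [r for r in rows if r["avg_block_s"] > max_avg_block_s]


# The three statistics lines quoted in the post.
SAMPLE = """\
09:07:49 Checkpoint Statistics - Avg. Txn Block Time 0.464, # Txns blocked 2, Plog used 3189, Llog used 65988
09:12:53 Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 3, Plog used 4935, Llog used 149330
09:18:45 Checkpoint Statistics - Avg. Txn Block Time 7.717, # Txns blocked 17, Plog used 4777, Llog used 111889
""".splitlines()

rows = parse_checkpoint_stats(SAMPLE)
for r in flag_blocking(rows):
    print(r["time"], r["avg_block_s"], r["txns_blocked"])
```

    On the three lines quoted here, only the 09:18:45 checkpoint crosses the assumed threshold.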

    Running onstat -g ckp, the bottom lines show:
    Based on the current workload, the physical log might be too small
    to accommodate the time it takes to flush the buffer pool during
    checkpoint processing. The server might block transactions during checkpoints.
    If the server blocks transactions, increase the physical log size to
    at least 23244800 KB.   ~23 GB

    Do I need to increase it again? I already increased it previously based on the recommended value given. I have never had a physical log this big at any other client.

    I would appreciate some advice from the members. The server runs RedHat 7.6 with Informix 12.10.FC6WE.


    ------------------------------
    ABeMie
    ------------------------------


  • 2.  RE: Block during checkpoint

    Posted 6 days ago
    You are only using around 5000 pages of physical log space between checkpoints so you definitely do not need more log space. The problem is probably the number of dirty pages at checkpoint time. 
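
    The gap between what is used and what is recommended can be made concrete with a little arithmetic. A rough sketch, assuming a 2 KB page size (not stated in the original post) and that the "Plog used" figures in online.log are in pages:

```python
PAGE_KB = 2  # assumption: 2 KB pages; adjust if the instance uses another size


def pages_to_mb(pages, page_kb=PAGE_KB):
    """Convert a page count to megabytes (1 MB = 1024 KB)."""
    return pages * page_kb / 1024


# "Plog used" values from the three Checkpoint Statistics lines in post 1.
plog_used_pages = [3189, 4935, 4777]

# Size suggested by the onstat -g ckp advisory quoted in post 1.
recommended_kb = 23244800

peak_mb = pages_to_mb(max(plog_used_pages))
recommended_mb = recommended_kb / 1024

print(f"peak physical log use between checkpoints: {peak_mb:.1f} MB")
print(f"recommended physical log size:             {recommended_mb:.0f} MB")
print(f"recommendation is about {recommended_mb / peak_mb:.0f}x the observed peak")
```

    Under those assumptions the peak use is under 10 MB while the recommendation is 22700 MB, which is why growing the physical log again is unlikely to help.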

    What type of checkpoints are being reported by onstat -g ckp?





  • 3.  RE: Block during checkpoint

    Posted 6 days ago
    Hi,
    I have attached the onstat -g ckp output plus onstat -d.

    ------------------------------
    ABeMie
    ------------------------------

    Attachment(s)

    txt
    onstat-d.txt   2K 1 version
    txt
    onstat-ckp.txt   3K 1 version


  • 4.  RE: Block during checkpoint

    Posted 6 days ago

    Hi,

    As Art said, the information in the online.log messages may indicate slow checkpoints, but you NEED to look at onstat -g ckp for a final diagnosis of what is happening.
    A long checkpoint may be acceptable if the values in the Wait Time and Long Time columns are low (say, 0.5 seconds or less). Those columns show how long user transactions waited for the checkpoint.
    In your case, you have wait times of 5.5, 1.5, 12.6 and 8.8 seconds, which are too long, and users were waiting at those times.
    Looking at the Dirty Buffers column and correlating it with the Ckpt Time on the same line, you can see that the time is not proportional to the number of flushed buffers. For instance, at 09:18:44 it takes 51 seconds to flush 61141 buffers with 12.6 seconds of wait time, and at 09:34:00 it takes 26.3 seconds with 0.7 seconds of wait time for about the same number of buffers. This is not consistent.
    60000 buffers is 120 MB, which is not a lot of data (if your system page size is 2K). It takes 28.5 secs to write, which is slow.
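
    The flush rates behind that comparison can be worked out directly. A small sketch, assuming 2 KB pages; the buffer count for the 09:34:00 checkpoint is only described as "about the same", so reusing 61141 for it is an assumption made for illustration.

```python
def flush_rate(buffers, seconds, page_kb=2):
    """Return (pages per second, MB per second) for one checkpoint flush."""
    pages_per_s = buffers / seconds
    mb_per_s = pages_per_s * page_kb / 1024
    return pages_per_s, mb_per_s


# (time, dirty buffers, checkpoint time in s, wait time in s)
checkpoints = [
    ("09:18:44", 61141, 51.0, 12.6),
    ("09:34:00", 61141, 26.3, 0.7),  # buffer count assumed, see lead-in
]

rates = {}
for when, buffers, ckpt_s, wait_s in checkpoints:
    pages_per_s, mb_per_s = flush_rate(buffers, ckpt_s)
    rates[when] = (pages_per_s, mb_per_s)
    print(f"{when}: {pages_per_s:7.0f} pages/s  {mb_per_s:5.2f} MB/s  wait {wait_s} s")
```

    Both rates come out at a few MB per second, and the two checkpoints differ by roughly a factor of two for the same amount of work.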

    To get an idea of disk performance, check the specs of the disks you have, and you will see that a disk can write many more megabytes in much less time. Again, there can be multiple reasons:
    - physical disk contention: your onstat -d says that ALL your chunks are located in the same file system, which is not good. Putting all the chunks in the same file system is the worst-case situation. Having the physical log on 2 chunks may also not be a very good idea

    - if you run in a VM environment with badly shared CPU and disk resources, this won't help

    - you may also have contention on Informix threads, because all your chunks are sitting in the same place, or for any other reason. Running onstat -z, then onstat -g spi twice with an interval of 10 minutes, may point to what is waiting.

    - also run onstat -g iof and look at the chunks' throughput

    - finally, run iostat -d (after identifying which disk you are using) and see how the physical disk performs
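
    The collection steps in the list above could be scripted so the outputs are captured together. This is only a sketch of one reading of the sequence (reset, wait, snapshot, wait, snapshot); `collect_snapshots`, `run_command` and the dictionary keys are hypothetical names, and the demo at the bottom uses a fake runner so nothing is executed against a real instance.

```python
import subprocess
import time


def run_command(argv):
    """Run one command and return its standard output as text."""
    return subprocess.run(argv, capture_output=True, text=True, check=False).stdout


def collect_snapshots(interval_s=600, run=run_command, sleep=time.sleep):
    """Reset the onstat counters, then capture the reports named in the post.

    interval_s defaults to 600 s, i.e. the 10 minutes suggested above.
    """
    outputs = {}
    run(["onstat", "-z"])                       # reset statistics counters
    sleep(interval_s)
    outputs["spi_1"] = run(["onstat", "-g", "spi"])
    sleep(interval_s)
    outputs["spi_2"] = run(["onstat", "-g", "spi"])
    outputs["iof"] = run(["onstat", "-g", "iof"])
    outputs["iostat"] = run(["iostat", "-d"])   # OS-level disk statistics
    return outputs


# Dry run with a fake runner: records the commands instead of executing them.
calls = []


def fake_run(argv):
    calls.append(" ".join(argv))
    return "(output of %s)" % " ".join(argv)


demo = collect_snapshots(interval_s=0, run=fake_run, sleep=lambda s: None)
for command in calls:
    print(command)
```

    On a real server, calling `collect_snapshots()` with its defaults would run the actual commands as the informix user and take about 20 minutes.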



    ------------------------------
    Eric Vercelletto
    Founder
    kandooerp.org
    Pont l'Abbé, France
    +33 626 52 50 68

    Disclaimer: My own opinions are my own opinions and do not reflect on the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference. Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.
    ------------------------------



  • 5.  RE: Block during checkpoint

    Posted 6 days ago
    Thanks, Mr. Eric, for sharing. For your information, the system is running in a VM. All the tips given are still useful and meaningful for narrowing down the issue, correct?


    ------------------------------
    ABeMie
    ------------------------------



  • 6.  RE: Block during checkpoint

    Posted 6 days ago
    ABeMie:

    OK, I've also looked at the onstat -g ckp output. Look at the difference between checkpoint numbers 66859 and 66860. They are flushing nearly the same number of pages, but the first one flushes almost 9000 pages per second while the second checkpoint flushed less than 2600 pages per second. That is why, as Eric points out, the second checkpoint took more than four times as long as the first one. The most likely reason for this is contention for IO resources! Are you sharing SAN structures with other applications? Machines? Databases sharing storage with Windows systems, and especially mail servers, is particularly bad because the data access patterns are SO different. It doesn't look like the disks or channels are slow, because sometimes the IO is very fast.

    You should probably monitor the onstat -g ioh report, looking for which chunks, and at what times of day, have READ or WRITE service times longer than, say, 15ms (0.015s), but only look at lines with significant IO counts (over 1000). The report shows IO performance for every chunk for the past 60 minutes.
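
    Once the onstat -g ioh figures are in hand, the filtering rule described above is simple to apply. A minimal sketch: it works on records that have already been extracted from the report (parsing is left out because the exact layout is not shown in this thread), and the `ChunkInterval` type and the sample values are hypothetical.

```python
from typing import NamedTuple


class ChunkInterval(NamedTuple):
    """One per-chunk, per-interval entry taken from an onstat -g ioh report."""
    chunk: str
    time: str
    op: str           # "read" or "write"
    io_count: int
    service_s: float  # average service time in seconds


def slow_io(records, max_service_s=0.015, min_ios=1000):
    """Keep entries with significant IO counts and long service times."""
    return [
        r for r in records
        if r.io_count > min_ios and r.service_s > max_service_s
    ]


# Made-up sample values, for illustration only.
records = [
    ChunkInterval("chunk1", "09:18", "write", 5200, 0.042),  # busy and slow
    ChunkInterval("chunk1", "09:34", "write", 4800, 0.006),  # busy but fast
    ChunkInterval("chunk2", "09:18", "read", 120, 0.090),    # slow, too few IOs
    ChunkInterval("chunk3", "09:18", "read", 2300, 0.021),   # busy and slow
]

suspects = slow_io(records)
for r in suspects:
    print(r.chunk, r.time, r.op, r.io_count, r.service_s)
```

    Grouping the flagged entries by time of day would then show whether the slow periods line up with the long checkpoints.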

    Art

    Art S. Kagel, President and Principal Consultant
    ASK Database Management


    Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference.  Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.