Informix


Long Running Purge Script

  • 1.  Long Running Purge Script

    Posted Mon April 27, 2020 02:12 AM
    Hi,
    A purge program was submitted nearly 48 hours ago and is still running, probably deleting a huge amount of data (the purge has not been run for some years).

    IDS is 11.50 and the client program is written in 4GL.

    Some of the parameters are:
    LTAPEDEV=/dev/null

    LOGFILES  42
    LOGSIZE   3000 (KB)
    #BACKUPLOGS=Y (commented)
    LOGBUFF  2048

    Are there any options to improve the performance of this purge routine, such as CPU priority, more log files, larger log files, or any other means?

    Thank you for your helpful suggestions.

    #Informix


  • 2.  RE: Long Running Purge Script

    IBM Champion
    Posted Mon April 27, 2020 02:35 AM
    Yes. Break up the purge into many smaller transactions. You can use my dbdelete utility from my utils2_ak package; you can download the latest utils2_ak from my web site at www.askdbmgt.com/my-utilities
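
    For illustration, here is a minimal sketch of the "many smaller transactions" idea as an SPL procedure. The table purge_tab, the column created_date and the filter are all made up for the example, and this is not the dbdelete utility itself (which works a bit differently, see post 6 below):

        -- Batched purge: commit every batch_size rows instead of one huge transaction.
        CREATE PROCEDURE purge_in_batches(batch_size INT)
            DEFINE d DATE;
            DEFINE n INT;

            LET n = 0;
            BEGIN WORK;
            FOREACH c1 WITH HOLD FOR               -- hold cursor survives the COMMITs below
                SELECT created_date INTO d
                  FROM purge_tab
                 WHERE created_date < TODAY - 365  -- illustrative purge filter

                DELETE FROM purge_tab WHERE CURRENT OF c1;
                LET n = n + 1;

                IF n >= batch_size THEN            -- keep each transaction small
                    COMMIT WORK;
                    BEGIN WORK;
                    LET n = 0;
                END IF;
            END FOREACH;
            COMMIT WORK;
        END PROCEDURE;

    You would run it from dbaccess with something like EXECUTE PROCEDURE purge_in_batches(5000).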





  • 3.  RE: Long Running Purge Script

    Posted Mon April 27, 2020 08:19 PM
    Hi Art,
    Thank you, I appreciate your suggestion.

    As the job has been running for nearly 3 days, it is not clear whether it can be killed in this intermediate state; might that leave the database in an inconsistent state?

    The creators of this instance configured the database as one single dbspace (rootdbs), and all chunks are added to this dbspace.
    There is no explicit definition of logical logs, physical log, or temporary dbspaces.
    I presume they all go into the rootdbs, since they have not been defined elsewhere.

    An interesting question here is:

    As the logical logs are not backed up (/dev/null), does that imply IDS deletes the records as it progressively processes the rows matching the selection clause?

    How does IDS keep track of the 'delete' flag for a record if logical logs are not backed up, and when the total amount of data certainly runs into GBs, far more than the logical log space defined?

    Any insights on how to tweak IDS 11.50 at this stage to speed up the process?






  • 4.  RE: Long Running Purge Script

    IBM Champion
    Posted Mon April 27, 2020 09:51 PM
    Saradhi:

    OMG, I would guess that nothing has actually happened on that transaction for a long time! If the database is logged (I hope that it is, BTW) and the logical logs are not being backed up (see the onstat -l output; most of the lines in the lower report should include a 'B' in the third flag column position, indicating that they were backed up - sample output below), then the transaction will be forced to roll back when too large a percentage of the logical logs fill. Or, if the server is configured that way, the engine will fill up disk space adding new logical logs as needed until it runs out of storage.

    You can see if the transaction is progressing by running "onstat -u" several times and looking at the reads and writes columns for the session performing the purge, to see if it is doing anything. You can also run a SELECT COUNT(*) on the table, with the filters being used to purge the rows, under DIRTY READ isolation, and see if the number of remaining rows to be purged is still dropping.
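
    For example (substitute your own table name and the purge's actual filter; these are made up):

        -- Run from dbaccess every few minutes; a falling count means the purge is still progressing.
        SET ISOLATION TO DIRTY READ;    -- read past the purge's locks instead of waiting on them
        SELECT COUNT(*)
          FROM purge_tab
         WHERE created_date < TODAY - 365;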

    If you have to interrupt the transaction (onmode -z <session id>), there should not be any corruption. Depending on the logging mode of the database and the transaction type, either the transaction will just exit, leaving the table partially purged, or all of the work that has been done over 3+ days will be rolled back, which will probably take another 3+ days.

    Here the log with uniqid 1126 is the current log and so has not been backed up yet, but logs 1125 and 1105 have been; log 1105 will shortly be reused and become log #1127 on my system:

    address          number   flags    uniqid   begin                size     used    %used
    454faf88         7        U-B----  1125     3:53                 5000     5000   100.00
    452cff68         8        U---C-L  1126     3:5053               5000     1272    25.44
    453adf38         9        U-B----  1105     3:10053              5000     5000   100.00


    Art

    Art S. Kagel, President and Principal Consultant
    ASK Database Management


    Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference.  Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.








  • 5.  RE: Long Running Purge Script

    Posted Tue April 28, 2020 12:22 AM
    Hi Art,
    Thank you for your response.
    I can see 'onstat -u' as well as 'onstat -l' indicating progress in the reads/writes.

    The logical logs show 'B'.
    But as they go to /dev/null, I am really doubtful about how IDS behaves if a rollback has to occur.

    I have checked 'onstat -d' and there is no usage of additional chunks.








  • 6.  RE: Long Running Purge Script

    IBM Champion
    Posted Tue April 28, 2020 08:19 AM
    Saradhi:

    Ahh, OK. You said the logs were not being backed up, but sending them to /dev/null counts, so good. Of course, you can't use those logs to recover and roll forward after an archive restore.

    So, why is the big purge so slow? I find that massive deletes slow down over time, probably because the indexes used to filter become unbalanced and inefficient due to the deleted keys. Deleted keys are not actually removed but are only flagged as deleted. The btree cleaner threads clean up the index in the background later, once the index gets enough hits that see deleted nodes and is placed on the btree cleaner's queue for cleaning. If that is going on during the purge, it slows the purge down further. Also, the massive delete has to access the index for each row it needs to find, which it does throughout the run.

    The way dbdelete works is that it fetches a list of 8192 ROWIDs to be deleted at a time, closes that cursor, and then deletes those rows using a small number of DELETE statements with IN() clauses listing the ROWIDs. Repeat. So it only accesses the index in blocks and navigates the btree less often, reducing the dirty node hits. Anyway, it tends not to suffer from the same slowdown.
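
    A rough sketch of the same idea in plain SQL (not the actual dbdelete code; the table and filter names are illustrative, and on a fragmented table you would need WITH ROWIDS before rowid can be used this way):

        -- Stage one block of rowids, delete those rows in one statement, commit, repeat.
        BEGIN WORK;

        SELECT FIRST 8192 rowid AS rid
          FROM purge_tab
         WHERE created_date < TODAY - 365
          INTO TEMP t_rids WITH NO LOG;

        DELETE FROM purge_tab
         WHERE rowid IN (SELECT rid FROM t_rids);

        COMMIT WORK;
        DROP TABLE t_rids;
        -- Repeat until the SELECT stages no more rows.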

    Art

    Art S. Kagel, President and Principal Consultant
    ASK Database Management


    Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference.  Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.








  • 7.  RE: Long Running Purge Script

    IBM Champion
    Posted Tue April 28, 2020 08:24 AM
    Another factor slowing down large purge or update runs can be lock lists getting longer and longer; a typical symptom would be high CPU consumption by such a job.
    Smaller transactions (or non-logging tables) can help with this too.
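
    One way to keep the lock list from growing is a single table-level lock, sketched below with illustrative names; note that it still leaves you with one long transaction, so smaller transactions remain the better fix:

        BEGIN WORK;
        LOCK TABLE purge_tab IN EXCLUSIVE MODE;   -- one table lock instead of one lock per deleted row
        DELETE FROM purge_tab WHERE created_date < TODAY - 365;
        COMMIT WORK;                              -- the exclusive lock is released at commit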

    ------------------------------
    Andreas Legner
    ------------------------------



  • 8.  RE: Long Running Purge Script

    Posted Tue April 28, 2020 09:21 AM

    onstat -x does show the userthread, the begin log and current log, and the estimated rollback time; you just need to do some onstat sleuthing to correlate the userthread with the session id doing the purge.

     






  • 9.  RE: Long Running Purge Script

    IBM Champion
    Posted Tue April 28, 2020 09:42 AM
    You can map the userthread address to session id in the onstat -u output.

    Art

    Art S. Kagel, President and Principal Consultant
    ASK Database Management


    Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference.  Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.








  • 10.  RE: Long Running Purge Script

    Posted Wed April 29, 2020 02:47 AM
    Thank you Art, Mark and Andreas for your insights.

    The job got aborted today (a big sigh) with error -458 (long transaction aborted).

    Now, I have advised my application team to break the entire task into multiple steps, minimize the scope of each delete at each stage, and commit as frequently as possible.

    So, IDS is behaving well. But I wonder about a few elements that are not clear to me:

    1. Let us say there are 40 logical logs of 30000 KB each.
        Am I correct to say that the total log size is 40 x 30 MB = 1.2 GB?

    2. So a transaction cannot exceed this value of 1.2 GB? How does IDS keep track of these limits?

    3. When does DYNAMIC_LOGS come into the picture?

    4. How do LTXHWM (70%) and LTXEHWM (80%) play a role in this situation?

    5. When LTAPEDEV=/dev/null and all 40 log files are full, if the transaction continues, where does IDS record the logical log entries?

    Lastly, a wishful idea:
    6. Why does IDS not estimate, before execution, the quantum of change a transaction attempts to make, tally that size against the total log size, and say sorry to the user, instead of entering a very long transaction scenario and then aborting at the end!?

    Once again, thank you all, have a nice day.






  • 11.  RE: Long Running Purge Script

    IBM Champion
    Posted Wed April 29, 2020 09:43 AM
    Saradhi:

    See my responses below:

    So, IDS is behaving well. But I wonder about a few elements that are not clear to me:

    1. Let us say there are 40 logical logs of 30000 KB each.

        Am I correct to say that the total log size is 40 x 30 MB = 1.2 GB?
    Yes. Correct.

    2. So a transaction cannot exceed this value of 1.2 GB? How does IDS keep track of these limits?

    Almost. There always has to be logical log space available to log running transactions. The logs are used round robin, so the oldest log is overwritten by the next one. However, if there is an open transaction in that log it cannot be overwritten, so everything would have to stop, because even a transaction that was about to close could not write its COMMIT record to the logs. To prevent that from happening, when the percentage of total log space spanned by open transactions reaches LTXHWM percent, the oldest open transaction is stopped and automatically rolled back. Of course, that rollback writes some data to the logs, and other transactions also continue to write to the logs. Also, the second-oldest transaction may be pinning the same log file as the one that is rolling back, so that may not be enough to free up log space. So, if the percentage of log space spanned by open transactions reaches LTXEHWM percent, the transaction being rolled back is given exclusive access to the logical logs and most other log activity is blocked until the rollback completes.

    So, for example, if LTXHWM is set to 70, then when 70% of the logs contain open transactions the rollback of the oldest one starts. If LTXEHWM is set to 80, then when 80% of the logs contain open transactions that rollback gets exclusive use of the logs and other logging activity is suspended. So, depending on the settings of these parameters, you will get far less than 1.2 GB of logs used before a rollback. Remember that it is not just that single oldest transaction's log records that are counted, but all log records written since the beginning of the log containing that oldest transaction's BEGIN WORK record. So, for example, if a poorly written interactive program does a BEGIN WORK and the user goes home for the day without committing it or rolling it back, that begins the countdown; even though it may have written only a few dozen bytes to the logs, it is the oldest open transaction and will span all of the active logs until LTXHWM is reached and it is rolled back, releasing all of that log space for reuse.

    3. When does DYNAMIC_LOGS come into the picture?

    If you are set to DYNAMIC_LOGS, then the engine will automatically add new logs when you get to LTXHWM instead of rolling transactions back. Of course, that will eventually consume all of the free space in the ROOTDBS (or another dbspace if you specify one), and then we are back to the scenario above in #2.
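
    For reference, the onconfig parameters involved here look something like this (the values are examples only, not recommendations):

        LTXHWM        70          # long-transaction high-watermark: the oldest long transaction starts rolling back
        LTXEHWM       80          # exclusive high-watermark: the rollback gets exclusive use of the logical logs
        DYNAMIC_LOGS  2           # let the server add logical logs on its own (see the clarification further down on when this kicks in)
        LTAPEDEV      /dev/null   # logs are marked backed up immediately but cannot be used for a restore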


    4. How do LTXHWM (70%) and LTXEHWM (80%) play a role in this situation?

    See #2.

    5. When LTAPEDEV=/dev/null and all 40 log files are full, if the transaction continues, where does IDS record the logical log entries?

    There are two criteria that govern whether a log file can be reused/overwritten: 1) it has been backed up (or at least marked as backed up, because you have LTAPEDEV set to /dev/null), and 2) there are no open transactions contained in that log (see #2 again). If those conditions are not met, then the server will either add dynamic logs, if so configured, or it will just stop processing any "transactions". Some SELECTs may continue, but nothing that needs log space.

    Lastly, a wishful idea:
    6. Why does IDS not estimate, before execution, the quantum of change a transaction attempts to make, tally that size against the total log size, and say sorry to the user, instead of entering a very long transaction scenario and then aborting at the end!?

    This sounds like a good idea. However, haven't you ever done a SET EXPLAIN on a query and noticed that the estimated rows had no relationship in reality to the actual rows processed? Should the engine refuse a DELETE of 10 rows because the data distributions cause it to estimate that there are 30,000 rows that would be deleted? On top of that, as I pointed out in #2, a tiny transaction that is LOOONNGG but not large can still span all of the logs, so ...

    Art

    Art S. Kagel, President and Principal Consultant
    ASK Database Management


    Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference.  Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.








  • 12.  RE: Long Running Purge Script

    Posted Wed April 29, 2020 03:23 PM
    Hi Art,
    Thank you for your insightful clarifications.
    Thoroughly appreciate your help.

    With the best regards and thanks,
    Saradhi Motamarri





  • 13.  RE: Long Running Purge Script

    IBM Champion
    Posted Thu April 30, 2020 04:02 AM
    Hi Art,

    One clarification, as this seems to be a slight yet not uncommon misconception: DYNAMIC_LOGS is in fact meant to allow transaction rollbacks to complete, rather than to prevent 'long transaction' rollbacks in the first place; i.e., it will only kick in when you are already approaching the end of your log space (rather than approaching LTXHWM). To have this working properly, you should also have your individual logical logs big enough that even a super-fast transaction (rollback) cannot fill one of them in under two seconds, so the server has time to detect the dire situation and react.

    Regarding "end of your log space", this in fact normally would be the log file preceding the one holding the BEGIN of your oldest still open transaction.  With Enterprise Replication being part of the picture, another hard stop, equally prone to blocking the instance or, optionally, to dynamic log addition (s. CDR_MAX_DYNAMIC_LOGS, CDR_LOG_LAG_ACTION), would be the log file containing ER's 'replay position' - just for completeness sake ;-)

    ------------------------------
    Andreas Legner
    ------------------------------



  • 14.  RE: Long Running Purge Script

    IBM Champion
    Posted Thu April 30, 2020 07:16 AM
    Thanks for the clarification, Andreas.

    Art S. Kagel, President and Principal Consultant
    ASK Database Management


    Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference.  Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.








  • 15.  RE: Long Running Purge Script

    IBM Champion
    Posted Thu April 30, 2020 10:15 AM

    Nice little gotcha: running single user, single session, with LTXEHWM > 50 and without DYNAMIC_LOGS, the engine can break when you hit an LTX. Corner case for sure, but ...

     

    Cheers

    Paul

     






  • 16.  RE: Long Running Purge Script

    Posted Thu April 30, 2020 07:22 AM
    Thank you Andreas.