InfoSphere Optim

 View Only
Expand all | Collapse all

Optim Deletes

  • 1.  Optim Deletes

    Posted Fri August 30, 2019 09:16 AM
    We have tables in SQL Server that have columns defined as varchar(max).  When we have them included in an job to archive/delete, the process is agonizinlgy slow.  Currently we have 1 job that recently ran for 3 hours 21 minutes and it only removed 26730 rows.  This is a fairly simple job.  2 tables.  This timeline is fairly consistent.  Longer for more rows, less for less rows. 

    How do we speed this up? 

    We are getting ready to tackle a group of tables(3) where the parent has the varchar(max) column and we need to remove between 7-10 Million rows every night just from the parent.

    Thoughts? Recommendations? Suggestions?

    ------------------------------
    [Phil] [Neff]
    [Sr. Data Architect]
    [J.B. Hunt]
    [Lowell] [AR]
    ------------------------------

    #InfoSphereOptim
    #Optim


  • 2.  RE: Optim Deletes

    Posted Mon September 02, 2019 07:59 AM
    Phil,

    By default, Optim does a byte by byte comparison of the archive files and the table data being deleted.  You can turn off the comparison which should help immensely. 

    Please look at this Defining Delete Optims for more details.

    ------------------------------
    JIMMY WILSON
    IBM DTE Optim, Virtual Data Pipeline, Test Data Fabrication Specialist
    ------------------------------



  • 3.  RE: Optim Deletes

    Posted Tue September 03, 2019 03:24 PM
    Jimmy,

    We have already turned the comparison off.

    Thanks,

    Phil

    ------------------------------
    [Phil] [Neff]
    [Sr. Data Architect]
    [J.B. Hunt]
    [Lowell] [AR]
    ------------------------------



  • 4.  RE: Optim Deletes

    Posted Tue September 03, 2019 10:12 AM
    Phil,

    Along with Jimmy's suggestion you may play with following options ​to improve the performance:
    1. In an archive request under Tools--> Edit Delete Strategy, you may change 'Key Lookup Limit' to a number >10
    2. You may try increasing number of rows under 'Commit Frequency'
    3. You may try with multiple DB Connections
    4. You may try by unchecking compression of archive/index files under "Archive File Options"
    5. You may also look at the number of indexes defined on these tables and if there are some indexes that are unused, you'll have to work with your DBA for dropping unused indexes.

    Thanks,
    Sanjay



    ------------------------------
    Sanjay Marwah
    ------------------------------



  • 5.  RE: Optim Deletes

    Posted Tue September 03, 2019 03:27 PM
    Sanjay,

    ​We will look into your points.  We will turn the compression off for one of our jobs tonight and see how it performs.  It is expected to remove about the same number of rows that were removed in a previous nightly run.  I do know that point #5 is not doable​.

    Thanks,

    Phil

    ------------------------------
    [Phil] [Neff]
    [Sr. Data Architect]
    [J.B. Hunt]
    [Lowell] [AR]
    ------------------------------



  • 6.  RE: Optim Deletes

    Posted Tue September 03, 2019 03:28 PM
    Does Optim have the capability to just do the delete without creating the archive file?​

    ------------------------------
    [Phil] [Neff]
    [Sr. Data Architect]
    [J.B. Hunt]
    [Lowell] [AR]
    ------------------------------



  • 7.  RE: Optim Deletes

    Posted Tue September 03, 2019 03:43 PM
    Phil - There is no way to do the delete without first creating the archive file.  You can also use array deletes but that could be only on Oracle.  You would be able to delete a larger number of rows by having Optim send just the keys.  I haven't used it in a while so would have to review the requirements.

    ------------------------------
    JIMMY WILSON
    ------------------------------



  • 8.  RE: Optim Deletes

    Posted Wed September 04, 2019 10:22 AM
    Hi Phil, 

    It sounds like you're just using Optim to purge related data from a set of tables and don't really care about the creation of an archive file. We also use Optim in this capacity frequently and run into *very* slow Selects against tables with large BLOB/CLOB/XML fields. Sadly, Optim won't allow you to deselect the column in the AD when it's an archive (you can do that for an Extract, but you can't delete from an Extract file). I have asked IBM to allow the deselect of columns on archives (or to Delete from and Extract), but that has gone nowhere.

    My way around this when the LOB field is in the Start Table is to get creative with the SELECT SQL. Essentially, you use the power of a UNION to select all the columns except the LOB column. It works like a charm once you learn how to code it. It will look something like so: 

    SELECT...FROM Table1 WHERE
    1=0
    UNION ALL
    SELECT A.[Col1]
          ,A.[Col2]
          ,A.[Col3]
          ,A.[Col4]
          ,A.[Col6]
          ,A.[Col7]
          ,A.[Col8]
          ,null
      FROM [Your_DB_Name].[dbo].[Table1] as A
    WHERE A.Col2 <= dateadd(Hour,-2,getdate()) -- or whatever your criteria is


    The Lines in Green are what you normally see for the SQL Select in the Optim GUI, The lines in blue are what you need to add to the SQL to Select all the rows you actually care about, but without the LOB column. Please note that in my example the LOB column is Col5, but the null value I use to replace it with is listed last on the column list in the SELECT statement.  If you try to code this with the null listed in the spot where Col5 should go, Optim will throw an error at runtime.  It expects LOB data last, regardless of the column order. 

    Again, this only works for not selecting LOB data in a Start table. LOB data in a child will remain and be a painfully slow select. 

    Hope this helps your situation. It has definitely been a big performance boost on our jobs that only do purging and not archiving. 


    ------------------------------
    Keith Tidball
    Progressive Insurance
    ------------------------------



  • 9.  RE: Optim Deletes

    Posted Wed September 04, 2019 10:39 AM
    Keith,

    Thanks for the info.  We are currently running one of Jimmy's options right now.  We will try yours next.  

    We really appreciate all the help on this. 

    Who do I need to tell at IBM that we need the Deselect on archive also.

    Thanks,

    ------------------------------
    [Phil] [Neff]
    [Sr. Data Architect]
    [J.B. Hunt]
    [Lowell] [AR]
    ------------------------------



  • 10.  RE: Optim Deletes

    Posted Wed September 04, 2019 11:27 AM
    Hopefully one of these options will help you. My solution works for slow selects due to Optim dragging the LOB data across your connection...multiple keys tends to help more for slow deletes. However, Optim has an edit that prevents you from deleting multiple keys in a Parent table if there is RI tied to a child. This is an Optim restriction...SQL Server has zero restrictions on deleting multiple keys at once from a parent. I have also asked for that restriction to be lifted since it seems like some archaic rule which never got updated, but that request has also been pushed to the dustbin.

    IBM Product Management is the best place to bring these items up (Peter, Ken, Katie, etc. ). We can also use this forum to publicly discuss changes that will benefit the Optim toolset and its users.

    ------------------------------
    Keith Tidball
    Progressive Insurance
    ------------------------------



  • 11.  RE: Optim Deletes

    Posted Thu January 16, 2020 10:18 AM
    @Phil Neff

    ​Hi Phil,

    Were you able to find a best option for delete? I will be running into this situation in near future and want to know the options. The tables I would like to delete has BLOBS in it (both parent and child tables) and have more than a million records

    Thanks




    ------------------------------
    Srinivas Adki
    ------------------------------



  • 12.  RE: Optim Deletes

    Posted Thu September 05, 2019 03:11 PM
    Hi Keith,

    I implemented your suggestion in our test environment with the "Union All" in the  SQL Select in the Optim GUI in AD. 
    I  ran into an issue with performance where the log spikes went up to 90% so I had to kill the process.  There were only 6,241 records with Json data. 
     
    My questions to you are  what do you set on these options under 'Commit Frequency' and 'Database Connections' in your environment?

    Thank you in advance,

    Diem



    ------------------------------
    Diem Nguyen
    ------------------------------



  • 13.  RE: Optim Deletes

    Posted Fri September 06, 2019 10:57 AM
    I'm not sure what you mean by 'log spikes'? And is this an issue while selecting the data or deleting it?

    We have never had an issue with executing this type of Select where the LOB column is nulled out as part of the Unioned subset. The top part of Union will select nothing (hence the WHERE 1=0 since that is never true). It is the lower part of the UNION ALL that is selecting all the data rows and nulling out the LOB column value. If your Select is having issues, there is probably something off in the syntax or the WHERE clause is not performant. 

    Commit Level has zero bearing on the Select. It applies to the Delete phase and we are usually anywhere from 500-5000 depending upon the performance of the Deletes (we do not want to hold locks for more than a second or two).

    DB connections is usually 1 unless we have slow performance and then we kick it up to 2 or 4. However, if you are just reading a single table, I'm not sure adding DB connections will improve that.

    ------------------------------
    Keith Tidball
    Progressive Insurance
    ------------------------------