Informix

 View Only
Expand all | Collapse all

Adding primary to a table having duplicate rows

  • 1.  Adding primary to a table having duplicate rows

    Posted Tue June 30, 2020 10:02 AM
    *Problem:* I have a table which has multiple duplicate records since very beginning, now the table is in production and we cannot remove those duplicate records because they are partially duplicate. And we cannot also move those records to new table as we don't know in how many 4GLs program table have been used.

    *Goal:* Adding a primary key on the table having multiple duplicate rows

    *Observation:* I have read few solutions on Informix online manual that we can set *Primary Key* with *DISABLED* option
    Then we can start *VIOLATION* and set *ENABLED* on created Primary Key Constraints but using this option I got error says cannot set Primary Key because of duplicate records.
    Can u please tell what I am doing wrong?
    Or, their is no such concept existing in Informix that we can set Primary Key for a table having duplicate records from now onwards.

    ------------------------------
    Piyush Sachan
    CM/FGK
    ------------------------------

    #Informix


  • 2.  RE: Adding primary to a table having duplicate rows

    IBM Champion
    Posted Tue June 30, 2020 10:43 AM

    Why do you need a primary key if it will not be enforced?

    - Fix what creates the duplicates
    - Then fix the duplicates
    - Then add the primary key

    To avoid new duplicates consider creating a BEFORE INSERT/BEFORE UPDATE triggers which throws an error if a new duplicate tries to be created.

    ------------------------------
    David Williams
    ------------------------------



  • 3.  RE: Adding primary to a table having duplicate rows

    IBM Champion
    Posted Tue June 30, 2020 10:52 AM
    Piyush,

    The volutations statement will work for the primary key on insert or update, not when there is already data in the table. 

    From the manual. "When a constraint or unique index is in filtering mode, the statement succeeds and the database server enforces the constraint or the unique index requirement during an INSERT, DELETE, or UPDATE statement by writing the failed rows to the violations table associated with the target table. Diagnostic information about the constraint violation is written to the diagnostics table associated with the target table."

    This is an example:

    create table state2 (
    state char(2),
    state_name char(15)
    );
    insert into state2 select state, state_name from state;
    insert into state2 select state, state_name from state;
    start violations table for state2 ;

    -- This will fail because there is already data in the table
    create unique index state2_idx_pk on state2 (state ) filtering ;

    -- So copy the data to a temp table and delete it
    select * from state2 into temp state2_temp;
    delete from state2 where 1=1;

    -- Create the index with the empty table
    create unique index state2_idx_pk on state2 (state ) filtering ;

    -- This sill catch the duplicates in the state2_vol table
    insert into state2 select state, state_name from state2_temp;

    #Informix


  • 4.  RE: Adding primary to a table having duplicate rows

    IBM Champion
    Posted Tue June 30, 2020 10:55 AM
    You cannot have a primary key or unique key or even a UNIQUE index on a table with rows that have duplicate key column values. The only thing you can do is to remove the dups or change their key column values. Here are some ways to do that. Here's one:

    SELECT a.key_col1, a.key_col2, a.rowid
    FROM dups_table AS a, dupes_table AS b
    WHERE a.key_col1 = b.key_col1 
       AND a.key_col2 = b.key_col2
       AND a.rowid > b.rowid; 

    This will select all but the "first" row of a duplicated key set and you can delete those rows by rowid. If you need to be more selective, then change the rowid filter to != and add a DISTINCT clause to the projection list then you can select the entire row (or just add those columns you will use to decide which rows to keep and which to delete) to the projection.
     If the table is partitioned you cannot use ROWID unless the table was created WITH ROWID, but you can use "ifx_row_id" instead which is a string that contains the partition number and rowid (ex: "1048816:3585").

    Art

    .

    Art S. Kagel, President and Principal Consultant
    ASK Database Management


    Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference.  Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.








  • 5.  RE: Adding primary to a table having duplicate rows

    IBM Champion
    Posted Tue June 30, 2020 11:15 AM


    I would make sure there is a non-unique index on the key columns first.

    Then

    create new table with key cols from old table

    select  <keycols> from table group by <keycols> having count(*) > 1;

    Fix the rows with duplicates.


    All,

    I did have an RFE idea - a command to alter a non-unique index to become a unique index
    i.e. lock index, index scan for dups, if none found make index unique.

    Does that sounds useful?

    ------------------------------
    David Williams
    ------------------------------



  • 6.  RE: Adding primary to a table having duplicate rows

    IBM Champion
    Posted Tue June 30, 2020 01:04 PM
    David:

    Worth a try. HOwever, I'm not sure whether that conversion would not be a big deal to implement. Structurally a UNIQUE index is different from an index that was not declared as UNIQUE but that happens to have unique rows. A UNIQUE index places the one ROWID for each key in the leaf node of the index. A duplicate index places a pointer to a page containing rowids that contain that key into the leaf node. That makes a unique index more efficient to search than a duplicate index that happens to have unique keys in it.

    Art

    Art S. Kagel, President and Principal Consultant
    ASK Database Management


    Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference.  Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.








  • 7.  RE: Adding primary to a table having duplicate rows

    Posted Tue June 30, 2020 01:55 PM

    another possibility is  to have create unique index stmts  default to a non-unique index if the uniqueness  fails. That way you get the benefit of having a defined index in place.

     

     






  • 8.  RE: Adding primary to a table having duplicate rows

    IBM Champion
    Posted Tue June 30, 2020 07:35 PM

    The problem is what return code would the statement have if the values are not unique?

    Admin tools may well assume negative return codes to be failure.

    On the other hand a positive value would not indicate the new index was not unique.

    I suppose something could be added to sqlca.sqlwarn but that may not be checked by tools.

    Perhaps add a new sql clause to ignore duplicates?

    Sybase
        http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc32300.1570/html/sqlug/X65321.htm
           ignore_dup_key -  added to create index statement, does not work with existing duplicate values
          ""any attempt to insert a duplicate key is canceled with an error message. After the cancellation, any transaction that was active at the time may continue as though the update or insert had never taken place.""

    SQL Server
         https://sqlperformance.com/2019/04/sql-performance/ignore_dup_key-slower-clustered-indexes
            WITH (IGNORE_DUP_KEY = ON) - Added to index or constraint definition
            "A warning message will occur when duplicate key values are inserted into a unique index. Only the rows violating the uniqueness constraint will fail."

    Oracle
        https://docs.oracle.com/cd/E11882_01/server.112/e41084/sql_elements006.htm#CHDEGDDG
           "The IGNORE_ROW_ON_DUPKEY_INDEX hint applies only to single-table INSERT operations. It is not supported for UPDATE, DELETE, MERGE, or multitable insert operations....When a unique key violation is encountered, a row-level rollback occurs and execution resumes with the next input row."
     
    MySQl
        https://smallbusiness.chron.com/duplicate-ignore-mysql-inserts-28260.html
        https://dev.mysql.com/doc/refman/8.0/en/insert.html
        https://dev.mysql.com/doc/refman/8.0/en/replace.html
            Additional clauses on insert statement.
            "INSERT IGNORE will insert rows in the same way as INSERT, but with the exception that it will ignore rows with duplicate values and continue execution without creating an error."
            "INSERT ... ON DUPLICATE KEY UPDATE will insert any non-duplicate rows as normal. However, when it encounters a duplicate row, it will perform an UPDATE on the original row"
            "REPLACE works the same as INSERT, except that when it encounters a duplicate row, it deletes the original row and then continues with the insert. "   
           
          Vendors cannot agreed on how to do this!!

    ------------------------------
    David Williams
    ------------------------------



  • 9.  RE: Adding primary to a table having duplicate rows

    IBM Champion
    Posted Tue June 30, 2020 06:52 PM

    I thought there was something but I could not remember which it why I asked.

    No point as conversion would be pretty much the same as building a new unique index, you only really save the sort!




    ------------------------------
    David Williams
    ------------------------------



  • 10.  RE: Adding primary to a table having duplicate rows

    Posted Wed July 01, 2020 02:00 AM
    I don't think this is possible unless you remove the duplicate values from the table...
    And removing duplicate value is not that hard from a table... 


    ------------------------------
    Gaurav Kumar
    ------------------------------



  • 11.  RE: Adding primary to a table having duplicate rows

    Posted Wed July 01, 2020 12:20 PM

    I usually create a duplicate table with the primary key, fill it with the unique values and switch the table names around and (after I don't get complaints about missing data) drop the old table.  If you have the space to duplicate your original table, this allows you to make the fix with virtually no down time to the users.  You may have to scrape a few straggler records that were added to the old table while the new one was building, but that's not usually much of a problem.

     

                            --EEM

     






  • 12.  RE: Adding primary to a table having duplicate rows

    IBM Champion
    Posted Wed July 01, 2020 12:41 PM
    Everett:

    You can eliminate the "straggler records" issue this way:

    1. Rename the original table to _old.
    2. Create the new table as _new.
    3. Create a VIEW with the original name that selects from both tables with a UNION of two selects to eliminate dupes in the results.
    4. Add an INSERT ... INSTEAD OF trigger on the VIEW that inserts to the _new table only.
    5. Copy rows from the old table to the new one ignoring dups on the significant fields for determining duplicate status (the OP stated that the rows are not identical on all columns).
    6. Once all rows have been copied, you can drop the view, rename the _new table, and eventually drop the _old table.
    User affected downtime is a couple of seconds at each end.

    Art

    Art S. Kagel, President and Principal Consultant
    ASK Database Management


    Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference.  Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.








  • 13.  RE: Adding primary to a table having duplicate rows

    Posted Wed July 01, 2020 01:47 PM

    Nice.

     






  • 14.  RE: Adding primary to a table having duplicate rows

    Posted Wed July 01, 2020 06:19 PM
    Hi Piyush.

    When you say "duplicate records" I must assume you mean records that have the data in your intended "primary key" columns but the other data is actually different.  After all, if the rows were completely duplicated, the 4GL programs would not care at all which of the rows it actually gets.  I see Art gave a solution that helps you get rid of the secondary rows but I see you were reluctant to do that, since those rows are out there and in use.  And I might guess that there is a table out there that references up to your troublesome table as if with a foreign key, although no such actual object is possible in your case.

    Let's consider this idea:
    ALTER TABLE ADD (record_num integer) ;  -- Please forgive syntax if wrong.  Haven't done this in years!
    Then run a 1-time 4GL program to:
    • Open a fetch cursor for update
    • Update the new column with a running loop counter
    Create a unique index on the new column
    ALTER TABLE MODIFY (record_num serial);
    ALTER TABLE ADD PRIMARY KEY (record_num);

    If all that can work, voila! a new primary key.  And if a 4GL program fails to supply a value on an insert, I *think* it will give you the next value anyway.

    Speaking of 4GL programs, if you have coded variables LIKE table.* you will likely need to recompile every program that uses this table.

    It's been a long time since I've done this so some syntax and sequencing may be off.  But with the right tweaks, it should solve the problem.

    Good luck.

    -- Jacob S

    ------------------------------
    Jacob Salomon
    ---
    Nobody goes there anymore, it's too crowded.  --Attr: Yogi Berra
    ------------------------------