Informix

Connect with Db2, Informix, Netezza, open source, and other data experts to gain value from your data, share insights, and solve problems.

View Only

Back to discussions

Expand all | Collapse all

Adding primary to a table having duplicate rows

1. Adding primary to a table having duplicate rows

Like
Piyush Sachan
Posted Tue June 30, 2020 10:02 AM

Reply
*Problem:* I have a table which has multiple duplicate records since very beginning, now the table is in production and we cannot remove those duplicate records because they are partially duplicate. And we cannot also move those records to new table as we don't know in how many 4GLs program table have been used.

*Goal:* Adding a primary key on the table having multiple duplicate rows

*Observation:* I have read few solutions on Informix online manual that we can set *Primary Key* with *DISABLED* option
Then we can start *VIOLATION* and set *ENABLED* on created Primary Key Constraints but using this option I got error says cannot set Primary Key because of duplicate records.
Can u please tell what I am doing wrong?
Or, their is no such concept existing in Informix that we can set Primary Key for a table having duplicate records from now onwards.

------------------------------
Piyush Sachan
CM/FGK
------------------------------

#Informix
2. RE: Adding primary to a table having duplicate rows

Like
David Williams

IBM Champion
Posted Tue June 30, 2020 10:43 AM

Reply
Why do you need a primary key if it will not be enforced?

- Fix what creates the duplicates
- Then fix the duplicates
- Then add the primary key

To avoid new duplicates consider creating a BEFORE INSERT/BEFORE UPDATE triggers which throws an error if a new duplicate tries to be created.

------------------------------
David Williams
------------------------------

Original Message
3. RE: Adding primary to a table having duplicate rows

Like
Lester Knutsen

IBM Champion
Posted Tue June 30, 2020 10:52 AM

Reply
Piyush,

The volutations statement will work for the primary key on insert or update, not when there is already data in the table.

From the manual. "When a constraint or unique index is in filtering mode, the statement succeeds and the database server enforces the constraint or the unique index requirement during an INSERT, DELETE, or UPDATE statement by writing the failed rows to the violations table associated with the target table. Diagnostic information about the constraint violation is written to the diagnostics table associated with the target table."

This is an example:

create table state2 (
state char(2),
state_name char(15)
);
insert into state2 select state, state_name from state;
insert into state2 select state, state_name from state;
start violations table for state2 ;

-- This will fail because there is already data in the table
create unique index state2_idx_pk on state2 (state ) filtering ;

-- So copy the data to a temp table and delete it
select * from state2 into temp state2_temp;
delete from state2 where 1=1;

-- Create the index with the empty table
create unique index state2_idx_pk on state2 (state ) filtering ;

-- This sill catch the duplicates in the state2_vol table
insert into state2 select state, state_name from state2_temp;

#Informix
4. RE: Adding primary to a table having duplicate rows

Like
Art Kagel

IBM Champion
Posted Tue June 30, 2020 10:55 AM

Reply
You cannot have a primary key or unique key or even a UNIQUE index on a table with rows that have duplicate key column values. The only thing you can do is to remove the dups or change their key column values. Here are some ways to do that. Here's one:

SELECT a.key_col1, a.key_col2, a.rowid
FROM dups_table AS a, dupes_table AS b
WHERE a.key_col1 = b.key_col1
AND a.key_col2 = b.key_col2
AND a.rowid > b.rowid;

This will select all but the "first" row of a duplicated key set and you can delete those rows by rowid. If you need to be more selective, then change the rowid filter to != and add a DISTINCT clause to the projection list then you can select the entire row (or just add those columns you will use to decide which rows to keep and which to delete) to the projection.
If the table is partitioned you cannot use ROWID unless the table was created WITH ROWID, but you can use "ifx_row_id" instead which is a string that contains the partition number and rowid (ex: "1048816:3585").

Art

.

Art S. Kagel, President and Principal Consultant
ASK Database Management
www.askdbmgt.com

Blog: http://informix-myview.blogspot.com/

Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference. Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.

Original Message
5. RE: Adding primary to a table having duplicate rows

Like
David Williams

IBM Champion
Posted Tue June 30, 2020 11:15 AM

Reply
I would make sure there is a non-unique index on the key columns first.

Then

create new table with key cols from old table

select <keycols> from table group by <keycols> having count(*) > 1;

Fix the rows with duplicates.

All,

I did have an RFE idea - a command to alter a non-unique index to become a unique index
i.e. lock index, index scan for dups, if none found make index unique.

Does that sounds useful?

------------------------------
David Williams
------------------------------

Original Message
6. RE: Adding primary to a table having duplicate rows

Like
Art Kagel

IBM Champion
Posted Tue June 30, 2020 01:04 PM

Reply
David:

Worth a try. HOwever, I'm not sure whether that conversion would not be a big deal to implement. Structurally a UNIQUE index is different from an index that was not declared as UNIQUE but that happens to have unique rows. A UNIQUE index places the one ROWID for each key in the leaf node of the index. A duplicate index places a pointer to a page containing rowids that contain that key into the leaf node. That makes a unique index more efficient to search than a duplicate index that happens to have unique keys in it.

Art

Art S. Kagel, President and Principal Consultant
ASK Database Management
www.askdbmgt.com

Blog: http://informix-myview.blogspot.com/

Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference. Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.

Original Message
7. RE: Adding primary to a table having duplicate rows

Like
Mark Jalkiewicz
Posted Tue June 30, 2020 01:55 PM

Reply
another possibility is to have create unique index stmts default to a non-unique index if the uniqueness fails. That way you get the benefit of having a defined index in place.

Original Message
8. RE: Adding primary to a table having duplicate rows

Like
David Williams

IBM Champion
Posted Tue June 30, 2020 07:35 PM

Reply
The problem is what return code would the statement have if the values are not unique?

Admin tools may well assume negative return codes to be failure.

On the other hand a positive value would not indicate the new index was not unique.

I suppose something could be added to sqlca.sqlwarn but that may not be checked by tools.

Perhaps add a new sql clause to ignore duplicates?

Sybase
    http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc32300.1570/html/sqlug/X65321.htm
       ignore_dup_key - added to create index statement, does not work with existing duplicate values
      ""any attempt to insert a duplicate key is canceled with an error message. After the cancellation, any transaction that was active at the time may continue as though the update or insert had never taken place.""

SQL Server
     https://sqlperformance.com/2019/04/sql-performance/ignore_dup_key-slower-clustered-indexes
        WITH (IGNORE_DUP_KEY = ON) - Added to index or constraint definition
        "A warning message will occur when duplicate key values are inserted into a unique index. Only the rows violating the uniqueness constraint will fail."

Oracle
    https://docs.oracle.com/cd/E11882_01/server.112/e41084/sql_elements006.htm#CHDEGDDG
       "The IGNORE_ROW_ON_DUPKEY_INDEX hint applies only to single-table INSERT operations. It is not supported for UPDATE, DELETE, MERGE, or multitable insert operations....When a unique key violation is encountered, a row-level rollback occurs and execution resumes with the next input row."

MySQl
    https://smallbusiness.chron.com/duplicate-ignore-mysql-inserts-28260.html
    https://dev.mysql.com/doc/refman/8.0/en/insert.html
    https://dev.mysql.com/doc/refman/8.0/en/replace.html
        Additional clauses on insert statement.
        "INSERT IGNORE will insert rows in the same way as INSERT, but with the exception that it will ignore rows with duplicate values and continue execution without creating an error."
        "INSERT ... ON DUPLICATE KEY UPDATE will insert any non-duplicate rows as normal. However, when it encounters a duplicate row, it will perform an UPDATE on the original row"
    "REPLACE works the same as INSERT, except that when it encounters a duplicate row, it deletes the original row and then continues with the insert. "

      Vendors cannot agreed on how to do this!!

------------------------------
David Williams
------------------------------

Original Message
9. RE: Adding primary to a table having duplicate rows

Like
David Williams

IBM Champion
Posted Tue June 30, 2020 06:52 PM

Reply
I thought there was something but I could not remember which it why I asked.

No point as conversion would be pretty much the same as building a new unique index, you only really save the sort!

------------------------------
David Williams
------------------------------

Original Message
10. RE: Adding primary to a table having duplicate rows

Like
Gaurav Kumar
Posted Wed July 01, 2020 02:00 AM

Reply
I don't think this is possible unless you remove the duplicate values from the table...
And removing duplicate value is not that hard from a table...

------------------------------
Gaurav Kumar
------------------------------

Original Message
11. RE: Adding primary to a table having duplicate rows

Like
Everett Mills
Posted Wed July 01, 2020 12:20 PM

Reply
I usually create a duplicate table with the primary key, fill it with the unique values and switch the table names around and (after I don't get complaints about missing data) drop the old table. If you have the space to duplicate your original table, this allows you to make the fix with virtually no down time to the users. You may have to scrape a few straggler records that were added to the old table while the new one was building, but that's not usually much of a problem.

--EEM

Original Message
12. RE: Adding primary to a table having duplicate rows

Like
Art Kagel

IBM Champion
Posted Wed July 01, 2020 12:41 PM

Reply
Everett:

You can eliminate the "straggler records" issue this way:

Rename the original table to _old.
Create the new table as _new.
Create a VIEW with the original name that selects from both tables with a UNION of two selects to eliminate dupes in the results.
Add an INSERT ... INSTEAD OF trigger on the VIEW that inserts to the _new table only.
Copy rows from the old table to the new one ignoring dups on the significant fields for determining duplicate status (the OP stated that the rows are not identical on all columns).
Once all rows have been copied, you can drop the view, rename the _new table, and eventually drop the _old table.
User affected downtime is a couple of seconds at each end.

Art

Art S. Kagel, President and Principal Consultant
ASK Database Management
www.askdbmgt.com

Blog: http://informix-myview.blogspot.com/

Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on the IIUG, nor any other organization with which I am associated either explicitly, implicitly, or by inference. Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.

Original Message
13. RE: Adding primary to a table having duplicate rows

Like
Everett Mills
Posted Wed July 01, 2020 01:47 PM

Reply
Nice.

Original Message
14. RE: Adding primary to a table having duplicate rows

Like
Jacob Salomon
Posted Wed July 01, 2020 06:19 PM

Reply
Hi Piyush.

When you say "duplicate records" I must assume you mean records that have the data in your intended "primary key" columns but the other data is actually different. After all, if the rows were completely duplicated, the 4GL programs would not care at all which of the rows it actually gets. I see Art gave a solution that helps you get rid of the secondary rows but I see you were reluctant to do that, since those rows are out there and in use. And I might guess that there is a table out there that references up to your troublesome table as if with a foreign key, although no such actual object is possible in your case.

Let's consider this idea:
ALTER TABLE ADD (record_num integer) ; -- Please forgive syntax if wrong. Haven't done this in years!
Then run a 1-time 4GL program to:

Open a fetch cursor for update

Update the new column with a running loop counter

Create a unique index on the new column
ALTER TABLE MODIFY (record_num serial);
ALTER TABLE ADD PRIMARY KEY (record_num);

If all that can work, voila! a new primary key. And if a 4GL program fails to supply a value on an insert, I *think* it will give you the next value anyway.

Speaking of 4GL programs, if you have coded variables LIKE table.* you will likely need to recompile every program that uses this table.

It's been a long time since I've done this so some syntax and sequencing may be off. But with the right tweaks, it should solve the problem.

Good luck.

-- Jacob S

------------------------------
Jacob Salomon
---
Nobody goes there anymore, it's too crowded. --Attr: Yogi Berra
------------------------------

Original Message

Informix

Informix

Adding primary to a table having duplicate rows

Piyush SachanTue June 30, 2020 10:02 AM

David WilliamsTue June 30, 2020 10:43 AM

Lester KnutsenTue June 30, 2020 10:52 AM

Art KagelTue June 30, 2020 10:55 AM

David WilliamsTue June 30, 2020 11:15 AM

Art KagelTue June 30, 2020 01:04 PM

Mark JalkiewiczTue June 30, 2020 01:55 PM

David WilliamsTue June 30, 2020 07:35 PM

David WilliamsTue June 30, 2020 06:52 PM

Gaurav KumarWed July 01, 2020 02:00 AM

Everett MillsWed July 01, 2020 12:20 PM

Art KagelWed July 01, 2020 12:41 PM

Everett MillsWed July 01, 2020 01:47 PM

Jacob SalomonWed July 01, 2020 06:19 PM

1. Adding primary to a table having duplicate rows

2. RE: Adding primary to a table having duplicate rows

3. RE: Adding primary to a table having duplicate rows

4. RE: Adding primary to a table having duplicate rows

5. RE: Adding primary to a table having duplicate rows

6. RE: Adding primary to a table having duplicate rows

7. RE: Adding primary to a table having duplicate rows

8. RE: Adding primary to a table having duplicate rows

9. RE: Adding primary to a table having duplicate rows

10. RE: Adding primary to a table having duplicate rows

11. RE: Adding primary to a table having duplicate rows

12. RE: Adding primary to a table having duplicate rows

13. RE: Adding primary to a table having duplicate rows

14. RE: Adding primary to a table having duplicate rows

Additional
Resources

Office

Quick Links

Informix

Informix

Adding primary to a table having duplicate rows

Piyush SachanTue June 30, 2020 10:02 AM

David WilliamsTue June 30, 2020 10:43 AM

Lester KnutsenTue June 30, 2020 10:52 AM

Art KagelTue June 30, 2020 10:55 AM

David WilliamsTue June 30, 2020 11:15 AM

Art KagelTue June 30, 2020 01:04 PM

Mark JalkiewiczTue June 30, 2020 01:55 PM

David WilliamsTue June 30, 2020 07:35 PM

David WilliamsTue June 30, 2020 06:52 PM

Gaurav KumarWed July 01, 2020 02:00 AM

Everett MillsWed July 01, 2020 12:20 PM

Art KagelWed July 01, 2020 12:41 PM

Everett MillsWed July 01, 2020 01:47 PM

Jacob SalomonWed July 01, 2020 06:19 PM

1. Adding primary to a table having duplicate rows

2. RE: Adding primary to a table having duplicate rows

3. RE: Adding primary to a table having duplicate rows

4. RE: Adding primary to a table having duplicate rows

5. RE: Adding primary to a table having duplicate rows

6. RE: Adding primary to a table having duplicate rows

7. RE: Adding primary to a table having duplicate rows

8. RE: Adding primary to a table having duplicate rows

9. RE: Adding primary to a table having duplicate rows

10. RE: Adding primary to a table having duplicate rows

11. RE: Adding primary to a table having duplicate rows

12. RE: Adding primary to a table having duplicate rows

13. RE: Adding primary to a table having duplicate rows

14. RE: Adding primary to a table having duplicate rows

Additional Resources

Office

Quick Links

Additional
Resources