Remove duplicate entries from mysql table

Was writing some code, and dealt with this problem…so posting about solving this problem with 1 command (or more) depending on your decision.If you want to remove all duplicate rows with the same combination of N number of columns.

I’ll talk about two options in this post:

Add a unique key constraint that drops all duplicates

This one will remove all duplicates (based on the combination of columns you give) with one line.
The approach works by altering the table, and adding a unique key. Applying this unique key, causes all duplicate rows to be dropped leaving only one.

The ignore keyword plays a crucial role in this, as it allows duplicate rows to be ignored instead of throwing an error and rolling the process back. According to the mysql reference (link):

IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key, The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.”


ALTER IGNORE TABLE table_original ADD UNIQUE KEY unique_filter(column1, column2, column3)
;
[alter table syntax ]

Note: There’s risk you will lose out data in any other columns not considered part of the unique key (in case of two rows with same unique column values but differing additional columns).
The constraint can be removed later once the filtering has been accomplished.

Use a temporary distinct table

This approach is to create a temporarily table containing all the distinct rows. Then we delete the original values, and copy the distinct back into the original table.

a) Create a temporary table with distinct rows

create table table_distinct as select distinct column1, column2, (only columns in combination) from table_original; [create table as select syntax] [ select syntax ]

Note: You may also have to alter the temp table, to add additional columns you do not consider part of the unique row fingerprint.
Before moving to step two, you will have to also backup these extra columns.

b) Delete all entries in original table not in distinct table

delete from table_original; [ delete syntax ]

c) Insert values from temp to original

insert in original_table (column1, column2, ...) Select * from table_distinct; [ insert with select syntax ]

That’s about…comments and suggestions welcome!



Menelaos Bakopoulos

Mr. Menelaos Bakopoulos is currently pursuing his PhD both at Center for TeleInFrastruktur (CTiF) at Aalborg University (AAU) in Denmark and Athens Information Technology (AIT) in Athens, Greece. He received a Master in Information Technology and Telecommunications Systems from Athens Information Technology and a B.Sc. in Computer Science & Management Information Systems from the American College of Thessaloniki. Since April 2008 he has been a member of the Multimedia, Knowledge, and Web Technologies Group.

More Posts