Oct 19 2012
Remove duplicate entries from mysql table
Was writing some code, and dealt with this problem…so posting about solving this problem with 1 command (or more) depending on your decision.If you want to remove all duplicate rows with the same combination of N number of columns.
I’ll talk about two options in this post:
Add a unique key constraint that drops all duplicates
This one will remove all duplicates (based on the combination of columns you give) with one line.
The approach works by altering the table, and adding a unique key. Applying this unique key, causes all duplicate rows to be dropped leaving only one.
The ignore keyword plays a crucial role in this, as it allows duplicate rows to be ignored instead of throwing an error and rolling the process back. According to the mysql reference (link):
“IGNORE
is a MySQL extension to standard SQL. It controls how ALTER TABLE
works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE
is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE
is specified, only the first row is used of rows with duplicates on a unique key, The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.”
[alter table syntax ]
;
ALTER IGNORE TABLE table_original ADD UNIQUE KEY unique_filter(column1, column2, column3)
Note: There’s risk you will lose out data in any other columns not considered part of the unique key (in case of two rows with same unique column values but differing additional columns).
The constraint can be removed later once the filtering has been accomplished.
Use a temporary distinct table
This approach is to create a temporarily table containing all the distinct rows. Then we delete the original values, and copy the distinct back into the original table.
a) Create a temporary table with distinct rows
create table table_distinct as select distinct column1, column2, (only columns in combination) from table_original;
[create table as select syntax] [ select syntax ]
Note: You may also have to alter the temp table, to add additional columns you do not consider part of the unique row fingerprint.
Before moving to step two, you will have to also backup these extra columns.
b) Delete all entries in original table not in distinct table
delete from table_original;
[ delete syntax ]
c) Insert values from temp to original
insert in original_table (column1, column2, ...) Select * from table_distinct;
[ insert with select syntax ]
That’s about…comments and suggestions welcome!