July 23, 2003 at 2:56 pm
Hi sql gurus,
I can write a simple code to find out the duplicated rows, but I don't know how to delete all duplicated rows only leave one row there.
select <column name>, count(column name) as ' # of duplicates'
from <table name>
group by <column name>
having (count(column name) > 1)
order by <column name>
Please help. Thanks.
Robert
July 23, 2003 at 3:04 pm
http://support.microsoft.com/default.aspx?scid=kb;en-us;70956
http://www.sql-server-performance.com/rd_delete_duplicates.asp
Edited by - Allen_Cui on 07/23/2003 3:04:40 PM
July 23, 2003 at 4:10 pm
Here is another article that might help:
http://www.databasejournal.com/features/mssql/article.php/2235081
Gregory Larsen, DBA
If you looking for SQL Server Examples check out my website at http://www.geocities.com/sqlserverexamples
Gregory A. Larsen, MVP
July 23, 2003 at 4:37 pm
Allen and Greg,
Thank you guys very much.
These scripts all work! Great!!
Robert
July 23, 2003 at 8:23 pm
How about something like this - would'nt this do the trick as well:
delete from table_name
where (Pk_value )
not in
( select min(pk_value)
from table_name group by PK_Value)
July 24, 2003 at 1:41 am
Unfortunately not.
The pk_value will be the same in the duplicate rows.
Because min(pk_value) will return the same value for each row in the duplicates, you'll end up deleting all the rows.
example:
Field1, Field2 pk_value
0 0 1
0 0 1
min(pk_value) will return 1.
This query will delete the whole table.
July 24, 2003 at 2:06 am
You can try to set rowcount to the 'number of row to be deleted -1' like this:
set rowcount # of duplicates - 1
delete from table where key_column=key_value
set rowcount 0
(0 means unlimited)
The delete statement will affect only # of duplicates - 1 rows
Bye,
Pepito
July 24, 2003 at 7:53 am
If you have a smallish number of rows (in total) in the table then just select distinct all columns into a new table, truncate the old table and then insert them all back from the new table to the old table.
If you have a large table then select distinct only the rows that are dup (group by having count(*) > 1 clause) into a new table. Delete from the old table using the "from from" format (join table you are deleting to the new table) where all columns match. (sorry, typed to fast here and forgot to put the distinct rows back) and remember to insert the rows from the new table back into the old table.
Either method will give the result of only having 1 occurance of a row regardless of how many duplicates there were initially for a row.
Edited by - songmeister on 07/24/2003 08:52:44 AM
July 24, 2003 at 8:32 am
Here's one that we recently did in the real world:
CREATE PROCEDURE spGMSRMRemovePODetailDupes
AS
select count(*) as NumberOfDupes,
max(grsinterfacePODetailsID) as MaxIDNumber,
RequestNo as ReqNo,
StatusCode as StatCode,
StatusDate as StatDate
into #TempFindDupes
from GRSInterfacePODetails
group by RequestNo, StatusCode, StatusDate
having count(*) > 1
/* For debugging */
/*
select * from #TempFindDupes
select * from GRSInterfacePODetails
inner join #TempFindDupes
on RequestNo = ReqNo
and StatusCode = StatCode
and StatusDate = StatDate
where GRSInterfacePODetailsID not in (select MaxIDNumber from #TempFindDupes)
*/
delete from GRSInterfacePODetails
from GRSInterfacePODetails inner join #TempFindDupes
on RequestNo = ReqNo
and StatusCode = StatCode
and StatusDate = StatDate
where GRSInterfacePODetailsID not in (select MaxIDNumber from #TempFindDupes)
drop table #TempFindDupes
Of course you'll have to substitute your own tables and columns!
Dave Orkis
July 24, 2003 at 10:05 am
songmeister's method has always worked for me. quick n' dirty, true, but does the job. don't forget to add constraints that won't allow these dupes going forward.
July 24, 2003 at 11:16 pm
OK, if this data is REALLY completely duplicated, then the articles will help you get rid of them. And like geomon said, make sure this doesn't happen in the future.
But, if these values aren't truly duplicate then the solution is much easier. Say you have duplicate IDs with a different "CreationDate" in Table1
Delete t1
From Table1 t1 (nolock)
JOIN Table1 t2 (nolock) on t1.ID = t2.ID and t2.CreationDate < t1.CreationDate
Join the table to itself on ID and only delete records where the CreationDate < max(CreationDate), essentially all your dups.
Pretty sweet!..Saw this piece of code on this very site and I've used it a few times already.
Signature is NULL
July 30, 2003 at 6:57 am
I've used songmeister's technique in the past and some of the articles are rather complexed. You may try this:
Create a PK for the table with an identity seed.
DELETE FROM tblTest
WHERE (intPK NOT IN
(SELECT MIN(intPK) AS Expr1
FROM tblTest
GROUP BY intFld1, intFld2, intFld3))
Then, delete the PK and set intFld1 to PK.
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply