October 8, 2005 at 8:04 pm
hi
i have two rows with redundant data and i would like to delete only top record .Please let me know the best approach to do it .
thanks
sa
October 9, 2005 at 12:33 am
The only way you can do it is if your table has a unique primary key for each duplicate.
delete mytable
Where PK in (select min(a.pk)
From mytable a
join mytable b on a.dupcol = b.dupcol
where a.pk <> b.pk)
or if you do not have a unique primay key you need to
select distinct into temp table
truncate table
insert rows from temp table
October 10, 2005 at 2:39 am
There are few good article on this subject in this site. search for them. You will find them usefull.
My Blog:
October 10, 2005 at 6:37 am
Try DELETE .... WHERE CURRENT OF ...cursor. See Positioned Delete in BOL. This will allow you to delete a single row from a set of tue duplicates.
If you can identify the row to delete with a WHERE clause, it would probably be better to use DELETE ... WHERE than to use a cursor. However, for true duplicates you need a positioned delete.
Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.
When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara
October 10, 2005 at 6:54 am
Actualy, if you do a search for "DELETE DUPLICATES", you come up with just a couple of articles which use cursors and group bys and adding columns and using MAX/GROUP BY... and ... and... and... none of them appear to be straight forward.
So here's a straight forward, scalable approach (as a generic boilerplate). On my humble home computer, it will find and delete 400 duplicate rows in a million record table in less than a minute.
First, the disclaimer:
THIS SCRIPT DELETES DATA! DO NOT TEST ON PRODUCTION DATA! DO NOT USE ON PRODUCTION DATA UNLESS YOU HAVE A VIABLE AND RESTORABLE BACKUP FOR YOUR DATA. I AM NOT AND CANNOT BE MADE TO BE RESPONSIBLE FOR YOUR DATA IN ANY WAY, SHAPE, OR FORM!
DELETE
FROM tablename
WHERE tablename.ID IN
--The following SELECT statement acts as a derived table. The DISTINCT term is
--required because if there are 3 or more dupes for any given fields, the return
--will grow, exponentially.
(
SELECT DISTINCT t1.ID
FROM tablename AS t1
INNER JOIN tablename AS t2
-- These "ON" conditions contain the comparisons of the fields to be checked for dupes
ON t1.dupchkfield1 = t2.dupchkfield1
AND t1.dupchkfield2 = t2.dupchkfield2
--AND t1.dupchkfield3 = t2.dupchkfield3 --Add additional fields like this
--AND t1.dupchkfield4 = t2.dupchkfield4 --Add additional fields like this
-- This "ON" condition contains the comparison of the field to be checked for age.
-- "Age" can be based on date/datetime/time or auto-incrementing ID.
-- In all cases, the earlier/lower the value, the older the record is compared to others.
-- The comparison operator (">" in this example) determines which duplicate will be kept.
-- The functionality of this example can be changed by changing the comparison operator to...
-- ">" KEEPS only the OLDEST or EARLIEST record if duplicates are detected (OLD IS GREATer)
-- "<" KEEPS only the YOUNGEST or LATEST record if duplicates are detected (YOUNG is LESS-THAN OLD)
-- "<>" REMOVES ALL DUPLICATED RECORDS INCLUDING YOUNGEST AND OLDEST (be careful)
AND t1.chronologyfield > t2.chronologyfield
_________________________________________________________________________________________________________________
--Jeff Moden
Change is inevitable... Change for the better is not.
October 10, 2005 at 9:19 am
Another way, if you don't have a primary key and don't want to use a cursor or group by:
SET ROWCOUNT 1
DELETE My Table WHERE field1 = 'DUPE'
SET ROWCOUNT 0
Dylan Peters
SQL Server DBA
October 10, 2005 at 10:13 am
--I have found that the following script works well
/* If your table is called "TheTable" and three fields "F1", "F2" and "F3" */
/*This needs to be done within a transaction. Just in case something goes wrong*/
BEGIN TRAN
/* Save Distinct Versions of the Duplicate Rows */
SELECT DISTINCT F1, F2, F3
INTO #TheTable
FROM TheTable T1
WHERE EXISTS (SELECT F1, F2, F3, Count(*)
FROM TheTable T2
WHERE T2.F1 = T1.F1
AND T2.F2 = T1.F2
AND T2.F3 = T1.F3
GROUP BY F1, F2, F3
HAVING Count(*) > 1)
/* DELETE the Duplicate Rows */
DELETE T1
FROM TheTable T1
WHERE EXISTS (SELECT F1, F2, F3, Count(*)
FROM TheTable T2
WHERE T2.F1 = T1.F1
AND T2.F2 = T1.F2
AND T2.F3 = T1.F3
GROUP BY F1, F2, F3
HAVING Count(*) > 1)
/* Restore one the saved Rows */
INSERT INTO TheTable (F1, F2, F3)
SELECT F1, F2, F3
FROM #TheTable
DROP TABLE #TheTable
COMMIT TRAN
--Hope this helps
--Robert
October 11, 2005 at 1:40 am
See what I mean?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 11, 2005 at 3:03 am
Hi Jeff,
I tried your example on table which had three fields "F1", "F2" and "F3" with no primary or unique key. I could not get it to work. Whereas I did with mine. I must be doing something wrong. Could you please enpart some more of your wisdom and suggest where I might be using your simpler approach incorrectly.
Cheers
Robert
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply