query to delete 'almost' duplicate rows

Question

Post reply

query to delete 'almost' duplicate rows

khalprin

Mr or Mrs. 500

Points: 576
More actions
June 30, 2009 at 9:34 am

#133894

I have a table that has 5 columns. Two of the columns, 'ObjectId' and 'DataKey' uniquely identify information, but are not used in a key or index of any kind. A third column is an identity column.
I want to be able to delete rows that have duplicate 'ObjectId' and 'DataKey' values, being sure to leave the row with the most recent (greatest) identity value. I can use this:
select distinct
ObjectId,
DataKey
from MyTable
to get only the non-duplicates, but how can I be sure to delete the duplicate rows leaving the one with the greatest identity value?
Thanks

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Mark Cowne One Orange Chip Points: 26952 More actions · Answer 1

delete

from MyTable

where exists (select * from MyTable t

where t.ObjectId=MyTable.ObjectId

and t.DataKey=MyTable.DataKey

and t.IdentCol>MyTable.IdentCol)

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

khalprin Mr or Mrs. 500 Points: 576 More actions · Answer 2

Thanks Mark,

I'm a newbie with SQL, so forgive the ignorance. I don't really understand why that query works. I ran just the 'select' portion to see if it returned the records that would be deleted - but it returned nothing. Can you explain what's happening in that query?

Thanks again.

Mark Cowne One Orange Chip Points: 26952 More actions · Answer 3

If you run this

select *

from MyTable

where exists (select * from MyTable t

where t.ObjectId=MyTable.ObjectId

and t.DataKey=MyTable.DataKey

and t.IdentCol>MyTable.IdentCol)

it should show you the rows that are to be deleted. The query simply returns rows where there is at least one other (different) row of the same ObjectId and DataKey, but with a higher identity column value. If you're not getting anything, try posting some sample data.

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

khalprin Mr or Mrs. 500 Points: 576 More actions · Answer 4

OK, I think I've got it.

In my test to show the records that would be deleted, I ran only the select subquery:

select * from MyTable t

where t.ObjectId=MyTable.ObjectId

and t.DataKey=MyTable.DataKey

and t.IdentCol>MyTable.IdentCol

I didn't use the query:

select *

from MyTable

where exists (select * from MyTable t

where t.ObjectId=MyTable.ObjectId

and t.DataKey=MyTable.DataKey

and t.IdentCol>MyTable.IdentCol)

Thanks again.

Florian Reischl SSC-Dedicated Points: 37301 More actions · Answer 5

Little hint for future posts. Use "[code]" instead of "<code>" to post source code. Hml-tags will be miss-interpreted. 😉

Pandian S SSChasing Mays Points: 625 More actions · Answer 6

DELETE MyTable FROM (SELECT IdentityColumn,DENSE_RANK() OVER(PARTITION BY ObjectId,DataKey ORDER BY NEWID()) 'Rank' FROM MyTable) MyTable1

WHERE MyTable.IdentityColumn =MyTable1.IdentityColumn

AND MyTable1.Rank > 1

khalprin Mr or Mrs. 500 Points: 576 More actions · Answer 7

Thanks Pandian,

I tried that query...it appears that it doesn't leave the record with the greatest identity value and remove the rest. I think the lowest identity value was the one kept and others were deleted.

khalprin Mr or Mrs. 500 Points: 576 More actions · Answer 8

I changed the query to

DELETE MyTable FROM

(SELECT IdentityColumn,DENSE_RANK() OVER(PARTITION BY ObjectId,DataKey ORDER BY IdentityColumn DESC) 'Rank'

FROM MyTable) MyTable1

WHERE MyTable.IdentityColumn = MyTable1.IdentityColumn

AND MyTable1.Rank > 1

and that appears to work.