June 30, 2009 at 9:34 am
I have a table that has 5 columns. Two of the columns, 'ObjectId' and 'DataKey' uniquely identify information, but are not used in a key or index of any kind. A third column is an identity column.
I want to be able to delete rows that have duplicate 'ObjectId' and 'DataKey' values, being sure to leave the row with the most recent (greatest) identity value. I can use this:
select distinct
ObjectId,
DataKey
from MyTable
to get only the non-duplicates, but how can I be sure to delete the duplicate rows leaving the one with the greatest identity value?
Thanks
June 30, 2009 at 9:53 am
delete
from MyTable
where exists (select * from MyTable t
where t.ObjectId=MyTable.ObjectId
and t.DataKey=MyTable.DataKey
and t.IdentCol>MyTable.IdentCol)
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537June 30, 2009 at 10:08 am
Thanks Mark,
I'm a newbie with SQL, so forgive the ignorance. I don't really understand why that query works. I ran just the 'select' portion to see if it returned the records that would be deleted - but it returned nothing. Can you explain what's happening in that query?
Thanks again.
June 30, 2009 at 10:16 am
If you run this
select *
from MyTable
where exists (select * from MyTable t
where t.ObjectId=MyTable.ObjectId
and t.DataKey=MyTable.DataKey
and t.IdentCol>MyTable.IdentCol)
it should show you the rows that are to be deleted. The query simply returns rows where there is at least one other (different) row of the same ObjectId and DataKey, but with a higher identity column value. If you're not getting anything, try posting some sample data.
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537June 30, 2009 at 10:22 am
OK, I think I've got it.
In my test to show the records that would be deleted, I ran only the select subquery:
select * from MyTable t
where t.ObjectId=MyTable.ObjectId
and t.DataKey=MyTable.DataKey
and t.IdentCol>MyTable.IdentCol
I didn't use the query:
select *
from MyTable
where exists (select * from MyTable t
where t.ObjectId=MyTable.ObjectId
and t.DataKey=MyTable.DataKey
and t.IdentCol>MyTable.IdentCol)
Thanks again.
June 30, 2009 at 2:18 pm
Little hint for future posts. Use "[code]" instead of "<code>" to post source code. Hml-tags will be miss-interpreted. 😉
July 1, 2009 at 3:12 am
DELETE MyTable FROM (SELECT IdentityColumn,DENSE_RANK() OVER(PARTITION BY ObjectId,DataKey ORDER BY NEWID()) 'Rank' FROM MyTable) MyTable1
WHERE MyTable.IdentityColumn =MyTable1.IdentityColumn
AND MyTable1.Rank > 1
July 1, 2009 at 6:58 am
Thanks Pandian,
I tried that query...it appears that it doesn't leave the record with the greatest identity value and remove the rest. I think the lowest identity value was the one kept and others were deleted.
July 1, 2009 at 7:05 am
I changed the query to
DELETE MyTable FROM
(SELECT IdentityColumn,DENSE_RANK() OVER(PARTITION BY ObjectId,DataKey ORDER BY IdentityColumn DESC) 'Rank'
FROM MyTable) MyTable1
WHERE MyTable.IdentityColumn = MyTable1.IdentityColumn
AND MyTable1.Rank > 1
and that appears to work.
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply