January 30, 2006 at 5:35 pm
Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/chawkins/dedupingdatainsqlserver2005.asp
February 7, 2006 at 3:37 am
When you are populating the record set, take out the GO between the next-to-the-last and the last insert statements. Having this penultimate GO in the set of queries will remove the scope of the @NOW variable and cause the last INSERT to fail.
February 7, 2006 at 4:03 am
nice example
Here is another :
WITH
cteEmployeeOrderedByMyRowNumber AS
(SELECT ROW_NUMBER() OVER (ORDER BY EMPID ASC, REFDATE ASC) AS MyRowNumber
, Row_NUMBER() Over (Partition By EMPID,FNAME,LNAME Order By REFDATE ASC) as PartitionRank
, *
FROM EMPLOYEE
-- WHERE 1 = 1
)
DELETE FROM cteEmployeeOrderedByMyRowNumber
where PartitionRank > 1 ;
just to get it in the tips of the fingers
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
February 7, 2006 at 8:11 am
Hi...
I usually clean up my "mess" with "something" like this
delete from myTable where myID NOT in
(select min(MyID) from myTable group by myUniqueField)
But I think the best application for CTE are recursive querries...
February 7, 2006 at 9:24 am
Is there a performance gain to using this function and technique? Or is it just one of those "hey cool function --- let's use it"?
aries
February 7, 2006 at 1:14 pm
great examples!
If only duplicates need to be removed the ROW_NUMBER() may not be needed.
WITH cteEmployeeOrderedByMyRank AS
(SELECT RANK() OVER (PARTITION BY EMPID,FNAME,LNAME ORDER BY REFDATE ASC) AS PartitionRank
, *
FROM EMPLOYEE
-- WHERE 1 = 1
)
DELETE FROM cteEmployeeOrderedByMyRank
WHERE PartitionRank > 1 ;
It surely seems to be much faster than the cursor based apporach.
bm21
February 7, 2006 at 4:30 pm
Interesting demonstration of the ROW_NUMBER() function. Please say it ain’t so, Joe! - that you are not using cursors to remove duplicate rows. Even the technique of a SELECT DISTINCT into a temporary table would be a better option. As other readers have commented, there are a number of ways to remove duplicate rows. This would be my approach:
DELETE Employee
FROM Employee a INNER JOIN (SELECT Empid,
FName,
LName,
MIN(RefDate) AS 'MinDate'
FROM Employee
GROUP BY Empid, FName, LName) b
ON a.Empid = b.Empid
AND a.FName = b.FName
AND a.LName = b.LName
AND a.RefDate > b.MinDate
This would still leave the issue of James verses Jim that would need to be resolved separately. If you didn’t care about spelling variations and wanted to assume that the first entry was the correct one then this would work:
DELETE Employee
FROM Employee a INNER JOIN (SELECT Empid,
MIN(RefDate) AS 'MinDate'
FROM Employee
GROUP BY Empid) b
ON a.Empid = b.Empid
AND a.RefDate > b.MinDate
I would be interested in the question of performance between the two techniques but I’d put my money on mine which I suspect has a whole lot less overhead even as a cross join than having the engine generate a row position.
February 7, 2007 at 4:45 am
Another option (that works with SQL 2000) and even in cases of all the columns having the same value (no column to differentiate the rows), is inserting the result set into a new table with an identity column (or adding an identity column to the original table). After that, it's just a matter of keeping the distinct rows as shown in the comments.
I can't remember where I read this solution, but it was either here in SQL Server Central or SQL Team foruns.
André Cardoso
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply