January 30, 2007 at 7:24 pm
This table has two fields: AccountId and MyDateTime. I'd like to delete any duplicate accounts and keep only the most recent row for each.
Before
11AA 01/01/2007
11AA 01/02/2007
11AA 01/03/2007
22BB 01/01/2007
22BB 01/02/2007
33CC 01/01/2007
33CC 01/02/2007
33CC 01/03/2007
33CC 01/04/2007
After
11AA 01/03/2007
22BB 01/02/2007
33CC 01/04/2007
Thanks!!
January 30, 2007 at 9:10 pm
--===== Create a test table to demo the problem with...
SELECT '11AA' AS AccountID, CAST('01/01/2007' AS DATETIME) AS MyDateTime
INTO #MyHead UNION ALL
SELECT '11AA','01/02/2007' UNION ALL
SELECT '11AA','01/03/2007' UNION ALL
SELECT '22BB','01/01/2007' UNION ALL
SELECT '22BB','01/02/2007' UNION ALL
SELECT '33CC','01/01/2007' UNION ALL
SELECT '33CC','01/02/2007' UNION ALL
SELECT '33CC','01/03/2007' UNION ALL
SELECT '33CC','01/04/2007'
--===== Show the "before" condition
SELECT * FROM #MyHead
--===== Delete all but the latest row for each AccountID
DELETE t1
FROM #MyHead t1,
#MyHead t2
WHERE t1.AccountID = t2.AccountID
AND t1.MyDateTime < t2.MyDateTime
--===== Show the "after" condition
SELECT * FROM #MyHead
--===== Housekeeping after demo
DROP TABLE #MyHead
--Jeff Moden
Change is inevitable... Change for the better is not.
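For anyone who wants to try the logic above without a SQL Server instance handy, here is a hedged sketch that re-creates the same "delete everything older than the newest row per account" delete in SQLite via Python. SQLite has no `DELETE t1 FROM ... JOIN` syntax, so the self-join predicate is expressed as a correlated EXISTS; the table and column names simply mirror the thread's example.

```python
import sqlite3

# In-memory SQLite stand-in for the #MyHead temp table from the post.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyHead (AccountID TEXT, MyDateTime TEXT)")
conn.executemany(
    "INSERT INTO MyHead VALUES (?, ?)",
    [("11AA", "2007-01-01"), ("11AA", "2007-01-02"), ("11AA", "2007-01-03"),
     ("22BB", "2007-01-01"), ("22BB", "2007-01-02"),
     ("33CC", "2007-01-01"), ("33CC", "2007-01-02"),
     ("33CC", "2007-01-03"), ("33CC", "2007-01-04")],
)

# Same predicate as the T-SQL self-join: delete any row for which a
# strictly newer row with the same AccountID exists.
conn.execute("""
    DELETE FROM MyHead
    WHERE EXISTS (SELECT 1
                    FROM MyHead newer
                   WHERE newer.AccountID = MyHead.AccountID
                     AND newer.MyDateTime > MyHead.MyDateTime)
""")

survivors = conn.execute(
    "SELECT AccountID, MyDateTime FROM MyHead ORDER BY AccountID").fetchall()
print(survivors)
# [('11AA', '2007-01-03'), ('22BB', '2007-01-02'), ('33CC', '2007-01-04')]
```

ISO-format date strings are used so that plain text comparison orders them correctly, which keeps the sketch self-contained.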
April 16, 2007 at 3:32 am
DELETE t1
FROM (
     SELECT ROW_NUMBER() OVER (PARTITION BY AccountNo ORDER BY MyDateTime DESC) AS RecID
     FROM Table1
     ) AS t1
WHERE RecID > 1
N 56°04'39.16"
E 12°55'05.25"
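A hedged re-creation of this ROW_NUMBER() approach, run in SQLite (3.25+ for window functions) via Python for anyone without a 2005 instance to test on: SQLite cannot delete through a derived table the way T-SQL can, so the row numbers are computed in a subquery and the victim rows are matched back by rowid instead.

```python
import sqlite3

# Small sample table; names mirror the post's Table1/AccountNo example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (AccountNo TEXT, MyDateTime TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?)",
                 [("11AA", "2007-01-01"), ("11AA", "2007-01-03"),
                  ("22BB", "2007-01-01"), ("22BB", "2007-01-02")])

# Number the rows newest-first within each account, then delete every
# row whose number is greater than 1.
conn.execute("""
    DELETE FROM Table1
    WHERE rowid IN (
        SELECT rowid
        FROM (SELECT rowid,
                     ROW_NUMBER() OVER (PARTITION BY AccountNo
                                        ORDER BY MyDateTime DESC) AS RecID
              FROM Table1)
        WHERE RecID > 1)
""")

remaining = conn.execute(
    "SELECT AccountNo, MyDateTime FROM Table1 ORDER BY AccountNo").fetchall()
print(remaining)  # [('11AA', '2007-01-03'), ('22BB', '2007-01-02')]
```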
April 16, 2007 at 3:58 am
Wow...2 excellent and elegant solutions, Peter and Jeff. (And not a max in sight!) Can't wait to see what else is proposed.
April 16, 2007 at 4:10 am
DELETE t1
FROM yourtable t1
INNER JOIN (SELECT AccountID, MAX(MyDateTime) AS MaxDateTime
            FROM yourtable
            GROUP BY AccountID) t2
    ON t1.AccountID = t2.AccountID
   AND t1.MyDateTime < t2.MaxDateTime
By the way, if you want to avoid lock escalation, you may want to put this DELETE in a WHILE loop and restrict the number of rows processed per pass using SET ROWCOUNT xyz (now obsolete for DML).
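The batching idea can be sketched as follows, again in SQLite via Python purely for illustration (in T-SQL the modern replacement for SET ROWCOUNT is DELETE TOP (n) in a loop). Each pass collects at most a batch-sized set of victim row ids via the MAX-join above, deletes them, and commits, so no single transaction touches many rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (AccountID TEXT, MyDateTime TEXT)")
# 5 rows for 11AA, 3 rows for 22BB; only the newest of each should survive.
conn.executemany("INSERT INTO yourtable VALUES (?, ?)",
                 [("11AA", "2007-01-0%d" % d) for d in range(1, 6)] +
                 [("22BB", "2007-01-0%d" % d) for d in range(1, 4)])

BATCH_SIZE = 2
deleted = 0
while True:
    # Victims: rows older than the newest row for their AccountID,
    # limited to one batch per pass.
    victims = conn.execute("""
        SELECT t1.rowid
        FROM yourtable t1
        JOIN (SELECT AccountID, MAX(MyDateTime) AS MaxDateTime
              FROM yourtable GROUP BY AccountID) t2
          ON t1.AccountID = t2.AccountID
         AND t1.MyDateTime < t2.MaxDateTime
        LIMIT ?""", (BATCH_SIZE,)).fetchall()
    if not victims:
        break
    conn.executemany("DELETE FROM yourtable WHERE rowid = ?", victims)
    conn.commit()  # short transactions keep the lock footprint small
    deleted += len(victims)

print(deleted)  # 6 of the 8 rows are older duplicates
```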
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
April 16, 2007 at 4:13 am
Well, Jeff has a CROSS JOIN, so there is a little performance penalty.
April 16, 2007 at 4:28 am
Is it really a CROSS JOIN? Doesn't the WHERE clause basically act like a join?
April 16, 2007 at 4:47 am
Yes, you get the same execution plan.
But still, you have to do a JOIN, which means you get an extra index seek/scan or table scan, depending on the indexes.
April 16, 2007 at 5:26 am
This is true....but I still like it!
When I'm next confronted with this type of situation, I'll have a hard time choosing between these two methods.
Previously, I'd have gone for a solution like the one proposed by ALZDBA, using MAX, but I've never been 100% comfortable with it; a niggling voice kept telling me there had to be a nicer way. And today I've found two! It's like my birthday.
April 16, 2007 at 5:35 am
Choosing will not be difficult at all... just compare the execution plans!
If they are equal: pick one.
If they are not equal, the choice is obvious.
Johan
April 16, 2007 at 6:21 am
Oh, but then you need to learn to read execution plans, right?
April 16, 2007 at 6:37 am
Isn't that one of the goals of this kind of forum?
Not only to be creative, but to be creative in a reasonable way.
To learn through play.
Johan
April 16, 2007 at 6:43 am
You've provoked me into replying, Peter
I don't see it in such simple terms (just compare execution plans)
I don't think all code has to be optimal.
The best code is not always the fastest code.
The best car is not always the fastest car.
I like blue cars where you can take the top down.
More seriously, of course, there are times when you need to find the fastest solution, but there are many, many situations where it really doesn't make any significant difference, and you can choose the solution that pleases you most. You can choose a solution, just because you like it.
There are also considerations (often forgotten) like maybe someone else has got to maintain your code - better to write code that can be understood, even if it means sacrificing a few milliseconds.
April 16, 2007 at 5:37 pm
Actually, it's not a cross join... I will agree that it creates many tiny triangular joins, but it creates no full cross joins. And, it deletes a quarter million dupes from a million row table in about 11 seconds... maybe not the fastest but I don't have SQL Server 2005... so I can't test Peter's method... however, ALZDBA is correct... we should test the performance of both methods some place where no harm will come... ipso facto... a test table in tempdb... please share the results of any testing you may do...
My test (below) was done on a 1.8 GHz single processor box using SQL Server 2000 Developer's Edition (sp4) with 1 GB Ram and IDE hard drives... Run times are posted in the code and in the results...
--===== Create and populate a million row test table (takes about 37 seconds)
-- Column "SomeDate" has a range of >= 01/01/2000 and < 01/01/2010
-- That's ten years worth of dates.
SELECT TOP 1000000
RowNum = IDENTITY(INT,1,1),
SomeID = CAST(RAND(CAST(NEWID() AS VARBINARY))*2000000+1 AS INT),
SomeValue = CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65))
+ CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65)),
SomeNumber = CAST(RAND(CAST(NEWID() AS VARBINARY))*100 AS MONEY),
SomeDate = CAST(RAND(CAST(NEWID() AS VARBINARY))*3653.0+36524.0 AS DATETIME)
INTO #BigTest
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
--===== Add a token primary key to simulate a real table (takes about 4 seconds)
ALTER TABLE #BigTest
ADD PRIMARY KEY CLUSTERED (RowNum)
--===== Start a "timer"
DECLARE @StartTime DATETIME
SET @StartTime = GETDATE()
--===== Delete all but the latest row for each SomeID according to SomeDate (takes about 11 seconds)
DELETE t1
FROM #BigTest t1,
#BigTest t2
WHERE t1.SomeID = t2.SomeID
AND t1.SomeDate < t2.SomeDate
--===== Display the duration of the DELETE
PRINT CONVERT(VARCHAR(30),GETDATE()-@StartTime,114)
DROP TABLE #BigTest
...and, contrary to the belief of many, the best code is also usually the fastest code... typically the fastest code uses the least number of resources for the shortest period of time. What do you think happens with long running code? It's NOT just sitting there... it's consuming some form of resource 100% of the time it's active and not being blocked.
--Jeff Moden
April 16, 2007 at 11:17 pm
I know your method is fast!
But there is one caveat.
If there are duplicate records over SomeID and SomeDate, there will be duplicates left in the table.
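This caveat is easy to demonstrate; here is a hedged sketch in SQLite via Python (window functions need SQLite 3.25+). With exact (SomeID, SomeDate) duplicates, the strict "<" self-join keeps both copies; adding a tiebreaker (here SQLite's rowid, but any unique key such as Jeff's RowNum column would do) to the ROW_NUMBER() ordering removes them as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BigTest (SomeID INT, SomeDate TEXT)")
# Two identical newest rows plus one genuinely older row.
conn.executemany("INSERT INTO BigTest VALUES (?, ?)",
                 [(1, "2007-01-02"), (1, "2007-01-02"), (1, "2007-01-01")])

# The "<" method: only rows with a strictly newer sibling are deleted,
# so the two identical 2007-01-02 rows both survive.
conn.execute("""
    DELETE FROM BigTest
    WHERE EXISTS (SELECT 1 FROM BigTest n
                  WHERE n.SomeID = BigTest.SomeID
                    AND n.SomeDate > BigTest.SomeDate)
""")
after_lt = conn.execute("SELECT COUNT(*) FROM BigTest").fetchone()[0]
print(after_lt)  # 2 -- both duplicates are still there

# A tiebreaker in the ORDER BY picks exactly one survivor per SomeID.
conn.execute("""
    DELETE FROM BigTest
    WHERE rowid IN (
        SELECT rowid
        FROM (SELECT rowid,
                     ROW_NUMBER() OVER (PARTITION BY SomeID
                                        ORDER BY SomeDate DESC,
                                                 rowid DESC) AS rn
              FROM BigTest)
        WHERE rn > 1)
""")
after_rn = conn.execute("SELECT COUNT(*) FROM BigTest").fetchone()[0]
print(after_rn)  # 1
```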