May 19, 2009 at 5:06 am
There must be a set based answer to this ugly situation but my brain isn't finding it and time is running out. So I'm appealing for help.
There are some duplicates names in a table (without a primary key and 600,000 rows) which are distinguished by date. I need to delete all but the newest dated record for each person. Ie the person can become the primary key.
I can easily get a list of those names which are duplicated into a temporary table, but it's cycling through these names to get the MAX date for each one that's causing me to run to the WHILE loop. Please help me avoid the RBAR solution if possible. If it's not then do I use a
CREATE TABLE #Table1
(
NameOfSomeone nvarchar(100),
DateOfSomeone datetime
)
INSERT INTO #Table1 (NameOfSomeone, DateOfSomeone) VALUES ('Paul', '2009-01-01')
INSERT INTO #Table1 (NameOfSomeone, DateOfSomeone) VALUES ('Paul', '2009-01-02')
INSERT INTO #Table1 (NameOfSomeone, DateOfSomeone) VALUES ('Peter', '2009-01-03')
INSERT INTO #Table1 (NameOfSomeone, DateOfSomeone) VALUES ('Peter', '2009-01-04')
INSERT INTO #Table1 (NameOfSomeone, DateOfSomeone) VALUES ('Peter', '2009-01-07')
INSERT INTO #Table1 (NameOfSomeone, DateOfSomeone) VALUES ('James', '2009-01-05')
INSERT INTO #Table1 (NameOfSomeone, DateOfSomeone) VALUES ('James', '2009-01-06')
INSERT INTO #Table1 (NameOfSomeone, DateOfSomeone) VALUES ('Amy', '2009-01-01')
INSERT INTO #Table1 (NameOfSomeone, DateOfSomeone) VALUES ('Grace', '2009-01-01')
INSERT INTO #Table1 (NameOfSomeone, DateOfSomeone) VALUES ('Jane', '2009-01-01')
May 19, 2009 at 5:27 am
There are many methods by which you can delete duplicate rows from one table. Here is one of the ways you can do it.
It can also be done using temporary table with IDENTITY column and inserting the data into this table in
sorted order as explained here
IF ( OBJECT_ID( 'tempdb..#Table1' ) IS NOT NULL )
DROP TABLE #Table1
CREATE TABLE #Table1
(
NameOfSomeone nvarchar(100),
DateOfSomeone datetime
)
INSERT INTO #Table1 (NameOfSomeone, DateOfSomeone) VALUES ('Paul', '2009-01-01')
INSERT INTO #Table1 (NameOfSomeone, DateOfSomeone) VALUES ('Paul', '2009-01-02')
INSERT INTO #Table1 (NameOfSomeone, DateOfSomeone) VALUES ('Peter', '2009-01-03')
INSERT INTO #Table1 (NameOfSomeone, DateOfSomeone) VALUES ('Peter', '2009-01-04')
INSERT INTO #Table1 (NameOfSomeone, DateOfSomeone) VALUES ('Peter', '2009-01-07')
INSERT INTO #Table1 (NameOfSomeone, DateOfSomeone) VALUES ('James', '2009-01-05')
INSERT INTO #Table1 (NameOfSomeone, DateOfSomeone) VALUES ('James', '2009-01-06')
INSERT INTO #Table1 (NameOfSomeone, DateOfSomeone) VALUES ('Amy', '2009-01-01')
INSERT INTO #Table1 (NameOfSomeone, DateOfSomeone) VALUES ('Grace', '2009-01-01')
INSERT INTO #Table1 (NameOfSomeone, DateOfSomeone) VALUES ('Jane', '2009-01-01')
DELETET1
FROM#Table1 T1
INNER JOIN
(
SELECTNameOfSomeone, MAX( DateOfSomeone ) AS DateOfSomeone
FROM#Table1
GROUP BY NameOfSomeone
) T2 ON T1.NameOfSomeone = T2.NameOfSomeoneAND T1.DateOfSomeone != T2.DateOfSomeone
SELECT * FROM #Table1
--Ramesh
May 19, 2009 at 6:12 am
---- Edit ----------------------------------------------------------
Sorry ignore this: I've just noticed its the SQL2000 group, and ROW_NUMBER() isn't available. But I've left it here for information only.
-------------------------------------------------------------------
Another de-duping method, using the ROW_NUMBER PARTITION BY feature.
WITH DUPS AS (
SELECT NameOfSomeone, DateOfSomeone, ROW_NUMBER() OVER (PARTITION BY NameOfSomeone ORDER BY DateOfSomeone DESC) AS RN
FROM #Table1
)
DELETE FROM #Table1
FROM #Table1 JOIN DUPS
ON #Table1.NameOfSomeone = DUPS.NameOfSomeone AND
#Table1.DateOfSomeone = DUPS.DateOfSomeone
WHERE DUPS.RN > 1
This creates A CTE with each name in a separate group (partition) - and the dates are numbered in descending sequence within each group, so you can delete all those with a row number greather than 1.
As with any potentially dangerous changes, try it out in a transaction with rollback first.
May 19, 2009 at 6:16 am
Tom Brown (5/19/2009)
Another de-duping method, using the ROW_NUMBER PARTITION BY feature.
It does work in 2K5 but we are in 2K forum. It happened to me many times in the past:-D
--Ramesh
May 19, 2009 at 6:42 am
A huge thankyou to all who replied. 😀 I was so busy looking at memory tables and then joining them with the original I lost sight of the root cause - cr*p data.
I've run the script by SSCrazy and it worked a treat! :w00t: The 3rd party's data is now in a fit state to be imported into live, the boss is happy and I can get to work on the next stages of integration.
Now where's the emoticon for a beer...
May 19, 2009 at 6:52 am
FNS (5/19/2009)
Now where's the emoticon for a beer...
Steve ... ?
We could do with a few more emoticons
these ones don't display properly :beer: :rocks: :thumbsup:
May 19, 2009 at 6:54 am
FNS (5/19/2009)
A huge thankyou to all who replied. 😀 I was so busy looking at memory tables and then joining them with the original I lost sight of the root cause - cr*p data.I've run the script by SSCrazy and it worked a treat! :w00t: The 3rd party's data is now in a fit state to be imported into live, the boss is happy and I can get to work on the next stages of integration.
Now where's the emoticon for a beer...
You are welcome, and I am glad that it worked out well for you. "BeerIcon", I wandered around for years but still couldn't find one:w00t::w00t:
BTW, I am not SSCrazy:hehe:
--Ramesh
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply