February 13, 2012 at 9:41 am
I have a table that has accidently had 2 sets of the same data uploaded onto it. Unfortunately, there is no timestamp that can I can set a WHERE clause on to delete the extra set. What's the most efficient way to get rid of the extra data?
February 13, 2012 at 9:56 am
Some people use cursors, but I prefer the ROW_NUMBER solution:
;
WITH RowNumbers AS
(
SELECT columnA, columnB, columnC,
ROW_NUMBER() OVER (PARTITION BY columnA, columnB, columnC ORDER BY columnA, columnB, columnC) AS rn
FROM tableName
)
DELETE FROM RowNumbers WHERE rn > 1
Jared
CE - Microsoft
February 13, 2012 at 12:05 pm
Works great. Is there a good article about how the row_number() function works? I did a google search, but the pages that are listed don't really explain how the function does its job
February 13, 2012 at 12:14 pm
Did you see this one?http://msdn.microsoft.com/en-us/library/ms186734.aspx I think this is pretty good.
Jared
CE - Microsoft
February 13, 2012 at 7:32 pm
Now that the deletion of duplicates has been resolved, you might want to take a look at the structure of your table and figure out what would make a good primary key so that this doesn't happen in the future. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
February 13, 2012 at 8:09 pm
guerillaunit (2/13/2012)
What's the most efficient way to get rid of the extra data?
You can make the process more efficient still by using minimally-logged operations rather than the fully-logged DELETE statement:
CREATE TABLE #Test (columnA int NULL, columnB int NULL, columnC int NULL);
-- 100,000 rows
INSERT #Test
SELECT TOP (100000)
CHECKSUM(NEWID()),
CHECKSUM(NEWID()),
CHECKSUM(NEWID())
FROM
master.sys.columns AS c,
master.sys.columns AS c2,
master.sys.columns AS c3;
-- Duplicate every row
INSERT #Test
SELECT * FROM #Test AS t
-- Minimally-logged SELECT...INTO
-- Uses the internal ANY aggregate
SELECT
Numbered.columnA,
Numbered.columnB,
Numbered.columnC
INTO #DeDuplicated
FROM
(
SELECT *, rn =
ROW_NUMBER() OVER (
PARTITION BY columnA, columnB, columnC
ORDER BY columnA, columnB, columnC)
FROM #Test AS t
) AS Numbered
WHERE
Numbered.rn = 1;
-- Minimally-logged TRUNCATE TABLE
TRUNCATE TABLE #Test;
-- Metadata-only SWITCH
ALTER TABLE #DeDuplicated SWITCH TO #Test;
-- Finished with this
DROP TABLE #DeDuplicated
-- Show duplicates have been removed
CREATE UNIQUE CLUSTERED INDEX cuq ON #Test (columnA, columnB, columnC);
SELECT * FROM #Test AS t;
-- Tidy up
DROP TABLE #Test;
February 13, 2012 at 8:35 pm
Nicely done, Paul. Will that preserve keys, constraints, and indexes? And, yes, I realize the TRUNCATE won't work if FK's are present.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 13, 2012 at 9:41 pm
Jeff Moden (2/13/2012)
Nicely done, Paul. Will that preserve keys, constraints, and indexes? And, yes, I realize the TRUNCATE won't work if FK's are present.
One of the conditions for SWITCH is that things like keys, constraints and indexes must match, so in that sense, yes they are preserved. Given that the source table apparently has no primary key, I thought it reasonable to assume not much in the way of constraints or indexes existed either. Anyway, after such a large DELETE, one would probably want to rebuild all indexes anyway, and this could also be minimally-logged.
February 13, 2012 at 10:05 pm
SQL Kiwi (2/13/2012)
Jeff Moden (2/13/2012)
Nicely done, Paul. Will that preserve keys, constraints, and indexes? And, yes, I realize the TRUNCATE won't work if FK's are present.One of the conditions for SWITCH is that things like keys, constraints and indexes must match, so in that sense, yes they are preserved. Given that the source table apparently has no primary key, I thought it reasonable to assume not much in the way of constraints or indexes existed either. Anyway, after such a large DELETE, one would probably want to rebuild all indexes anyway, and this could also be minimally-logged.
No problems on this so far as the OP goes. I know there isn't even a PK. I was more interested in something I might take on at work. And, dammit, I did read that the tables had to have the same "gizmos" but plumb forgot. Thanks for the reminder.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply