I have about 112K records in a table that are duplicates and the table has no identity column. This table has 39 million records. I have to delete all the duplicate records. What's the best way to accomplish this? This is what I have done so far
I ran this command to find dup records and I saw 112,527 records.
Select date, TradeID, Count(TradeID)
from Centric
Where date > '2023-06-24'
Group by date, TradeID
Having count(TradeID) > 1
Order by date desc
After that, I dumped all the dup data into a temp table but I see only 111922 records were inserted. I thought I'd have 112,527 records.
Select *
Into Centric_Temp
from Centric
Where date > '2023-06-24'
Group by date, TradeID, and all 40 columns
Having count(TradeID) > 1
Order by date desc
Now the plan is to delete dup records from the main table
Insert into centric (all columns) select distinct TradeID, all columns from Centric_Temp.
Is that the correct action? I just want to make sure I get this done right.
Any advise is highly appreciated.
"He who learns for the sake of haughtiness, dies ignorant. He who learns only to talk, rather than to act, dies a hyprocite. He who learns for the mere sake of debating, dies irreligious. He who learns only to accumulate wealth, dies an atheist. And he who learns for the sake of action, dies a mystic."[/i]
BACKUP DATABASE TO DISK = 'file path' WITH COPY_ONLY,CHECKSUM;
WITH CTE AS (
SELECT
[tradeId]
,[date]
,other columns that uniqueify the record AS[otherColumns]
,ROW_NUMBER() OVER (
PARTITION BY [tradeId],[date],[otherColumns] ORDER BY [tradeId])
AS [row]
FROM
[dbo].[Centric]
)
DELETE FROM
[CTE]
WHERE [row] > 1;
Viewing 2 posts - 1 through 1 (of 1 total)
You must be logged in to reply to this topic. Login to reply