Delete duplicate data

  • I have about 112K records in a table that are duplicates and the table has no identity column. This table has 39 million records. I have to delete all the duplicate records. What's the best way to accomplish this? This is what I have done so far

    I ran this command to find dup records and I saw 112,527 records.

    Select date, TradeID, Count(TradeID)
    from Centric
    Where date > '2023-06-24'
    Group by date, TradeID
    Having count(TradeID) > 1
    Order by date desc

    After that, I dumped all the dup data into a temp table but I see only 111922 records were inserted. I thought I'd have 112,527 records.

    Select *
    Into Centric_Temp
    from Centric
    Where date > '2023-06-24'
    Group by date, TradeID, and all 40 columns
    Having count(TradeID) > 1
    Order by date desc

    Now the plan is to delete dup records from the main table

    Insert into centric (all columns) select distinct TradeID, all columns from Centric_Temp.

    Is that the correct action? I just want to make sure I get this done right.

    Any advise is highly appreciated.

    "He who learns for the sake of haughtiness, dies ignorant. He who learns only to talk, rather than to act, dies a hyprocite. He who learns for the mere sake of debating, dies irreligious. He who learns only to accumulate wealth, dies an atheist. And he who learns for the sake of action, dies a mystic."[/i]

  • BACKUP DATABASE TO DISK = 'file path' WITH COPY_ONLY,CHECKSUM;

    WITH CTE AS (
    SELECT
    [tradeId]
    ,[date]
    ,other columns that uniqueify the record AS[otherColumns]
    ,ROW_NUMBER() OVER (
    PARTITION BY [tradeId],[date],[otherColumns] ORDER BY [tradeId])
    AS [row]

    FROM
    [dbo].[Centric]
    )

    DELETE FROM
    [CTE]
    WHERE [row] > 1;

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply