Remove Duplicates in Result as the Query Executes

  • Hello Everyone

    Hope that everyone had a great weekend.

    I am working on a very challenging project, that is pumping data from a very old DB2 database into SQL Server. There are many pitfalls with to work around.

    I am pumping data into a table variable from a large table, but I have noticed that before I can pump that data into a warehouse, I need to remove some rouge duplicate values, so I have set one more table variable up to be certain there are no duplicates before the data pump into the warehouse table. As you can see in the code below, there are a few duplicates appearing. I need to remove one of the two before pumping the data into the warehouse table. I have clearly commented the duplicate values. It does not matter what one is deleted, but I need to cleanse this data before the last data pump into the table.

    IF OBJECT_ID('TempDB..#MyTempTable','U') IS NOT NULL

    DROP TABLE #MyTempTable

    create table #MyTempTable

    (

    SetID int

    , SubSetID int

    , QuarterID varchar(5)

    )

    INSERT INTO #MyTempTable

    (

    SetID

    , SubSetID

    , QuarterID

    )

    SELECT 15,1,'Q4'

    UNION ALL SELECT 15,2,'Q4'

    UNION ALL SELECT 15,3,'Q5'

    UNION ALL SELECT 16,1,'Q4'

    UNION ALL SELECT 16,1,'Q4'-- Remove This One

    UNION ALL SELECT 16,3,'Q5'

    UNION ALL SELECT 17,1,'Q4'

    UNION ALL SELECT 17,2,'Q4'

    UNION ALL SELECT 17,2,'Q4'-- Remove This One

    UNION ALL SELECT 17,3,'Q4'

    UNION ALL SELECT 18,1,'Q4'

    UNION ALL SELECT 18,2,'Q5'

    UNION ALL SELECT 18,3,'Q4'

    UNION ALL SELECT 18,4,'Q5'

    UNION ALL SELECT 19,1,'Q4'

    UNION ALL SELECT 19,2,'Q4'

    UNION ALL SELECT 19,3,'Q4'

    UNION ALL SELECT 19,4,'Q5'

    UNION ALL SELECT 19,5,'Q5'

    UNION ALL SELECT 20,1,'Q5'

    UNION ALL SELECT 20,1,'Q5'-- Remove This One

    UNION ALL SELECT 20,2,'Q5'

    select * from #MyTempTable

    Thank You in advance for any and all assistance or comments

    Andrew SQLDBA

  • Run this on your temp table:

    with duplicates as (select duplicatecolumn1,duplicatecolumn2, row_number() over (partition by duplicatecolumn1, duplicatecolumn2 order by duplicatecolumn1, duplicatecolumn2) as rowno

    from tablename

    ) delete from duplicates where rowno > 1;

    Thanks,

    Jared

    Jared
    CE - Microsoft

  • Anything wrong with SELECT DISTINCT ...?

  • Ninja's_RGR'us (11/14/2011)


    Anything wrong with SELECT DISTINCT ...?

    Sometimes the simplest solutions evade me... 🙂

    Jared

    Jared
    CE - Microsoft

  • There are other columns involved, and the Distinct has already been used to query the data from the first table into the table variable. Unfortunately, none of the other columns are a date data type, or I would have used that to select the newest row.

    Sorry for not mentioning this in my first post, I did not even think to mention it.

    Andrew SQLDBA

  • Ninja's_RGR'us (11/14/2011)


    Anything wrong with SELECT DISTINCT ...?

    Been there, done that!

  • AndrewSQLDBA (11/14/2011)


    There are other columns involved, and the Distinct has already been used to query the data from the first table into the table variable. Unfortunately, none of the other columns are a date data type, or I would have used that to select the newest row.

    Sorry for not mentioning this in my first post, I did not even think to mention it.

    Andrew SQLDBA

    Then ROW_NUMBER is your best option.

  • Ok, I have added a ROW_Number AS refnumber to that table variable, how would I code this to use the MAX(refnumber) as the row to delete if there is a duplicate SETID, SubSetID and QuarterID?

    Thanks

    Andrew SQLDBA

  • Use rownumber in a derived table then filter on WHERE RN = 1

  • I am not getting it, would you mind giving an example please?

    It has been a rough monday already for me, and I am a little slow.

    Andrew SQLDBA

  • Using CTE instead of derived table.

    ;WITH CTE

    AS (

    SELECT

    object_id

    , ROW_NUMBER() OVER ( PARTITION BY object_id ORDER BY name ) AS RN

    FROM

    sys.all_columns

    )

    SELECT

    *

    FROM

    CTE

    WHERE

    RN = 1

  • Thanks Everyone, especially Ninja

    I have a couple big Ninja motorcycles, they are just wicked fast.

    Andrew SQLDBA

  • AndrewSQLDBA (11/14/2011)


    Thanks Everyone, especially Ninja

    I have a couple big Ninja motorcycles, they are just wicked fast.

    Andrew SQLDBA

    You're welcomed! 😉

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply