Removing duplicate records from a table

  • I want to Remove the Duplicate Records from a table without using any temporary table. Table structure is

    CREATE TABLE TABLEDUPLICATE (UNIQUEID INT,USERID INT ,INTRESTED BIT,EVENTID int)

    INSERT INTO TABLEDUPLICATE VALUES (1,101,0,1)

    INSERT INTO TABLEDUPLICATE VALUES (2,101,1,1)

    INSERT INTO TABLEDUPLICATE VALUES (3,101,1,1)

    INSERT INTO TABLEDUPLICATE VALUES (4,101,0,1)

    INSERT INTO TABLEDUPLICATE VALUES (5,102,0,1)

    INSERT INTO TABLEDUPLICATE VALUES (6,102,1,1)

    INSERT INTO TABLEDUPLICATE VALUES (7,102,1,1)

    INSERT INTO TABLEDUPLICATE VALUES (8,102,0,1)

    INSERT INTO TABLEDUPLICATE VALUES (1,101,0,1)

    INSERT INTO TABLEDUPLICATE VALUES (2,101,1,1)

    INSERT INTO TABLEDUPLICATE VALUES (3,101,1,1)

    INSERT INTO TABLEDUPLICATE VALUES (4,101,0,1)

    INSERT INTO TABLEDUPLICATE VALUES (5,102,0,1)

    INSERT INTO TABLEDUPLICATE VALUES (6,102,1,1)

    INSERT INTO TABLEDUPLICATE VALUES (7,102,1,1)

    INSERT INTO TABLEDUPLICATE VALUES (8,102,0,1)

    INSERT INTO TABLEDUPLICATE VALUES (9,101,0,2)

    INSERT INTO TABLEDUPLICATE VALUES (10,101,1,2)

    INSERT INTO TABLEDUPLICATE VALUES (11,101,1,2)

    INSERT INTO TABLEDUPLICATE VALUES (12,101,0,2)

    INSERT INTO TABLEDUPLICATE VALUES (13,102,0,2)

    INSERT INTO TABLEDUPLICATE VALUES (14,102,1,2)

    INSERT INTO TABLEDUPLICATE VALUES (15,102,1,2)

    INSERT INTO TABLEDUPLICATE VALUES (16,102,0,2)

    THIS WILL INSERT DUPLICATE VALUES like for a particular event there will be multiple entries for a single users

    I want Single Entry for each user in each event and want to retain that particular entry which is having intrested=1

    So data should be like this in table after deletion of duplicate records:

    2,101,1,1

    6,102,1,1

    10,101,1,2

    14,102,1,2

    Please help in writing the query as there are about 10000 records in table and i want to remove the duplicates.

  • First, with the sample data you've provided, your UniqueID field isn't unique. So, I'm assuming we first remove those first 8 inserts.

    Then, I'd use the Row_Number() function to assign priority to your list based on matches on userid and eventid, putting the intrested ones first, followed by the uniqueid as a tiebreaker. So, I'd use the following:

    select uniqueid,

    ROW_NUMBER()

    over (partition by userid, eventid order by intrested desc, uniqueid) as rn

    from tableduplicate

    Now, you only want those with RN (rownumber) = 1, so I'd put the above in a CTE and select like this:

    ;with IDList as

    (select uniqueid,

    ROW_NUMBER()

    over (partition by userid, eventid order by intrested desc, uniqueid) as rn

    from tableduplicate)

    select * from Tableduplicate TD JOIN IDList IL on TD.UniqueID = IL.uniqueid

    where rn=1

    And delete the duplicates like this:

    ;with IDList as

    (select uniqueid,

    ROW_NUMBER()

    over (partition by userid, eventid order by intrested desc, uniqueid) as rn

    from tableduplicate)

    delete from Tableduplicate

    where Uniqueid in

    (select uniqueid from IDList IL where rn > 1)

    Rob Schripsema
    Propack, Inc.

  • Rob Schripsema (5/26/2011)

    And delete the duplicates like this:

    ;with IDList as

    (select uniqueid,

    ROW_NUMBER()

    over (partition by userid, eventid order by intrested desc, uniqueid) as rn

    from tableduplicate)

    delete from Tableduplicate

    where Uniqueid in

    (select uniqueid from IDList IL where rn > 1)

    hi this can be done like this also, no need of where condition to be joined with main table (TABLEDUPLICATE) .You can just delete CTE which will reflect in main table.

    Select * from TABLEDUPLICATE

    ;with IDList as

    (select uniqueid,

    ROW_NUMBER()

    over (partition by userid, eventid order by intrested desc, uniqueid) as rn

    from TABLEDUPLICATE)

    delete from IDList where rn > 1

    Select * from TABLEDUPLICATE

    Thanks
    Parthi

  • parthi-1705 (5/26/2011)


    hi this can be done like this also, no need of where condition to be joined with main table (TABLEDUPLICATE) .You can just delete CTE which will reflect in main table.

    Select * from TABLEDUPLICATE

    ;with IDList as

    (select uniqueid,

    ROW_NUMBER()

    over (partition by userid, eventid order by intrested desc, uniqueid) as rn

    from TABLEDUPLICATE)

    delete from IDList where rn > 1

    Select * from TABLEDUPLICATE

    Wow. I did not realize the CTE maintained some sort of link to the source table. Always something new to learn...:crazy:

    Thanks for the pointer, Parthi!

    Rob Schripsema
    Propack, Inc.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply