How to delete duplicated rows in SQL Server 2008

  • Hi all,

    I have the followings rows:

    ID VALUE1 VALUE2 TYPE

    1 1 3 1

    2 15 10 4

    3 10 15 7

    4 10 15 7

    5 15 10 8

    6 3 2 12

    In my case, i consider duplicated rows if:

    - VALUE1 and VALUE2 are equals in other row (example: rows 3 and 4)

    - VALUE1 is equal to VALUE2 and VALUE2 is equal to VALUE1 in other row (example: rows 2 and 3)

    So, i will need a script to execute in SQL Server 2008 to delete this duplicated rows, to get:

    ID VALUE1 VALUE2 TYPE

    1 1 3 1

    2 15 10 4

    6 3 2 12

    Thanks in advance.

  • One way to do it is to use a self-join, and if you had provided the script to create your data, I could give you the desired query. Anyway, it would look like this:

    DELETE t1

    FROM dbo.YourTable t1

    JOIN dbo.YourTable t2

    ON t1.c2 = t2.c2

    AND t1.c3 = t2.c3

    AND t1.c1 > t2.c1

  • Going by your logic, shouldnt row 2 be deleted also?

  • ColdCoffee (3/5/2012)


    Going by your logic, shouldnt row 2 be deleted also?

    No, i need to get the duplicated row once.

    thank yo.

  • This is one way...but almost certainly not the best way....works ok on a million rows

    BEGIN TRAN ; -- just for repeatability

    CREATE TABLE #DATA(ID INT, VALUE1 INT,VALUE2 INT,TYPE INT);

    INSERT #DATA(ID,VALUE1,VALUE2,TYPE)

    SELECT 1, 1, 3, 1 UNION ALL

    SELECT 2, 15, 10, 4 UNION ALL

    SELECT 3, 10, 15, 7 UNION ALL

    SELECT 4, 10, 15, 7 UNION ALL

    SELECT 5, 15, 10, 8 UNION ALL

    SELECT 6, 3, 2, 12 ;

    -- add a million test rows

    INSERT #DATA(ID,VALUE1,VALUE2,TYPE)

    SELECT TOP 1000000 6+ROW_NUMBER() OVER (ORDER BY @@SPID),ABS(CHECKSUM(NEWID()))%100,ABS(CHECKSUM(NEWID()))%100,0

    FROM syscolumns a , syscolumns b ;

    -- This is the part that does the de-duping...

    ;WITH data_cte AS

    (

    SELECT ID,VALUE1,VALUE2,TYPE,ROW_NUMBER() OVER(PARTITION BY V1,V2 ORDER BY ID) AS rownum

    FROM #DATA

    CROSS APPLY (

    SELECT

    CASE WHEN VALUE1<VALUE2 THEN VALUE1 ELSE VALUE2 END

    ,CASE WHEN VALUE1<VALUE2 THEN VALUE2 ELSE VALUE1 END

    ) AS ca(V1,V2)

    )

    DELETE

    FROM data_cte

    WHERE rownum <> 1 ;

    SELECT ID,VALUE1,VALUE2,TYPE

    FROM #DATA ;

    ROLLBACK ;

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • mister.magoo (3/5/2012)


    This is one way...but almost certainly not the best way....works ok on a million rows

    BEGIN TRAN ; -- just for repeatability

    CREATE TABLE #DATA(ID INT, VALUE1 INT,VALUE2 INT,TYPE INT);

    INSERT #DATA(ID,VALUE1,VALUE2,TYPE)

    SELECT 1, 1, 3, 1 UNION ALL

    SELECT 2, 15, 10, 4 UNION ALL

    SELECT 3, 10, 15, 7 UNION ALL

    SELECT 4, 10, 15, 7 UNION ALL

    SELECT 5, 15, 10, 8 UNION ALL

    SELECT 6, 3, 2, 12 ;

    -- add a million test rows

    INSERT #DATA(ID,VALUE1,VALUE2,TYPE)

    SELECT TOP 1000000 6+ROW_NUMBER() OVER (ORDER BY @@SPID),ABS(CHECKSUM(NEWID()))%100,ABS(CHECKSUM(NEWID()))%100,0

    FROM syscolumns a , syscolumns b ;

    -- This is the part that does the de-duping...

    ;WITH data_cte AS

    (

    SELECT ID,VALUE1,VALUE2,TYPE,ROW_NUMBER() OVER(PARTITION BY V1,V2 ORDER BY ID) AS rownum

    FROM #DATA

    CROSS APPLY (

    SELECT

    CASE WHEN VALUE1<VALUE2 THEN VALUE1 ELSE VALUE2 END

    ,CASE WHEN VALUE1<VALUE2 THEN VALUE2 ELSE VALUE1 END

    ) AS ca(V1,V2)

    )

    DELETE

    FROM data_cte

    WHERE rownum <> 1 ;

    SELECT ID,VALUE1,VALUE2,TYPE

    FROM #DATA ;

    ROLLBACK ;

    Hi mister.magoo!,

    I have tested this example and it solves my problem.

    Thank you very much for your help.

  • Viewing 6 posts - 1 through 5 (of 5 total)

    You must be logged in to reply to this topic. Login to reply