DElete Duplicates

  • Hi,

    I have duplicates in my table A

    VehicleID Timestamp CardNum CompanyName CANum Link CLink

    01073 2006-12-06 1 ABC 1234 1 22

    1073 2012-12-06 1 ABC 1234 2 22

    111 2012-12-06 2 ACC 1222 3 44

    1122 2012-12-06 3 ADD 1555 4 55

    01122 2006-12-06 3 ADD 1555 5 55

    Need to eliminate duplicates based on CardNum,CompanyName and CardAcctNum . If these fields are equal , then in the B table where ever we have those link that are dupes example 2,4 we need to update to 1 and 5

    based on CLink (22,55)

    Then delete the record with the new timestamp from Table A.

    Please help

  • Something like this should work for you:

    DECLARE @tableA TABLE

    (

    vehicleID VARCHAR(5),

    [timestamp] DATETIME,

    cardnum TINYINT,

    companyname VARCHAR(10),

    CANum INT,

    link TINYINT,

    clink INT

    )

    DECLARE @toDelete TABLE

    (

    vehicleIDToKeep VARCHAR(5),

    clinkToKeep INT,

    vehicleIDToDelete VARCHAR(5),

    clinkToDelete INT

    );

    INSERT INTO @tableA

    (vehicleID, timestamp, cardnum, companyname, CANum, link, clink)

    VALUES

    ('01073', '2006-12-06', 1, 'ABC', 1234, 1, 22),

    ('1073', '2012-12-06', 1, 'ABC', 1234, 2, 22),

    ('111', '2012-12-06', 2, 'ACC', 1222, 3, 44),

    ('1122', '2012-12-06', 3, 'ADD', 1555, 4, 55),

    ('01122', '2006-12-06', 3, 'ADD', 1555, 5, 55);

    WITH dupes

    AS (

    SELECT

    ROW_NUMBER() OVER (PARTITION BY cardnum, companyname, CANum ORDER BY vehicleID) AS row_num,

    *

    FROM

    @tableA

    )

    INSERT INTO @toDelete

    (

    vehicleIDToKeep,

    clinkToKeep,

    vehicleIDToDelete,

    clinkToDelete

    )

    SELECT

    D.vehicleID,

    D.clink,

    D2.vehicleID,

    D2.clink

    FROM

    dupes AS D

    JOIN dupes D2

    ON D.CANum = D2.CANum AND

    D.cardnum = D2.cardnum AND

    D.companyname = D2.companyname AND

    D.row_num + 1 = D2.row_num

    WHERE

    D.row_num = 1;

    UPDATE

    tableB

    SET

    clink = D.clinkToKeep

    FROM

    @toDelete AS D

    WHERE

    tableB.clink = D.clinkToDelete;

    DELETE FROM

    @tableA

    WHERE

    EXISTS ( SELECT

    1

    FROM

    @toDelete AS TD

    WHERE

    TD.vehicleIDToDelete = [@tableA].vehicleID )

    It would be easier if you provided full table definitions so we can test.

    Once you've cleaned the data up I recommend creating a Unique Constraint on those 3 columns so you don't have to do it again.

  • Thanks a lot. That solved my problem .

  • A little late perhaps but here's an alternate:

    DECLARE @tableA TABLE

    (

    vehicleID VARCHAR(5),

    [timestamp] DATETIME,

    cardnum TINYINT,

    companyname VARCHAR(10),

    CANum INT,

    link TINYINT,

    clink INT

    )

    DECLARE @toDelete TABLE

    (

    vehicleID VARCHAR(5),

    [timestamp] DATETIME,

    cardnum TINYINT,

    companyname VARCHAR(10),

    CANum INT,

    link TINYINT,

    clink INT

    );

    INSERT INTO @tableA

    SELECT '01073' as vehicleID, '2006-12-06' as [timestamp], 1 as cardnum

    , 'ABC' as companyname, 1234 as CANum, 1 as link, 22 as clink

    UNION ALL SELECT '1073', '2012-12-06', 1, 'ABC', 1234, 2, 22

    UNION ALL SELECT '111', '2012-12-06', 2, 'ACC', 1222, 3, 44

    UNION ALL SELECT '1122', '2012-12-06', 3, 'ADD', 1555, 4, 55

    UNION ALL SELECT '01122', '2006-12-06', 3, 'ADD', 1555, 5, 55

    SELECT *

    FROM @tableA

    DELETE FROM a

    OUTPUT Deleted.*

    INTO @toDelete

    FROM @tableA a

    INNER JOIN (

    SELECT CAST(vehicleID AS INT) as vehicleID, cardnum, companyname, MIN([timestamp]) as [timestamp]

    FROM @tableA

    GROUP BY CAST(vehicleID AS INT), cardnum, companyname

    HAVING COUNT(*) > 1

    ) x

    ON a.vehicleID = x.vehicleID and a.cardnum = x.cardnum and a.companyname = x.companyname and

    a.[timestamp] = x.[timestamp]

    UPDATE a

    SET link = x.link

    FROM @tableA a

    INNER JOIN @toDelete x

    ON CAST(a.vehicleID AS INT) = x.vehicleID and a.cardnum = x.cardnum and a.companyname = x.companyname

    SELECT *

    FROM @tableA


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Yup that works and I considered the conversion to INT as well, but that means you are assuming that the vehicleid is always the same with a a 0 or 0's pre-pended to it. I didn't think that was something should be assumed.

    A good solution though. I do like how you used the output clause and eliminated the extra insert my solution has. Wish I'd have thought of that.

  • I was actually thinking that you may be able to do the UPDATE first and then DELETE using a similar structure to what I did and avoid the OUTPUT and table variable (holding the deleted records) entirely.

    Had to run to a meeting so didn't have time to test that approach.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply