Remove almost Duplicate Records

  • I have a table where I need to be able to delete records that are almost identical however with a slight difference and the Date_Added column is what really is the trigger with keeping the newest date and not the older date. My data is below. So what I need is for 80429 and 80439 to be deleted because 89296 and 89267 are newer. Any ideas?

    AUTOIDvendorStockEA_UOMNew_Thomas_CostNew Proposed List PriceDateAddedExportedDate_Exported

    89296VN0001763319B38EA2.645.8510/20/2016 7:421/1/1900 0:00

    89297VN0001763319B52EA4.19.0910/20/2016 7:421/1/1900 0:00

    80429VN0001763319B38EA2.695.9610/19/2016 15:561/1/1900 0:00

    80439VN0001763319B52EA4.29.310/19/2016 15:561/1/1900 0:00

  • You can do something like this. What it does is delete all rows where there are matching rows having the same key columns but a greater DateAdded. You'll need to tweak the where clause so that it matches on the appropriate key columns.

    delete from MyTable as A

    where exists

    (

    select 1 from MyTable as B

    -- match on key columns:

    where B.vendor = A.vendor and B.stock = A.stock and B.EA_UOM = A.EA_UOM

    -- match on row with greater DateAdded:

    and B.DateAdded > A.DateAdded

    );

    Also, after you complete this deletion of duplicate rows, consider creating a primary key on the key columns to prevent further insertion of duplicates.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • So I hope I'm not making this too complicated but here is exactly what I would need. As you can see, AUTOID 63646 and 63647 are ok because even though the information is the same, the start_date is different and the older one has already been exported so I can't lose that record.

    As for AUTOID 89294, 89295, and 80429, 89294 would need to stay because the start_date is different then 89295, and 80429. However 80429 would need to go because 89295 has the same start_date but was Added after 80429 was added.

    Does this make sense?

    AUTOIDvendorStockNew_Thomas_CostNew Proposed List PriceStart_dateDateAddedExportedDate_Exported

    63647VN0031411182Q64135.05000000191.840000002016-10-15 00:00:00.0002016-10-19 15:55:54.360Y2016-10-19 15:59:17.167

    63646VN0031411182Q64135.05000000191.840000002017-01-01 00:00:00.0002016-10-19 15:55:54.3601900-01-01 00:00:00.000

    63645VN0031411182Q7654.7500000077.770000002016-10-15 00:00:00.0002016-10-19 15:55:54.360Y2016-10-19 15:59:17.167

    63644VN0031411182Q7654.7500000077.770000002017-01-01 00:00:00.0002016-10-19 15:55:54.3601900-01-01 00:00:00.000

    63625VN0031411182Q7754.7500000077.770000002016-10-15 00:00:00.0002016-10-19 15:55:54.360Y2016-10-19 15:59:17.167

    63624VN0031411182Q7754.7500000077.770000002017-01-01 00:00:00.0002016-10-19 15:55:54.3601900-01-01 00:00:00.000

    63650VN0031411182Q7854.7500000077.770000002016-10-15 00:00:00.0002016-10-19 15:55:54.360Y2016-10-19 15:59:17.167

    63649VN0031411182Q7854.7500000077.770000002017-01-01 00:00:00.0002016-10-19 15:55:54.3601900-01-01 00:00:00.000

    63652VN0031411182Q7954.7500000077.770000002016-10-15 00:00:00.0002016-10-19 15:55:54.360Y2016-10-19 15:59:17.167

    63651VN0031411182Q7954.7500000077.770000002017-01-01 00:00:00.0002016-10-19 15:55:54.3601900-01-01 00:00:00.000

    89294VN0001763319B382.630000005.850000002016-10-21 00:00:00.0002016-10-20 09:05:44.7001900-01-01 00:00:00.000

    89295VN0001763319B382.640000005.850000002017-01-01 00:00:00.0002016-10-20 09:05:44.7001900-01-01 00:00:00.000

    80429VN0001763319B382.690000005.960000002017-01-01 00:00:00.0002016-10-19 15:56:24.2131900-01-01 00:00:00.000

    89296VN0001763319B524.090000009.090000002016-10-21 00:00:00.0002016-10-20 09:05:44.7001900-01-01 00:00:00.000

    89297VN0001763319B524.100000009.090000002017-01-01 00:00:00.0002016-10-20 09:05:44.7001900-01-01 00:00:00.000

    80439VN0001763319B524.200000009.300000002017-01-01 00:00:00.0002016-10-19 15:56:24.2131900-01-01 00:00:00.000

  • First a quick note, please read up on how to post questions on the forums as your data example has few problems, took longer to sort those than writing the actual solution!

    😎

    Simplest approach in this case is to use the row_number function, partition the data on the "key" elements and sort on the decisive date in descending order, here is a quick example:

    USE TEEST;

    GO

    SET NOCOUNT ON;

    ;WITH SAMPLE_DATA AS

    (

    SELECT

    AUTOID,vendor,Stock,EA_UOM,New_Thomas_Cost,New_Proposed_List_Price ,DateAdded,Date_Exported

    FROM (VALUES

    (89296 ,'VN000176','3319B38','EA',2.64 , 5.85,'10/20/2016 7:42 ','1/1/1900 0:00' )

    ,(89297 ,'VN000176','3319B52','EA', 4.1 , 9.09 ,'10/20/2016 7:42 ','1/1/1900 0:00' )

    ,(80429 ,'VN000176','3319B38','EA',2.69 , 5.96,'10/19/2016 15:56 ',' 1/1/1900 0:00')

    ,(80439 ,'VN000176','3319B52','EA', 4.2 , 9.3 ,'10/19/2016 15:56 ',' 1/1/1900 0:00')

    )X(AUTOID,vendor,Stock,EA_UOM,New_Thomas_Cost,New_Proposed_List_Price ,DateAdded,Date_Exported)

    )

    ,BASE_DATA AS

    (

    SELECT

    SD.AUTOID

    ,ROW_NUMBER() OVER

    (

    PARTITION BY SD.vendor

    ,SD.Stock

    ORDER BY SD.DateAdded DESC

    )AS DESC_RID

    ,SD.vendor

    ,SD.Stock

    ,SD.EA_UOM

    ,SD.New_Thomas_Cost

    ,SD.New_Proposed_List_Price

    ,SD.DateAdded

    ,SD.Date_Exported

    FROM SAMPLE_DATA SD

    )

    SELECT *

    -- CHANGE TO DELETE

    --DELETE BD

    FROM BASE_DATA BD

    WHERE BD.DESC_RID > 1;

    Output (what is to be deleted)

    AUTOID DESC_RID vendor Stock EA_UOM New_Thomas_Cost New_Proposed_List_Price DateAdded Date_Exported

    ----------- -------------------- -------- ------- ------ --------------------------------------- --------------------------------------- ----------------- --------------

    80429 2 VN000176 3319B38 EA 2.69 5.96 10/19/2016 15:56 1/1/1900 0:00

    80439 2 VN000176 3319B52 EA 4.20 9.30 10/19/2016 15:56 1/1/1900 0:00

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply