deleting duplicate records

  • I am trying to delete duplicate records in a table. Please see below for what the records look like:

    Load Number Event Code Event Description

    123456 34 order placed

    123457 22 back order

    123456 34 order placed

    123458 11 discontinued item

    123457 22 back order

    I have had trouble finding a query to just remove the duplicates. Everything is driven off the Load Number in this table.

    Thanks as always.:-D

  • Hi,

    have you any criteria for which record to keep?

    Like is there a date field you haven't mentioned and you want to keep the most recent record, or do you just want to remove the dupe, i.e. they are true dupes with no fields different?

    Thanks

    Allister

  • The best method I know for deduping looks like this:

    ;with CTE as

    (select row_number() over (partition by [Load Number], [Event Code] order by 1) as Row

    from dbo.MyTable)

    delete from CTE

    where Row > 1;

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Cool, that's pretty 🙂

  • ///Edit - wrong thread

  • I just want to keep only unique Load Numbers.

    current rows (duplicates)

    Load number Event Code Event Desc

    123456 11 on back order

    123457 22 discontinued item

    123456 11 on back order

    123458 33 shipped

    I only want

    desired rows (distinct)

    Load number Event Code Event Desc

    123456 11 on back order

    123457 22 discontinued item

    123458 33 shipped

  • Just change the row_number() partition and order by in my query to suit your needs.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared,

    I had to update CTE slightly to get it to work on my system?

    ;with CTE as

    (select row_number() over

    (partition by [Load Number], [Event Code], [Event Description]

    order by (select 1)) as Row

    from @temp)

    delete from CTE

    where Row > 1;

  • GSquared

    thanks, one more question what does the CTE represent?

  • jrw39 (10/20/2009)


    GSquared

    thanks, one more question what does the CTE represent?

    CTE is Common Table Expression. You can look it up in Books Online for further information, but generally it can be used as a type of inline view. Gus's query could be rewritten using a derived table and would work exactly the same.

    The trick with this query is using ROW_NUMBER() to generate the row numbers for each partition based upon the order you define. Then, once you have that - you can delete all rows that have a row number greater than 1.

    There are other uses for CTE's based on recursion - but that is another topic altogether.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply