Eliminating duplicate rows

  • Hi all,

    I have a question which I am sure has a simple solution;

    I have a table which contains duplicate rows, each duplicate row is only duplicated twice, but not all rows are duplicated. There is only one column which is not duplicated in the duplicated rows and that is the first column which is an identity type primary key field. Strictly speaking since the primary key is an identity field we don't have a duplicate row, but the rest of the data in the row is duplicated in another row, and the users want to eliminate the duplication, seems it is causing a problem with sorting in their application.

    I was thinking of using select distinct but am not sure how to handle the primary key issue. As far as I know the primary is not a foregin key in another table. What I want to do is pull everything but the primary key field into a new table. Does this sound reasonable?

    Thanks in advance for your help

    Ken

  • A real quick and dirty way would be to use a group by on all but the primary key field (min max that if there are just 2 rows max) and filter having a count * > 1.

    Something like

    delete table

    where primary_key in (select min\max(primary_key) from table group by a,b,c having count(*) >1)

    Of course you could replace in with exists.

    Quick, easy 1 time use kind of thing.

  • Kwitzell,

    For those rows that are duplicated, which row would you like to keep... that is, do you want to keep the one with the largest ID or the smallest?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    Thanks for your response, the user does not care, I would think the smaller would do. I was told the numbers in the primary key are not referneced in the application. So, really it does not matter.

    Ken

  • Thank you for your reply, will give it a look.

    Ken

  • Considering you're throwing all of the fields in - the performance will likely suck pretty badly on this one.

    L_voren's solution will get rid of one dupe per group, so you'd have to keep running that until you got 0 records back.

    A one-shot way is something more along the lines of (performance notwithstanding)

    delete t1

    from mytable t1

    where

    exists (

    select *

    from mytable t2

    where

    t1.a=t2.a

    and t1.b=t2.b

    and etc...

    and t1.primarykey>t2.primarykey

    )

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Thanks Matt,

    Your soulution is kind of what I was thinking, I just planned on creating a new table and putting in everything sans dupes and primary key, figured to create a new primary key column and renumber the rows.

    Ken

  • I'm sorry I must have misunderstood something here. If there are a max of 2 rows per group why would the delete need to be run more than once?

  • I came across the attached document which may help.

    David

    Best Regards,

    ~David

  • David thank you for the article, we will have to give it a try,

    Ken

  • L_voren (1/4/2008)


    I'm sorry I must have misunderstood something here. If there are a max of 2 rows per group why would the delete need to be run more than once?

    If there's exactly just one duplicate, then you're right - only one run through would suffice. I'm just never fortunate enough to have only one dupe....:)

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Yes we only have one dup per row. I guess we lucked out this time!

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply