Selecting duplicate records

  • Can someone tell me how to do this

    I got a tabel with no primary key field and no identity field and there are duplicates in it.

    Here is my table

    TABLE1

    ID  COL1 COL2 COL3 COL4 COL5 COL6

    1   X    X    X     XX        XX

    1   X    X    X     XX   XX   XX

    2   X    X    X     XX    

    3   X    X    X     XX

    3   X    X    X          XX   XX

     

    I want to select unique record based on ID, col1, col2 and col3 and select record with more data in it.

    So my result should look like this

    ID  COL1 COL2 COL3 COL4 COL5 COL6

    1   X    X    X     XX   XX   XX

    2   X    X    X     XX    

    3   X    X    X          XX   XX

    Is there a way to do this??

    Thanks

  • How do you define "more data" for columns 4 thru 6 ?

    Least number of NULLs ? What if there's still a tie, what is your business requirement if you can't differentiate based on cols 4 thru 6 ?

     

  • Record should have least number of nulls for columns 4 thru 6.

    If you can't differentiate then select first one.

     

    Thank you

    ash

     

     

  • >>If you can't differentiate then select first one

    Same problem. SQL tables are unordered. There is no concept of "first", you need to define the rules that determine which of a duplicate pair is considered "first"

  • There is a field for date updated, so I can use it to select last updated record.

    Thanks

    ash

     

  • >And a proper table does not have IDENTITY either since it can never be a key

    How do you generate unique customer numbers for new customers if you don't use IDENTITY?

    (Edit) And why do you say "it can never be a key"?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Joe... great explanation about why sequential numbering of records and the use of the IDENTITY property is so bad.  I really appreciate the in-depth explanation (especially the history behind it) and the examples.  I'm sold!  But that also creates a problem for me...

    I guess I've fallen into the same ignorant trap as a lot of people and I need to change my thinking a bit (a lot!).  You probably said it in so many words but, if you were designing a Customer table, what would you use as the Primary Key if no SSN were allowed?  Yeah, I know... "Kill the guy writing the spec."  I'd love to but that'll have to wait.  So, what should I use?  It doesn't seem that a composite primary key on the customer related name columns would be worth a hoot because of possible typos by data entry folks.  The phone number wouldn't be worth a hoot because those can change and can be shared by more than one individual.  Another strike against the name columns is that even they can change, especially for women.  What the heck should I use for the Primary Key?  And, yes, I'm assuming that I need some form of Primary Key or, at least, a unique index to prevent duplicate data from being entered.  If I'm wrong about that, too, then I really need a new hat and more of your help. 

    Help me out, Joe... I'm intrigued and I need to learn this...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks, Joe...

    >If you absolutely after a few weeks of research cannot find an industry standard, then you have to pick a method to create your own. 

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply