Is This The Best Way to Remove Duplicate Records ?

  • ifila,

    I just recently used the MS KB you pointed out at the beginning of the your post. I would say it was a very straight forward process by just putting that logic in a delete statement and running it.

    Of course, I checked it with a select statement first. I'd definitely recommend it but it seems there are other more complicated ways. I'd say keep it simple so others have an easier time reading your code as well.

    However, I'm unsure of how efficient or inefficient this is, etc.

    Thanks,

    S

    --
    :hehe:

  • Perhaps if you posted all the data so we can run the exact code we might find an anomaly or two in your data.

  • I found the issue. It was NOT the delete SQL code !

    Thanks

  • ifila (7/16/2009)


    I found the issue. It was NOT the delete SQL code !

    Thanks

    Two way street here... what was the problem you found and how did you fix it?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I made some assumptions.

    As each record is written to the DB, the primary key increases by one.

    I assumed that the emails with the highest key, among the duplicate fields would be the most recent, which was correct.

    What i did not plan for was an older (more out of date) resume being gathered from a location that was loaded later in the process.

    eg. A candidate has a resume on Monster and Careerbuilder.

    A recruiter works on a Livelink position today, checks Careerbuilder candidates and fills the positon.

    One week from now, the recruiter chooses to start a new Livelink search on Monster instead of Careerbuilder (random choice), the same candidate resume is on Monster but is an OLDER version, which the candidate forgot to update.

    Since this resume was added to the DB one week later, the key would be higher, thus the duplication process would grab this resume, instead of the newer one from Careerbuilder.

    I know, poor process planning, but i have not been a tech guy for 20 years :w00t:

  • Heh... the devil's in the data. Thanks for the feedback. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 6 posts - 16 through 20 (of 20 total)

You must be logged in to reply to this topic. Login to reply