delete and determine dups by datetime field

  • Heya,

    I am trying to determine how to most efficiently keep my table free from duplicates.

    On an ongoing bases (every 20 minetues...or so), duplicate record will be inserted to this table. The differentiator will be a datetime field saying when the record was last modified.

    I want to delete the old/er record(s) and keep the latest

    PK LastModified_DateStamp

    1 10/15/2007 02:14:07T657

    1 10/15/2007 02:37:05T837

    Here, I want to delete the record on top and keep the record at the bottom.

    I would appreciate ant suggestions and/or pointers...

    Thanks,

    ~J

  • What you're calling a "PK" obviously isn't where this table is concerned or you wouldn't be able to insert the duplicate rows.

    However, to solve your problem, this will work... and, it's very fast... finds 950 thousand dupes in a million row table and deletes them in 35 seconds.

    [font="Courier New"]DELETE&nbspt1

    &nbsp&nbsp&nbspFROM&nbspyourtable&nbspt1,

    &nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbsp&nbspyourtable&nbsp&nbspt2

    &nbsp&nbspWHERE&nbspt1.PK&nbsp=&nbspt2.PK&nbsp

    &nbsp&nbsp&nbsp&nbspAND&nbspt1.LastModified_DateStamp&nbsp<&nbspt2.LastModified_DateStamp[/font]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • As a matter of fact, I am fully capable of inserting new record, that actually are duplicates into this table.

    I am not quite sure why you say I would not be able to...did I leave some obvious information out...?

  • You called one of your columns "PK"... if it were trully a primary key, it would not allow the dupes.

    Doesn't matter... the code I posted will find them as you asked...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I understand what you are saying.

    And I apologize for not being more clear.

    My primary key is a true primary key in another system. Instead of truncating the table and do a fresh pull (it is to time consuming and expensive), I am only pulling down the records that changed from last time the process ran. Hence, 'duplicates' will occur in my sql table.

    Now, I am trying to figure out how to determine the 'oldest' record and to delete that record.

    Could I used the datetime as mentioned above? Or do I have to write a procedure?

    I am hesitant to use a procedure, b/c I think it will be to expensive timewise.

    Any suggestions..?

  • Jay H (10/16/2007)


    I understand what you are saying.

    And I apologize for not being more clear.

    My primary key is a true primary key in another system. Instead of truncating the table and do a fresh pull (it is to time consuming and expensive), I am only pulling down the records that changed from last time the process ran. Hence, 'duplicates' will occur in my sql table.

    Now, I am trying to figure out how to determine the 'oldest' record and to delete that record.

    Could I used the datetime as mentioned above? Or do I have to write a procedure?

    I am hesitant to use a procedure, b/c I think it will be to expensive timewise.

    Any suggestions..?

    Like I said, the code I posted works... and it's nasty fast...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff,

    I really appreciate your persistence and help.

  • Thanks for the feedback, Jay... if you get a chance, let me know what your timings are. Would really appreciate that...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply