October 15, 2007 at 7:15 pm
Heya,
I am trying to determine how to most efficiently keep my table free from duplicates.
On an ongoing bases (every 20 minetues...or so), duplicate record will be inserted to this table. The differentiator will be a datetime field saying when the record was last modified.
I want to delete the old/er record(s) and keep the latest
PK LastModified_DateStamp
1 10/15/2007 02:14:07T657
1 10/15/2007 02:37:05T837
Here, I want to delete the record on top and keep the record at the bottom.
I would appreciate ant suggestions and/or pointers...
Thanks,
~J
October 15, 2007 at 9:21 pm
What you're calling a "PK" obviously isn't where this table is concerned or you wouldn't be able to insert the duplicate rows.
However, to solve your problem, this will work... and, it's very fast... finds 950 thousand dupes in a million row table and deletes them in 35 seconds.
[font="Courier New"]DELETE t1
   FROM yourtable t1,
        yourtable  t2
  WHERE t1.PK = t2.PK 
    AND t1.LastModified_DateStamp < t2.LastModified_DateStamp[/font]
--Jeff Moden
Change is inevitable... Change for the better is not.
October 15, 2007 at 9:27 pm
As a matter of fact, I am fully capable of inserting new record, that actually are duplicates into this table.
I am not quite sure why you say I would not be able to...did I leave some obvious information out...?
October 15, 2007 at 11:18 pm
You called one of your columns "PK"... if it were trully a primary key, it would not allow the dupes.
Doesn't matter... the code I posted will find them as you asked...
--Jeff Moden
Change is inevitable... Change for the better is not.
October 16, 2007 at 5:37 am
I understand what you are saying.
And I apologize for not being more clear.
My primary key is a true primary key in another system. Instead of truncating the table and do a fresh pull (it is to time consuming and expensive), I am only pulling down the records that changed from last time the process ran. Hence, 'duplicates' will occur in my sql table.
Now, I am trying to figure out how to determine the 'oldest' record and to delete that record.
Could I used the datetime as mentioned above? Or do I have to write a procedure?
I am hesitant to use a procedure, b/c I think it will be to expensive timewise.
Any suggestions..?
October 16, 2007 at 9:03 am
Jay H (10/16/2007)
I understand what you are saying.And I apologize for not being more clear.
My primary key is a true primary key in another system. Instead of truncating the table and do a fresh pull (it is to time consuming and expensive), I am only pulling down the records that changed from last time the process ran. Hence, 'duplicates' will occur in my sql table.
Now, I am trying to figure out how to determine the 'oldest' record and to delete that record.
Could I used the datetime as mentioned above? Or do I have to write a procedure?
I am hesitant to use a procedure, b/c I think it will be to expensive timewise.
Any suggestions..?
Like I said, the code I posted works... and it's nasty fast...
--Jeff Moden
Change is inevitable... Change for the better is not.
October 16, 2007 at 10:16 am
Thanks Jeff,
I really appreciate your persistence and help.
October 16, 2007 at 10:23 pm
Thanks for the feedback, Jay... if you get a chance, let me know what your timings are. Would really appreciate that...
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply