June 30, 2005 at 5:51 am
I'm updating a DB from a flat file. If item exists in the table then it needs to be updated. If it doesn't exist it needs to be inserted - what's a good way to handle that situation.
Mark
June 30, 2005 at 7:08 am
I've seen a lot of people advocate importing the file into a staging table and then use Stored Procedures to do the Update/Inset logic.
I've only had to worry about this type of thing once before, and that's what I did (I think there is even a script here that demonstrates the update/insert part). Most of the tables I had to worry about this sort of thing on were so big that it was faster to just delete everything and re-insert the rows. You'll probably have to decide where the tipping point is for you. I know it's a lot easier to develop the delete then insert all rows stuff.
June 30, 2005 at 8:08 am
Chris has hit the nail right on the head.
Either use a staging table and run your logic or empty and repopulate.
There is no real rule of thumb for either method really. Try both out and see what works best for you.
If you have next to no indexes, the repopulation will probably be faster. Just remember if you get a failure then you could end up with no data. Don't truncate that table unless you know the package had completed successfully.
July 1, 2005 at 7:55 am
I have used the delete then insert thing too. Instead of delete you can aslo truncate the table. If your table has indexes and you delete the table you must rebuild the indexes. Conversly if you have a lot of indexes and each insert will aslo rebuild your index tables, potentially taking extra time.
One other option, albiet not the fastest execution wise, is to use the data driven query task. I have started to use it and for me it works well as a lot of the tables I load automatically on a periodic basis are not that large.
Also, depending on what you do you may want to do a dbcc reindex of the table when you are done.
July 3, 2005 at 12:21 pm
Staging is trhe best strategy for this situation. You must be careful however if you are populating a fact table in a datawarehouse. Updating records in a huge fact table can be resource intensive.
Updating is done however on slowly changing dimensions.
Anyway, if you are looking for a "DTS" method of doing it without staging, then look into the Data Driven Query Task. It allows you to execute any of 4 queries based on data coming from the source and transformed into a relation table.
HABIB.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply