March 10, 2010 at 9:22 pm
I am running SSIS on SQL Server 2005 (Enterprise) and using it to transfer data from one database to another, for just a few tables. For my requirements, I need to run the task frequently (once a minute), but typically only a few records (if any) have changed. Does SSIS have the intelligence to determine whether modifications have occurred (on a per-record basis), or does it blindly analyze all source / destination tables involved every time the scheduler runs? For my scenario (the scheduler runs frequently, there is a large number of total records, but only a minimal number of changed records), would it make sense to:
1) Add a boolean to all source tables involved with the SSIS process
2) Set the boolean when a record is changed or added
3) Use the boolean in the select criteria within SSIS
4) Clear the boolean after the scheduler is complete?
Or, is SSIS internally doing something similar (or can I enable it to do something like this automatically for me)?
Thanks,
Kevin
March 10, 2010 at 9:37 pm
SSIS won't do this automatically for you. I don't think the boolean is a good idea, because it requires a write back to the table, and I can think of a case where a new record gets tagged while you are copying the data and then gets missed when you clear the flags. I would tend to say add a datetime field and update it whenever the row is updated. This gives you the ability to look at the destination and ask what the highest update date is that you have a record of, and then go back to the source and query everything higher than that value. This requires no write back and is easy to implement.
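A minimal sketch of that high-water-mark query, assuming a ModifiedDate column on both sides (the table and column names here are just placeholders):

-- 1. Find the latest change already present in the destination.
DECLARE @LastLoad DATETIME;
SELECT @LastLoad = ISNULL(MAX(ModifiedDate), '1900-01-01')
FROM DestDB.dbo.MyTable;

-- 2. Pull only the source rows changed since then.
SELECT KeyCol, Col1, Col2, ModifiedDate
FROM SourceDB.dbo.MyTable
WHERE ModifiedDate > @LastLoad;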
CEWII
March 10, 2010 at 10:32 pm
Does SSIS commit each update to a table individually, or perform a single commit at the end of all updates?
Thanks,
Kevin
March 10, 2010 at 10:54 pm
I believe you can accomplish either.
CEWII
March 11, 2010 at 5:46 am
I think that if you use an OLE DB Command in your data flow, there will be a commit after each single update (as the OLE DB Command issues an update against the DB for each row in the data flow).
If you use an Execute SQL Task in the Control Flow, where you specify the whole update statement, you'll get a commit after all the updates.
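For example, if the changed rows are landed in a staging table first, a single set-based statement like the following (the staging and target table names are just examples) performs all the updates in one statement, one commit:

-- One set-based update: one statement, one commit.
UPDATE t
SET    t.Col1 = s.Col1,
       t.Col2 = s.Col2,
       t.ModifiedDate = s.ModifiedDate
FROM   dbo.MyTable AS t
JOIN   dbo.MyTable_Staging AS s
       ON s.KeyCol = t.KeyCol;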
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
March 11, 2010 at 8:18 am
All true; however, I believe you can get them to do a batch commit. It is very important to keep in mind that different destinations in a data flow, even to the same database and table, will be in different transaction contexts.
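If you need everything to commit or roll back together, one option is to wrap the statements in an explicit transaction inside an Execute SQL Task; this is just a sketch, and setting the package's TransactionOption to Required (which uses MSDTC) is another route:

BEGIN TRY
    BEGIN TRANSACTION;

    -- insert / update statements go here

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    DECLARE @msg NVARCHAR(2048);
    SET @msg = ERROR_MESSAGE();
    RAISERROR(@msg, 16, 1); -- re-raise so the task fails
END CATCH;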
CEWII
March 11, 2010 at 9:35 am
Elliot,
I like the solution you posted above: implement a timestamp, and query the destination tables for the last update timestamp prior to running the SELECT query. Two questions:
1) How do I run this initial SELECT query against my destination database / table and transfer its output to the SELECT on the source database / table?
2) If I put an ORDER BY on my SELECT query, will the same order be used for inserting / updating the destination database / table? (That way, if the process were to die in the middle of the insert / update, the next time the SSIS scheduler runs, it will pick up at the appropriate place and not skip any data.)
Thanks!
Kevin
March 11, 2010 at 11:08 am
1. See the attached dtsx file in the zip.
2. Usually, if it dies in process, it will roll back, depending on settings.
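For reference, the usual pattern for point 1 (not necessarily exactly what's in the attachment; names like User::LastLoad are illustrative): an Execute SQL Task with ResultSet set to Single row stores the value in a package variable, and the OLE DB Source then uses that variable as a query parameter.

-- Execute SQL Task, result column mapped to User::LastLoad:
SELECT ISNULL(MAX(ModifiedDate), '1900-01-01') AS LastLoad
FROM dbo.MyTable;

-- OLE DB Source (data access mode: SQL command), with
-- User::LastLoad mapped to the ? parameter:
SELECT KeyCol, Col1, Col2, ModifiedDate
FROM dbo.MyTable
WHERE ModifiedDate > ?
ORDER BY ModifiedDate;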
CEWII
March 11, 2010 at 2:43 pm
Thanks!