November 4, 2008 at 1:13 pm
I have about 40 source files.
They are supposed to be loaded into staging tables
and later inserted into Data warehouse Dimension and Fact tables.
So far I see a few ways to do it.
I'm looking for feedback on any of these approaches:
1. Design 40 packages with some reusable pieces (calls to generic ErrorHandling.dtsx, EmailNotification.dtsx)
2. Consolidate packages: instead of 40 packages, create something like 5-6 packages, each with multiple Data Flow tasks (DFT_Source1, DFT_Source2, etc.) loading 4-5 source files into staging tables
3. There is also a crazy "SSIS Componentization" solution from
http://www.simple-talk.com/sql/sql-tools/ssis-package-componentization/
But executing packages in a loop looks scary...
I am still reading this article to decide if this is a suitable solution for me.
I am looking for a scalable solution and I try not to put too much logic inside one package. Whenever the same process can be done for multiple source files, I isolate it in a separate package. But that still leaves me needing 40 separate Data Flow transformations (consolidated or not), and that is what bothers me.
November 5, 2008 at 4:59 am
If for the 40 files you have different columns and they are going into tables with different schemas, you will really need 40 data flows. You could use some code and dynamically generate data flows, but 40 really is not that big of a number and is probably better just managed carefully.
I would personally break it into 8-10 packages with 4-5 data flows in each.
BIDS will error out if you put too many components into a single package. It runs out of memory loading the package. It takes a lot of components (usually hundreds) but if your data flows are complicated enough it could be an issue. Breaking it into several packages does make it easier to have multiple people working on things. If you create a base package with all of your connection managers and copy it, you will be able to re-use package configuration files to keep things consistent.
Using a loop is nice for code re-use, but it would mean serial execution of 40 packages and your performance is going to suffer. If you have several packages, you can run parallel data flows within them and simply run an appropriate number of the packages at any one time. This would allow you to throttle the workload if necessary or run everything in parallel.
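If you do end up driving this from a small host application, here is a very rough sketch in C# against the Microsoft.SqlServer.Dts.Runtime API of starting a few group packages on separate threads instead of running them in a serial loop. The package paths are placeholders, and in practice you would more likely schedule the packages through SQL Agent jobs or a master package with Execute Package tasks:

using System;
using System.Collections.Generic;
using System.Threading;
using Microsoft.SqlServer.Dts.Runtime;

class ParallelLoadHost
{
    static void Main()
    {
        // Placeholder paths -- one package per group of data flows
        string[] packagePaths =
        {
            @"C:\etl\LoadGroup1.dtsx",
            @"C:\etl\LoadGroup2.dtsx",
            @"C:\etl\LoadGroup3.dtsx"
        };

        Application app = new Application();
        List<Thread> threads = new List<Thread>();

        foreach (string path in packagePaths)
        {
            Package pkg = app.LoadPackage(path, null);
            Thread worker = new Thread(delegate()
            {
                // Each child package runs its own data flows in parallel with the others
                DTSExecResult result = pkg.Execute();
                Console.WriteLine(pkg.Name + ": " + result);
            });
            threads.Add(worker);
            worker.Start();
        }

        // Wait for every child package to finish before exiting
        foreach (Thread worker in threads)
            worker.Join();
    }
}

You would still throttle the workload by how many packages you load into the list at any one time.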
November 5, 2008 at 6:50 am
Thank you for your feedback Michael.
This is the direction I am moving in.
I created a base package (template) with all connections and even
Data Flow steps. I prefer to modify the existing Derived Column, Data Conversion, and other transformations rather than creating new ones.
And I just pass a different SourceFile path to each package.
I also created generic packages to perform the same operation with different objects (truncate stage table before loading new data, email notifications).
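For what it's worth, the same idea can be sketched in a few lines of C# against the SSIS runtime API if the packages were driven from code. This assumes the template exposes a variable named SourceFile; the package and file paths below are just examples, not tested code:

using System;
using Microsoft.SqlServer.Dts.Runtime;

class RunTemplatePackage
{
    static void Main()
    {
        Application app = new Application();

        // Load a copy of the template and point it at one specific source file
        Package pkg = app.LoadPackage(@"C:\etl\LoadSourceTemplate.dtsx", null);
        pkg.Variables["SourceFile"].Value = @"C:\data\Source1.csv";

        DTSExecResult result = pkg.Execute();
        Console.WriteLine("LoadSourceTemplate: " + result);
    }
}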
"...You could use some code and dynamically generate data flows, ..."
Michael, could you give me some samples of what you mean here please?
November 5, 2008 at 7:14 am
SSIS is designed to allow dynamic package creation. There are samples on the MSDN website. It involves writing an application (in a .net programming language) that uses the SSIS object model to create the packages and execute them.
I think if you search for metadata-driven SSIS packages you will find some examples. It is actually not that difficult, but it does require knowing a .net programming language and having a pretty good understanding of SSIS and its object model.
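Very roughly, it looks something like this (just a sketch, not production code; the connection strings, names and paths are placeholders, and the components inside the data flow still have to be wired up through the Microsoft.SqlServer.Dts.Pipeline.Wrapper interfaces):

using Microsoft.SqlServer.Dts.Runtime;

class PackageGenerator
{
    static void Main()
    {
        Package pkg = new Package();
        pkg.Name = "LoadSource1";

        // Connection managers (placeholder names and connection strings)
        ConnectionManager sourceFile = pkg.Connections.Add("FLATFILE");
        sourceFile.Name = "CM_Source1";
        sourceFile.ConnectionString = @"C:\data\Source1.csv";

        ConnectionManager staging = pkg.Connections.Add("OLEDB");
        staging.Name = "CM_Staging";
        staging.ConnectionString =
            "Provider=SQLNCLI10.1;Data Source=.;Initial Catalog=Staging;Integrated Security=SSPI;";

        // Add an empty Data Flow task; the source, transforms and destination inside it
        // are configured separately through the pipeline object model
        TaskHost dataFlow = (TaskHost)pkg.Executables.Add("STOCK:PipelineTask");
        dataFlow.Name = "DFT_Source1";

        // Write the generated package out as a .dtsx file
        new Application().SaveToXml(@"C:\etl\LoadSource1.dtsx", pkg, null);
    }
}

Run that in a loop over your file metadata and you can stamp out (or execute) one package per source file.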
August 6, 2009 at 10:33 am
CozyRoc has enhanced its Data Flow Task Plus component to support dynamic data flows. We would be happy to hear your feedback.
August 6, 2009 at 4:55 pm
riga1966 (11/4/2008)
Have you considered BCP?