August 6, 2015 at 7:40 am
Hi Guys, I am looking for some general advice, although if anyone has specific code I would love to see it.
We are about to embark upon a full scale BI data warehouse project with full ETL: Straight extract from source systems, hand-rolled data quality analysis and MDM (can't afford Enterprise :crying:) ODS and Kimball facts and dimensions with a full ETL framework so there will be a LOT of packages. Most of these will follow similar design patterns source->staging will all be full table or delta reads and staging->Kimball will be dimension lookups with Type I, II SCD
As we identify additional steps that wee need to add to these design patterns (e.g. add rowcounts, or save bad data to an errors table converting the dataflow columns to XML) I want to be able to take the existing packages, increment the version number and insert, amend or delete the tasks within it.
The meta process would be
look at a list of source tables then for each table
create a new package based on the current design template
see if there is a package that already exists for this table. If there is then match tasks by task name and compare application critical details (e.g. column types) and update them where necessary but keep non critical details (e.g. task GUID) the same
The reason for this is that I would want to store both the template designs and the generated packages in source code control and be able to compare different versions of the packages. BIML does a great job of creating initial packages, but if you change the BIML file and rebuild the packages then the new packages are version 1 with completely different GUIDs so there is very little for the source code diff compare tools to hold onto and you will see lots of differences even if the packages are functionally identical.
Has anyone ever used this approach and if so what sort of things did you have to do?
Aaron
August 6, 2015 at 7:47 am
I would consider source controlling the BIML scripts rather than the packages. Treat the packages as compiled code.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
August 6, 2015 at 8:49 am
Thanks Phil. first one to reply again!
The problem is that it is the generated dtsx that would get deployed and I need to be able to prove that what is on the production server matches what was tested in UAT. For example, if a source system gets a schema change then the SSIS dataflow will change even though the BIML script is the same (and because BIML builds the package from scratch, all the GUIDs will change but the version and build numbers will still be 1). This is another limitation of BIML: there is no way to associate a generated dtsx with a 'version' of the BIML script so you can't prove which packages were created with which script.
BIML is fantastic for prototyping but as soon as you move out of DEV and into TEST you have to make a choice BIML or Source Code Control.
Also if you manually edit a dtsx created by BIML, you will lose all the changes when the package is re-generated because BIML can't detect that the package has been manually modified (i.e. the currently deployed version is build# > 1)
I am looking at Pragmatic Works Package Builder which comes as part of the BI Express pack to see if it does what I want.
Conceptually what I want is dependency injection whereby I can say that a package is made up of modules but the contents of those modules can be updated independently and then any package that relies on those modules picks up it's changes when it is recompiled - but all within the confines of BIDS/SSDT.
August 7, 2015 at 1:58 am
aaron.reese (8/6/2015)
Thanks Phil. first one to reply again!The problem is that it is the generated dtsx that would get deployed and I need to be able to prove that what is on the production server matches what was tested in UAT. For example, if a source system gets a schema change then the SSIS dataflow will change even though the BIML script is the same (and because BIML builds the package from scratch, all the GUIDs will change but the version and build numbers will still be 1). This is another limitation of BIML: there is no way to associate a generated dtsx with a 'version' of the BIML script so you can't prove which packages were created with which script.
BIML is fantastic for prototyping but as soon as you move out of DEV and into TEST you have to make a choice BIML or Source Code Control.
Also if you manually edit a dtsx created by BIML, you will lose all the changes when the package is re-generated because BIML can't detect that the package has been manually modified (i.e. the currently deployed version is build# > 1)
I am looking at Pragmatic Works Package Builder which comes as part of the BI Express pack to see if it does what I want.
Conceptually what I want is dependency injection whereby I can say that a package is made up of modules but the contents of those modules can be updated independently and then any package that relies on those modules picks up it's changes when it is recompiled - but all within the confines of BIDS/SSDT.
I'm guessing that you're doing this within the confines of the package deployment model (and not project deployment). Would that be correct? If so, I understand your problem.
We use the project deployment model and everything in the 'prod' branch of source control for SSIS gets deployed by a tool (TeamCity) with no human input (other than clicking the Run button). So we know that what's in that branch is what is deployed.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply