For a person who has suffered from tedium of making changes to several SSIS packages for an apparently small change in logic, the introduction of package parts (or reusable control flow templates) in Integration Services 2016 is a breath of fresh air. In pre-SQL Server 2016 environments, though to a limited extent, reusability could be achieved through use of SSIS templates (skeletal SSIS packages with control parts that could be reused to generate packages) or by transferring some of the SSIS functions to stored procedures, which could be called in Execute SQL Tasks, thereby centralizing the tasks to a database. While these features were crucial, these features did not obviate the need to make changes to each and every package if certain structural changes were required in the package flows across multiple SSIS packages.
In such cases, you would have had to spend multiple iterations making changes across all the packages that had tasks which were common in terms of their functionality for example disabling/enabling indexes, switching partitions etc. Then there was Biml (Business Intelligence Markup Language), a dialect of XML, which used in conjunction with VB.Net/C#, provided a medium to automate the generation of SSIS packages and its components through the use of code snippets which could be reused across different projects. However use of BIML necessitated the need to be fairly competent in these languages in order to extract full benefit of this facility. With the introduction of package parts, SSIS has though not taken a giant leap, but a small but important step in the direction of enhancing the reusability of its components.
Just to illustrate this, I have created a common ETL load with package parts (shown in Fig 1). Section on the right shows the package parts (files with .dtsxp extension) that are created to perform certain admin tasks in a package while section on the left shows the sequence in which the package parts are being used in a package. You can see in the right section that there is a .designer file that is created for packages containing control flow parts that marries the package code with package part code.
In the above example, Customer.dtsx package has been created with package parts and control flow tasks to perform following sequential tasks
- Get batch number (CF1_GetBatchNo) – Package part that accepts clientID and gets the latest batch number from a batch table
- Log Process Start (CF3_LogProcessStart) – Package part that accepts batch number and table name and makes an initial entry in a log table to suggest start of the process
- Truncate Table (CF4_TruncateTable) – Package part that accepts table name and performs truncate operation
- Load Table (DFT_LoadCUstomer) – Control flow task that loads customer table
- Log Process End (CF2_LogProcessEnd) - Package part that accepts batch number and table name and makes an final entry in a log table to suggest end of the process
Thus package parts that I have created are for administrative tasks that are fairly general and should be re-useable in SSIS packages that require similar flow. Once created these package parts can be used from the SSIS Toolbox window, by dragging the package parts from SSIS Toolbox section and dropping it into the control flow section for customer package.
Looking at the example above, you would think SSIS has taken a giant step in making its components modular and reusable. However there are few gotchas which need to be mentioned before you get your hopes high. In the next section, we explore the good and not-so-good features of package parts.
The Good
First, fairly common tasks can be made reusable. Once created, package parts can be reused via SSIS Toolbox pane. This enables centralizing of administrative tasks which obviates the need to make changes to every package when there is a simple change in logic in a common admin task.
Complex logic can be embedded in package parts. It is usually a best practice to break complex tasks into simple chunks and then embed in different package parts. This enables better modularity and reusability of codes. However, in cases when a task is specialized and is composed of several sequential sub-tasks, all the required control flow tasks can be clubbed in a single package part and then re-used. For example archiving and emailing a file can be achieved via a single package part which contains one File System Task for archiving and renaming the file and one Send Email Task for emailing it to the necessary recipients.
Package Parts can be incorporated in a package multiple times. Similar to control flow tasks, a package can contain multiple instances of package parts. As seen in fig 3, CF4_TruncateTable has been used twice in the package as CF4_TruncateTable1 and CF4_TruncateTable2.
Package Parts can take in inputs via variables. Package part variables are exposed in the parent package and can be configured to accept values from the package. For instance truncate table package part (CF4_TruncateTable) can accept the name of the table from the package via package part variable and perform the truncate operation (as fig 4). Here I am passing the table and schema name to the corresponding package part variables for the CF4_TruncateTable package part.
The Not So Good
Package part variables cannot be assigned value from the package variable. Well, this is not entirely true as it can be during design time. However this assignment does not persist after the package has been closed and reopened. This means that package part variables can only accept hard-coded values
Package part variables cannot output a value. Unlike certain control flow tasks which can intake values and also output values, package parts are meant to only accept values. No values can be returned from the package part that needs to be propagated to the subsequent package part or control flow task. For example in fig 1, CF1_GetBatchNo cannot output the batch number so that it could be used in the CF3_LogProcessStart. Hence CF1_GetBatchNo should be replaced with an Execute SQL task to perform this activity. Moreover this also makes CF3_LogProcessStart and CF2_LogProcessEnd redundant as these components cannot accept the batch number during execution time due to the limitation highlighted in the previous point. Both these components need to be replaced with Execute SQL tasks.
Package parts cannot be executed separately. Package parts cannot be executed independent of a package similar to control flow tasks which can only be executed once they have been used in a package. This is a limitation for package parts containing complex logic containing multiple control flow tasks.
Package part cannot be used in another package part. A package part can only be used in a package and not another package part. This limits the modularity of package parts as complex tasks cannot be broken to elementary work units and then assembled together.
Every package part creates its own set of connection managers (fig 5). Package parts are not designed to accept project level connection managers. Hence every package part needs to have its own set of connection managers. This creates multiple connection managers for a package that contains multiple package parts. This would lead to maintenance overheads as multiple connection managers need to be updated based on the requirement of the parent package.
Summary
In summary, though package parts feature represents an important step in terms of introducing more modularity and reusability in SSIS, it has few shortcomings. If these were to be addressed in the SQL Server 2016 release or subsequent releases, this would add a lot of comfort to the lives to ETL developers who have to deal with the nightmare of making changes to loads of SSIS packages for an apparently common simple change.
References
- http://www.bimlscript.com/
- https://msdn.microsoft.com/en-us/library/mt620034.aspx
- http://prathy.com/2016/06/my-thoughts-on-ssis-2016-ssis-package-parts-2/
- https://www.intertech.com/Blog/ssis-tutorial-intro-ssis-2016-package-parts/
- http://toddmcdermid.blogspot.com.au/2016/10/control-flow-package-parts-in-ssis-2016.html