Personally, to me the coolest thing about SSIS is the ability to generate packages programmatically.
I have spent a great deal of my career developing and/or implementing automated transformation and ETL applications.
In my experience there has always a gap between pointed specific business requirements and the capabilities of any specific tool or
application.
And once you got a set of transformations (DTS, SQL , OWB, whatever) with the tools you at your disposal, you then were faced with how could you reuse the logic with a slightly different file format.
SSIS does not have any transformation limitation from a practical perspective, no gaps. This is incredibly important.
From a purely data transformation perspective the core functionality of SSIS consist of the Control Flow Task and specifically the Data Transformation Task, consisting of the script component , stock and custom components and the ability to programmatically generate all of these via dynamic external metadata.
For the purpose of clarity let’s discuss the difference between internal metadata and external metadata in the context of SSIS.
From an SSIS package perspective there is only one type, internal or design time metadata. The package assumes the metadata; mainly columns, column mappings, connection, queries and paths are the same at run time compared to when they were created at design time. Through the use of various configuration options you can affect some of the parameters that are used for various connections; however you cannot changes columns, column mappings or derived columns values etc…
Beyond configuration it is common for a developer to want to design a package using specific components and be able to use it for a slightly different set of inputs, obviously “REUSE”. At this point the primary reuse “technique” in SSIS is the proverbial “copy and paste, then change a bunch of crap”.
The key to leveraging SSIS and enabling external metadata lies in the openness of the SMO object model.
Once you have designed and implemented an ETL process of required transformation many times you need to make the process reusable and dynamic. Let examine these concepts.
In the case of our marketing company exercise from Part 1, once we design, test and implement the logic into a transformation component, we have only partially leveraged SSIS. The advantage of using the component over the script is primary changing input columns and easier implementation. However the next issue would be implementing the same logic for users on demand and without requiring SSIS development. The key here is to embed the process (package and custom component) for our marketing company into an application and encapsulate the process with a user interface that does not require SSIS knowledge.
The implications of this are revealed in a simple example. We designed our marketing company package first as a script. The limitation here was that the entire logic and columns campaign id were hard coded. Then we converted the script into a custom transform, this allowed a SSIS developer the ability to easily reuse the component and change the source columns and campaign id. The finale step will be to create an application that will allow an end user the ability to source a file and apply the “Marketing Company “ process via the SSIS Package independently, without any programmed or developer intervention.
In this chapter we will introduce you to the SSIS programming model. You will learn how to integrate with dynamic metadata and how to utilize data cleansing functionality in your application. We will further extend the Custom Transform we created in Part 1 by programmatically generating the completed package for the “Marketing Company” solution. See Figure 1.
Figure 1
Integration Services SSIS Terminology
First let’s review SSIS Terminology a Package consists of Tasks, Precedence Constraints, Connection Managers and Containers.
A Data Flow Task is comprised of Components – Source Adapters, Transformations, Destination Adapters and Paths
It is important to be aware of the overall conceptual architecture as you proceed with this chapter. The above charts represent a high level overview of the SMO objects.
Embedding SSIS
SSIS is embeddable via the SQL Management Objects (SMO) which is a set of objects that exposes all of the management functionality of the SQL Server database. Management Studio was developed with SQL Management Objects. You can use SMO to automate common SQL Server administrative tasks, such as programmatically retrieving configuration settings, creating new databases, applying Transact-SQL scripts, creating SQL Server Agent jobs, and scheduling backups.
Package Creation Approaches
You can use SMO to create your own UI. Some examples from Microsoft are SSIS designer, Management Studio, Import/Export Wizard, and DTS Migration Wizard. You can enable Digital signing to make your application tamper resistance.
Most importantly SMO provides the opportunity for the creation of metadata driven package development.. Some Dynamic Metadata Scenarios are Source schema changes/not known until execution and Metadata driven ETL processes. You can generate data flows dynamically.
There are several approached you can take to Building Packages Programmatically. You can develop a package from scratch using object model. This is a simple and efficient approach albeit harder to grow the application. You can also base your design on a template package. Obviously then you will only provide code for adjusting required settings.
From a security perspective you also provide Digital signing which would detect any user changes
Some of the limitations we will discuss are that the Pipeline engine requires static metadata with the Buffers laid out during pre execute. This forces strict data types and prevents column mapping during execution.
Package Components Terminology
There are three key concepts to keep in mind when designing components that generate Packages Programmatically. They are Input Columns, Output Columns and Column Mappings. Input column refer to data referenced by the component, Output columns are External Metadata Columns. As discussed previously each is uniquely indentified by a LineageID and every output column gets a new LineageID. Central is the concept of Column Mapping. There are three types of column Mapping , they are Sources: ExternalColumn to OutputColumn, Transforms: InputColumn to OutputColumn and Destinations: InputColumn to ExternalColumn.
The Pipeline Programming Model
The Pipeline Programming Model is a very straightforward framework consisting of
Input -> Component -> Output / Runtime Connection Collection
TheIDTSComponentMetaData90object is provided for all components by the engine automatically. It manages metadata and persistence for the components input and contact information for unregistered components. In addition it helps delay creation of components until necessary.
The Runtime Connection Collection are the Connection managers used by the component
Application Overview
In “Part 1 Extending SSIS - Creating a Custom Data Flow Transformation Component” we introduced a business scenario for a marketing company. Refer to Part 1 for review. In this article we further extend the solution, by implementing it as a custom transform.
Specifically in this chapter we want to further extend the findvendparseaddr Transform component by dynamically building a package. The primary advantages of this approach are that we can wrap the component with a UI and allow the client to select the files to process and name their target files. .
The solution we will build will have the following capabilities:
- · Get data from an Excel file
- · Provide Address Parsing for City , Sate and Zip from the Occupational Description
- · Find matches for Vending companies for the Address text field
- · Save cleaned data in flat file
Figure 1
Establish your References
using System; using System.IO; using System.Collections.Generic; using System.Text; using Microsoft.SqlServer.Dts.Runtime; using Microsoft.SqlServer.Dts.Pipeline.Wrapper; using wrap = Microsoft.SqlServer.Dts.Runtime.Wrapper; using System.Xml; using System.Data.Common; using System.Configuration; using System.Collections; Create the Namespace namespace GenPackage {
Create Public Class GenPackage for declaring private variables using IDTSComponentMetaData90 as it contains the input and output objects, input and output columns, and the custom properties of the component. It also contains information that identifies DLL’s, like findvendparseaddr(Refer to Chapter 8). This is the assembly that we created previously.
public class GenPackage { private Package FindVendParseAddrPkg; private MainPipe DataFlow; private IDTSComponentMetaData90 ExcelSource; private IDTSComponentMetaData90 findvendparseaddr; private IDTSComponentMetaData90 ConditionOper; private IDTSComponentMetaData90 OLEDBDestination; private IDTSComponentMetaData90 OLEDBDestination1;
Add constant string variables for declaring Package Filename and two text output files .
const string PAKGFILENAME = @"\findvendparseaddr.dtsx"; const string OUTPUTFILEGOODNAME = @"\findvendoutputgood.txt"; const string OUTPUTFILEBADNAME = @"\findvendoutputbad.txt";
Delete package files from prior execution after declaring string variable for getting current directory path.
public GenPackage(List<Hashtable> _findInputColumn) { string currDir = Directory.GetCurrentDirectory(); File.Delete(currDir + OUTPUTFILEBADNAME); File.Delete(currDir + OUTPUTFILEGOODNAME); File.Delete(currDir + PAKGFILENAME);
Set the variables that will hold PackageName, PackageType, Description , CreatorName and ComputerName.
this.FindVendParseAddrPkg = ChapterPackage("findvendparseaddrPkg", " findvendparseaddr sample"); this.FindVendParseAddrPkg.Variables.Add("BusName ", false, "", "BAD_DATA");
Add Excel Connection Manager RunTime object and DataFlow Runtime Task.
this.AddExcelConnMng(currDir); this.DataFlowTask();
Add the DataFlow Runtime Components and the Custom component findvendparseaddr (Refer Part 1)
this.AddExcelSource(); this.Addfindvendparseaddr(); this.AddConditionSort(); this.AddOLEDBDestination(); this.AddOLEDBDestination1();
Save the Package.
Application a = new Application(); a.SaveToXml(currDir + PAKGFILENAME, this.FindVendParseAddrPkg,null); } private static Package ChapterPackage(string Name, string Description) { Package pkg = new Package(); pkg.PackageType = DTSPackageType.DTSDesigner90; pkg.Name = Name; pkg.Description = Description; pkg.CreatorName = System.Environment.UserName; pkg.CreatorComputerName = System.Environment.MachineName; return pkg; }
Add the stock Excel Connection Manager
private void AddExcelConnMng(string currDir) { ConnectionManager findvendparseaddr = this.FindVendParseAddrPkg.Connections.Add("EXCEL"); Set Stock Excel Connection Manager Properties. findvendparseaddr.Name = "Excel Source"; findvendparseaddr.ConnectionString = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\olmonthly.xls;Extended Properties=""EXCEL 8.0;HDR=YES"";";
Add a stock OLEDB Destination Connection Manager.
ConnectionManager destConnection = this.FindVendParseAddrPkg.Connections.Add("OLEDB"); Set the stock OLEDB Destination Connection Manager properties such as Connection String, Format, ComponentClassID, Name and finally Task name and Task Description. destConnection.Name = "OLEDBConnection"; destConnection.ConnectionString = @"Provider=SQLNCLI;Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=Chapter2;Data Source=(local);Auto Translate=False;"; this.OLEDBDestination = this.DataFlow.ComponentMetaDataCollection.New(); // Set stock properties. this.OLEDBDestination.ComponentClassID = "DTSAdapter.OLEDBDestination"; this.OLEDBDestination.Name = "OLEDBDestination"; this.OLEDBDestination.Description = "Destination data in the DataFlow"; CManagedComponentWrapper instance = this.OLEDBDestination.Instantiate(); instance.ProvideComponentProperties();
Attach path between the OLEDB Destination components Input , and the Conditional Split Component's Output.
this.DataFlow.PathCollection.New().AttachPathAndPropagateNotifications(this.ConditionOper.OutputCollection[1],this.OLEDBDestination.InputCollection[0]); Associate the runtime connection manager and the connection manager association will fail if it called before ProvideComponentProperties. this.OLEDBDestination.RuntimeConnectionCollection[0].ConnectionManagerID = this.FindVendParseAddrPkg.Connections["OLEDBConnection"].ID; this.OLEDBDestination.RuntimeConnectionCollection[0].ConnectionManager = DtsConvert.ToConnectionManager90(this.FindVendParseAddrPkg.Connections["OLEDBConnection"]);
Set custom component properties
instance.SetComponentProperty("OpenRowset", "[Victims]"); instance.SetComponentProperty("AccessMode", 0); Acquire Connections and reinitialize the component. instance.AcquireConnections(null); instance.ReinitializeMetaData(); instance.ReleaseConnections(); } private void AddOLEDBDestination1() { this.OLEDBDestination1 = this.DataFlow.ComponentMetaDataCollection.New(); // Set stock properties. this.OLEDBDestination1.ComponentClassID = "DTSAdapter.OLEDBDestination"; this.OLEDBDestination1.Name = "OLEDBDestination1"; this.OLEDBDestination1.Description = "Destination data in the DataFlow"; CManagedComponentWrapper instance = this.OLEDBDestination1.Instantiate(); instance.ProvideComponentProperties(); // Attach path between the OLEDB Destination components Input, and the Conditional Split Component's Output. this.DataFlow.PathCollection.New().AttachPathAndPropagateNotifications( this.ConditionOper.OutputCollection[0], this.OLEDBDestination1.InputCollection[0]); // Associate the runtime connection manager // The connection manager association will fail if called before ProvideComponentProperties this.OLEDBDestination1.RuntimeConnectionCollection[0].ConnectionManagerID = this.FindVendParseAddrPkg.Connections["OLEDBConnection"].ID; this.OLEDBDestination1.RuntimeConnectionCollection[0].ConnectionManager = DtsConvert.ToConnectionManager90(this.FindVendParseAddrPkg.Connections["OLEDBConnection"]); this.OLEDBDestination1.Name = "OLEDB Destination 1"; // set custom component properties instance.SetComponentProperty("OpenRowset", "[BadData]"); instance.SetComponentProperty("AccessMode", 0); // Acquire Connections and reinitialize the component instance.AcquireConnections(null); instance.ReinitializeMetaData(); instance.ReleaseConnections(); }
Adds a DataFlow task to the Executables collection of the package.
private void DataFlowTask() { TaskHost thost = this.FindVendParseAddrPkg.Executables.Add("DTS.Pipeline") as TaskHost; thost.Name = "DataFlow Chapter14"; thost.Description = "DataFlow task for FindVendparseaddr sample."; this.DataFlow = thost.InnerObject as MainPipe; //this.dataFlow.Events = this.pipelineEvents as wrap.IDTSComponentEvents90; }
Set Stock properties for Excel Source.
private void AddExcelSource() { this.ExcelSource = this.DataFlow.ComponentMetaDataCollection.New(); // Set stock properties. this.ExcelSource.ComponentClassID = "DTSAdapter.EXCELSource"; this.ExcelSource.Name = "EXCEL Source"; this.ExcelSource.Description = "Source data in the DataFlow"; IDTSDesigntimeComponent90 instance = this.ExcelSource.Instantiate(); instance.ProvideComponentProperties();
Associate the runtime connection manager
this.ExcelSource.RuntimeConnectionCollection[0].ConnectionManagerID = this.FindVendParseAddrPkg.Connections["Excel Source"].ID; this.ExcelSource.RuntimeConnectionCollection[0].ConnectionManager = DtsConvert.ToConnectionManager90(this.FindVendParseAddrPkg.Connections["Excel Source"]);
Set custom component properties.
instance.SetComponentProperty("OpenRowset", "Sheet1$"); instance.SetComponentProperty("AccessMode", 0); // Acquire Connections and reinitialize the component instance.AcquireConnections(null); instance.ReinitializeMetaData(); instance.ReleaseConnections(); } private void Addfindvendparseaddr() {
Add the component to the DataFlow task.
this.findvendparseaddr = this.dataFlow.ComponentMetaDataCollection.New();
Set custom component's properties like ComponentClassID containing version, Culture, Public Token etc from assembly.
this.findvendparseaddr.ComponentClassID = "findvendparseaddr.SampleProject, findvendparseaddr, Version=1.0.0.0, Culture=neutral, PublicKeyToken=145b75aed1f5efb1"; this.findvendparseaddr.Name = "findvendparseaddr"; this.findvendparseaddr.Description = "findvendparseaddr Component"; IDTSDesigntimeComponent90 instance = this.findvendparseaddr.Instantiate(); instance.ProvideComponentProperties();
Attach path between the ExcelSource components Output, and the findvendparseaddr Component's Input.
this.DataFlow.PathCollection.New().AttachPathAndPropagateNotifications( this.ExcelSource.OutputCollection[0], this.findvendparseaddr.InputCollection[0]);
Setting five default input column’s in findvendparseaddr custom component. (Refer Chapter 8 for Custom Component)
IDTSVirtualInput90 vInput = this.findvendparseaddr.InputCollection[0].GetVirtualInput(); IDTSInputColumn90 col; foreach (IDTSVirtualInputColumn90 vColumn in vInput.VirtualInputColumnCollection) { if (vColumn.Name == "BUS_NAME") { col = instance.SetUsageType(this.findvendparseaddr.InputCollection[0].ID, vInput, vColumn.LineageID, DTSUsageType.UT_READWRITE); } else if (vColumn.Name == "OCC_DESC_01") { col = instance.SetUsageType(this.findvendparseaddr.InputCollection[0].ID, vInput, vColumn.LineageID, DTSUsageType.UT_READWRITE); } else if (vColumn.Name == "OWNER_NAME") { col = instance.SetUsageType(this.findvendparseaddr.InputCollection[0].ID, vInput, vColumn.LineageID, DTSUsageType.UT_READWRITE); } else if (vColumn.Name == "SITUS_ADDRESS1") { col = instance.SetUsageType(this.findvendparseaddr.InputCollection[0].ID, vInput, vColumn.LineageID, DTSUsageType.UT_READWRITE); } else if (vColumn.Name == "SITUS_ADDRESS2") { col = instance.SetUsageType(this.findvendparseaddr.InputCollection[0].ID, vInput, vColumn.LineageID, DTSUsageType.UT_READWRITE); } } } } }
Calling AddConditionSort() function for adding Conditional Split component in Package. Add the Component to the DataFlow Task.
private void AddConditionSort() { this.ConditionOper = this.DataFlow.ComponentMetaDataCollection.New();
Set component's stock properties.
this.ConditionOper.ComponentClassID = "DTSTransform.ConditionalSplit"; this.ConditionOper.Name = "Conditional Split"; this.ConditionOper.Description = "Conditional Split component"; CManagedComponentWrapper instance = this.ConditionOper.Instantiate(); instance.ProvideComponentProperties();
Attach path between the findvendparseaddr components Output, and the ConditionOper Component's Input Using AttachPathAndPropagateNotifications() function.
this.DataFlow.PathCollection.New().AttachPathAndPropagateNotifications( this.findvendparseaddr.OutputCollection[0], this.ConditionOper.InputCollection[0]);
Setting Conditional Split OutPut Variables Name.
IDTSOutput90 OutPutBad = instance.InsertOutput(DTSInsertPlacement.IP_AFTER, 1); OutPutBad.Name = "BadData"; IDTSOutput90 output = this.ConditionOper.OutputCollection[1]; output.Name = "GoodData"; }
This is the main Class, when we run our Package Program then it will generate our package in \bin\debug folder. Here is the program
using System; using System.Collections.Generic; using System.Text; using System.IO; using System.Xml; using System.Data.Common; using System.Configuration; using System.Collections; namespace findvendarrdaddPKG { class Program { [System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Performance", "CA1804:RemoveUnusedLocals")] static void Main(string[] args) { List<Hashtable> _findInputColumnlist = new List<Hashtable>(); Hashtable _findInputColumn=new Hashtable(); _findInputColumn.Add("BUS_NAME","BUS_NAME"); _findInputColumn.Add("OCC_DESC_01","OCC_DESC_01"); _findInputColumn.Add("OWNER_NAME","OWNER_NAME"); _findInputColumn.Add("SITUS_ADDRESS1","SITUS_ADDRESS1"); _findInputColumn.Add("SITUS_ADDRESS2","SITUS_ADDRESS2"); _findInputColumnlist.Add(_findInputColumn); GenPackage sample = new GenPackage(_findInputColumnlist); } } }
Summary
Programming SSIS to dynamically create packages is straightforward. Several embedding options exist; you can use SMO to create your own UI. Most importantly you can create packages that are metadata driven. Some Dynamic Metadata Scenarios are Source schema changes/not known until execution and Metadata driven ETL processes. You can generate data flows dynamically.
There are several approaches you can take to Building Packages Programmatically. You can develop a package from scratch using object model or you can base your design on a template package. You can also provide Digital signing which would detect any user changes
As you can see from our example we have create the package completely from scratch, we could easily extend this solution further to be driven from a simple windows application or web based application. An example of this would be to provide the user with a selection of sources and then allow them to select them individually as well as the columns and campaign id. We could then generate the package via our application and the SMO. This would result in an application that was driven completely by external metadata. The benefit to this is less programmer involvement and the greater reuse of the business logic required for solving our marketing company business problem.