Overview
Most of us have been using Data Transformation Services (DTS) in the previous versions of SQL Server in order to manipulate or move data. With the introduction of SQL Server 2005, Microsoft has introduced a completely re-written version of DTS with a lot more new features which is now known as Integration Services. This new ETL platform is nothing but a set of utilities, applications, designers, components, and services all wrapped up into one powerful software application suite.
The most attracting feature of SSIS is that the data movement and transformation is separate from the package control flow and management. There are two different engines that handle these tasks. Integration Services Data Flow engine takes care of the data movement and transformation whereas the Integration Services Run-time engine manages the package control flow.
The motivation behind this article is that the SSIS API is not well documented in MSDN though you get an overall idea of what is going on. This article assumes that the reader is aware of the fundamental components of an SSIS Package. To know more about what SSIS is all about, check the MSDN Books Online http://msdn2.microsoft.com/en-us/library/ms141026.aspx
DTS vs. SSIS Programming
There are two fundamental approaches one can have for SSIS programming. One is to use the SSIS Designer to create and run packages. The other approach is to use the API to create, configure and run packages from own applications.
Those who are familiar with the DTS programming can see that there is very little code from DTS remaining in Integration Services. Some of the similarities are listed here.
SSIS | |
Control flow and Data flow on the same design surface | Control flow and Data flow have been separated |
Datapump (supported only one source, transform, and destination) | Data Flow Task or Pipeline (can perform extremely complex and advanced data processing within one pipeline) |
Step | Taskhost (An expanded version of Step) which schedules the task execution |
Precedence Constraints | Still remains with added features |
Connections | Connection Managers |
Getting Started
Let us get started with some actual coding in VB.NET, my favorite!
The following table lists the assemblies that are frequently used when programming Integration Services using the .NET Framework (http://msdn2.microsoft.com/en-us/library/ms403344.aspx).
Assembly | Description |
Microsoft.SqlServer.ManagedDTS.dll | Contains the managed run-time engine. |
Microsoft.SqlServer.RuntimeWrapper.dll | Contains the primary interop assembly (PIA), or wrapper, for the native run-time engine. |
Microsoft.SqlServer.PipelineHost.dll | Contains the managed data flow engine. |
Microsoft.SqlServer.PipelineWrapper.dll | Contains the primary interop assembly (PIA), or wrapper, for the native data flow engine. |
Consider the example of transferring data from a flat file to a database table in SQL Server 2005. Create the destination table in SQL Server prior to running the package. This can be done using a simple T-SQL statement.
The programmatic approach would include the following steps:
- Create an SSIS Package
http://msdn2.microsoft.com/en-us/library/ms135946.aspx
- Create Control flow, which would be as simple as adding a
new Data Flow task in our case.
http://msdn2.microsoft.com/en-us/library/ms135997.aspx
- Add two connection managers. We need a Flat file connection manager for the source and an OleDb or SQL Server connection manager for the destination.
Creating OleDb connection manager is not hard.
Dim oleDbConn As ConnectionManager = myPackage.Connections.Add("OLEDB") oleDbConn.Name = "MyOLEDBConnection" oleDbConn.ConnectionString = "Provider=SQLOLEDB.1; Integrated Security=SSPI;" & "Initial Catalog=<db>; Data Source=" & SS_INSTANCE & ";"
The following code shows creating a flat file connection manager and setting up the formatting information for the same.
cnFile = myPackage.Connections.Add("FLATFILE") cnFile.Properties("Name").SetValue(cnFile, "MyFlatFileConnection") cnFile.Properties("ConnectionString").SetValue(cnFile, <srcFilePath>) cnFile.Properties("Format").SetValue(cnFile, "Delimited") cnFile.Properties("Unicode").SetValue(cnFile,False) cnFile.Properties("ColumnNamesInFirstDataRow").SetValue(cnFile, True) cnFile.Properties("DataRowsToSkip").SetValue(cnFile,0) cnFile.Properties("RowDelimiter").SetValue(cnFile,vbCrLf) cnFile.Properties("TextQualifier").SetValue(cnFile,"""") cnFile.Properties("HeaderRowsToSkip").SetValue(cnFile, 0) cnFile.Properties("HeaderRowDelimiter").SetValue(cnFile,vbCrLf) cnFile.Properties("CodePage").SetValue(cnFile,1252) Dim MyFlatFilecn As SSISRuntime.IDTSConnectionManagerFlatFile90 = Nothing MyFlatFilecn = TryCast(cnFile.InnerObject,SSISRuntime.IDTSConnectionManagerFlatFile90) DtsConvert.ToConnectionManager90(cnFile) Dim name As SSISRuntime.IDTSName90 Dim col As SSISRuntime.IDTSConnectionManagerFlatFileColumn90 cnFile.AcquireConnection(Nothing) ' Add columns to the FlatFileConnectionManager Dim headers As String() 'Get the header row from flat file. This can be stored in a database table as well. 'Assuming that "headers" Array contains the headers Dim i As Integer = 0 While i < headers.Length col = MyFlatFilecn.Columns.Add() If i = headers.Length - 1 Then col.ColumnDelimiter = vbCrLf Else col.ColumnDelimiter = "," End If col.ColumnType = "Delimited" name = TryCast(col, SSISRuntime.IDTSName90) name.Name = headers(i) 'get the converSion information from the destination table 'this should assign the appropriate datatype, precision, scale and length to col getConversionInfo(col, name.Name) 'implement this i = i + 1 End While cnFile.ReleaseConnection(Nothing)
- Create Data flow which does the actual job of copying the data from flat file to table. This would involve creating a Flat file source and an OleDb destination. This is fairly standard. We give it for the sake of completeness.
Note: One can add data transformation components here if required.
DFSource = DTP.ComponentMetaDataCollection.New() DFSource.ComponentClassID = "DTSAdapter.FlatFileSource" DFSource.Name = "FlatFileSource" ' Get the design time instance of the component. SourceInst = DFSource.Instantiate() ' Initialize the component. SourceInst.ProvideComponentProperties() ' Specify the connection manager. If DFSource.RuntimeConnectionCollection.Count > 0 Then DFSource.RuntimeConnectionCollection(0).ConnectionManagerID = myPackage.Connections("MyFlatFileConnection").ID DFSource.RuntimeConnectionCollection(0).ConnectionManager = DtsConvert.ToConnectionManager90(myPackage.Connections("MyFlatFileConnection")) End If ' Reinitialize the metadata. SourceInst.AcquireConnections(Nothing) SourceInst.ReinitializeMetaData() Dim exOutColumn As IDTSExternalMetadataColumn90 For Each outColumn As IDTSOutputColumn90 In DFSource.OutputCollection(0).OutputColumnCollection exOutColumn = DFSource.OutputCollection(0).ExternalMetadataColumnCollection(outColumn.Name) SourceInst.MapOutputColumn(DFSource.OutputCollection(0).ID, outColumn.ID, exOutColumn.ID, True) Next SourceInst.ReleaseConnections()
Similarly add a destination.
- Connect Data flow components
http://msdn2.microsoft.com/en-us/library/ms136086.aspx
- Map source and destination columns
Dim input As IDTSInput90 = DFDestination.InputCollection(0) Dim destinationInputID As Integer = CInt(input.ID) Dim vInput As IDTSVirtualInput90 = input.GetVirtualInput() For Each vColumn As IDTSVirtualInputColumn90 In vInput.VirtualInputColumnCollection ' This will create an input column on the component. DestInst.SetUsageType(destinationInputID, vInput, vColumn.LineageID, DTSUsageType.UT_READONLY) ' Get input column. Dim inputColumn As IDTSInputColumn90 = input.InputColumnCollection.GetInputColumnByLineageID(vColumn.LineageID) ' Getting the corresponding external column. ' Ex : We will use the column name as the basis for matching data flow columns to external columns. Dim externalColumn As IDTSExternalMetadataColumn90 = input.ExternalMetadataColumnCollection(vColumn.Name) ' Tell the component how to map. DestInst.MapInputColumn(destinationInputID, inputColumn.ID, externalColumn.ID) Next
- Validate and execute the package
Dim pkgStatus As DTSExecResult = myPackage.Validate(Nothing, Nothing, Nothing, Nothing) If pkgStatus = DTSExecResult.Success Then Dim pkgResult As DTSExecResult = myPackage.Execute() End If
Notes
1: For debugging purposes it is a good idea to save the SSIS
package created programmatically which can then be exported to the Designer.
This would allow us to check whether the properties of the connection managers
etc are set properly. See the link for different options.
http://msdn2.microsoft.com/en-us/library/ms403347.aspx
2: To get a list of data types that SSIS uses, check
http://msdn2.microsoft.com/en-us/library/ms141036.aspx
3: In case a subset of columns from the flat file needs to be transferred to the destination table, then one can identify such columns in Step c, and then delete them from Output Collection of the source in Step d before creating the external columns
Conclusion
SQL Server Integration Services, the new ETL platform in SQL Server 2005, is the successor to DTS, which was there in previous versions of SQL Server. SSIS with all its new features is a huge topic in itself and we tried to look at some of the programming aspects of it. We have looked at how we can programmatically create a package to transfer data from a Flat file source to a database table. Interested readers can look at the MSDN for more information on SSIS. A couple of books that I have come across are also worth reading
- Microsoft SQL Server 2005 Integration Services by Kirk Haselden
- Professional SQL Server 2005 Integration Services by Brian Knight et al
- Microsoft SQL Server 2005: A Developers Guide by Michael Otey et al.