Creating an SSIS Package is easy, with a lot of references available to help you. However, when it comes to reading, or let me say documenting, the contents of the package, it is a nightmare. This is especially true when you have many packages to review. I have to deal with a similar requirement and created a utility to read the package code. This article will help to use the script created and add any other enhancements as per requirement.
An SSIS package is made up of XML code structured from a specific XSD. We can read this code using SQL Server XQuery. Let me show an example here. We can fetch all the task name details used in the package by querying the executables path before the object name.
value('declare namespace p1="www.microsoft.com/SqlServer/Dts";//DTS:Executable/DTS:Executables/@p1:ObjectName[1]', 'nvarchar(max)')
To query the XML code of the package, we have to load the complete package XML into a table, which can be done using cmdshell command. First of all load all the package names with their path into a temporary table named "pkgStats".
SELECT @CommandLine = LEFT('dir "' + @Path + '" /A-D /B /S ', 8000); INSERT INTO @MyFiles (FullPath) EXECUTE xp_cmdshell @CommandLine; DECLARE @sql NVARCHAR(max); SET @sql = ' INSERT INTO pkgStats (PackagePath,PackageXML) select ''@FullPath'' as PackagePath ,cast(BulkColumn as XML) as PackageXML from openrowset(bulk ''@FullPath'',single_blob) as pkgColumn'; SELECT @sql = REPLACE(@sql, '@FullPath', @FullPath) EXEC sp_executesql @sql;
An SSIS package is having different tasks and elements categorized into major parts. The script will divide the different objects into five major parts:
- Connection objects
- Variables
- Parameters
- Control flow tasks
- Data flow tasks
The XSD representation of an SSIS package has different nodes for most of the tasks. The script will use XQuery to query all the nodes of different tasks which will fall in one of the major part and store in one temporary table named "PackageAnalysis". In addition to the detail of the tasks
The connections are fetched using the following query:
PRINT '---------------Get all connections-----------------------------------'; WITH CTE_PkgLevel AS ( SELECT pkg.RowID ,pkg.PackagePath ,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:ObjectName[1]', 'varchar(max)') AS ConnectionManagerName ,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:CreationName[1]', 'varchar(max)') AS ConnectionManagerType --, cfnodes1.y.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:ConnectionString[1]', 'varchar(max)') ,cfnodes.x.query('.').query('declare namespace p1="www.microsoft.com/SqlServer/Dts";./p1:ConnectionManager/p1:ObjectData/p1:ConnectionManager') AS ConnectionStringQry ,cfnodes.x.query('.').query('declare namespace p1="www.microsoft.com/SqlServer/Dts";./p1:ConnectionManager/p1:PropertyExpression') AS ExpressionQry ,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:DTSID[1]', 'varchar(100)') AS ConnectionManagerID ,pkg.PackageCreatorName FROM pkgStats pkg CROSS APPLY pkg.PackageXML.nodes('declare namespace DTS="www.microsoft.com/SqlServer/Dts";(//DTS:Executable/DTS:ConnectionManagers/*)') AS cfnodes(x) ) ,CTE_PkgConLevel AS ( SELECT pkg.RowID ,pkg.PackagePath ,pkg.ConnectionManagerName ,pkg.ConnectionManagerType ,ConnectionManagerID --,pkg.ConnectionStringQry ,( SELECT TOP 1 cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";@p1:ConnectionString[1]', 'nvarchar(max)') AS ConnectionString FROM pkg.ConnectionStringQry.nodes('./*') AS cfnodes(x) ) AS ConnectionString ,( SELECT TOP 1 cfnodes.x.value('.', 'nvarchar(max)') AS ExpressionValue FROM pkg.ExpressionQry.nodes('./*') AS cfnodes(x) ) AS ExpressionValue ,PackageCreatorName ,( SELECT TOP 1 cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";@p1:Retain[1]', 'nvarchar(max)') AS RetainSameConnectionProperty FROM pkg.ConnectionStringQry.nodes('./*') AS cfnodes(x) ) AS RetainSameConnectionProperty FROM CTE_PkgLevel pkg ) INSERT INTO PackageAnalysis ( RowID ,PackagePath ,PackageName ,Category ,PackageCreatorName ,ObjectName ,ObjectType ,DelayValidationPropertyValue ,ObjectValue ,ExpressionValue ,ConnectionManagerID ,RetainSameConnectionProperty ) SELECT DISTINCT pkg.RowID ,pkg.PackagePath ,REPLACE(pkg.PackagePath, REPLACE(@Path, '*.dtsx', ''), '') PackageName ,'Connection' AS CategoryC ,pkg.PackageCreatorName ,cast(pkg.ConnectionManagerName AS VARCHAR(max)) AS ObjectName ,cast(pkg.ConnectionManagerType AS VARCHAR(max)) AS ObjectType ,'NA' AS DelayValidationPropertyValue ,cast(pkg.ConnectionString AS VARCHAR(max)) AS ObjectValue ,cast(ExpressionValue AS VARCHAR(max)) AS ExpressionValue ,ConnectionManagerID ,RetainSameConnectionProperty FROM CTE_PkgConLevel pkg;
The variables are fetched using the following query:
PRINT '------------------------Get all variable details-------------------------------------'; WITH CTE_PkgLevel AS ( SELECT pkg.RowID ,pkg.PackagePath ,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:ObjectName[1]', 'varchar(max)') AS VariableName ,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:DTSID[1]', 'varchar(max)') AS VariableHexValue ,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:Expression[1]', 'varchar(max)') AS ExpressionValue ,cfnodes.x.query('.').query('declare namespace p1="www.microsoft.com/SqlServer/Dts";./p1:Variable/p1:VariableValue') AS VariableQry ,PackageCreatorName FROM pkgStats pkg CROSS APPLY pkg.PackageXML.nodes('declare namespace DTS="www.microsoft.com/SqlServer/Dts";(//DTS:Executable/DTS:Variables/*)') AS cfnodes(x) ) --select * from CTE_PkgLevel ,CTE_PkgVarLevel AS ( SELECT pkg.RowID ,pkg.PackagePath ,pkg.VariableName ,pkg.ExpressionValue ,VariableHexValue ,( SELECT TOP 1 cfnodes.x.value('.', 'nvarchar(max)') AS VariableValue FROM pkg.VariableQry.nodes('./*') AS cfnodes(x) ) AS VariableValue --,pkg.ConnectionStringQry ,( SELECT TOP 1 cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";@p1:DataType[1]', 'nvarchar(max)') AS DataType FROM pkg.VariableQry.nodes('./*') AS cfnodes(x) ) AS VariableDataType ,PackageCreatorName FROM CTE_PkgLevel pkg ) INSERT INTO TblVariableDetails ( RowID ,PackagePath ,PackageName ,Category ,PackageCreatorName ,ObjectName ,ObjectType ,ObjectValue ,ExpressionValue ,VariableHexValue ) SELECT DISTINCT Tblvar.RowID ,Tblvar.PackagePath ,REPLACE(Tblvar.PackagePath, REPLACE(@Path, '*.dtsx', ''), '') PackageName ,'Variable' AS Category ,Tblvar.PackageCreatorName ,cast(Tblvar.VariableName AS VARCHAR(max)) AS TaskName ,cast(Tblvar.VariableDataType AS VARCHAR(max)) AS TaskType ,isnull(cast(Tblvar.VariableValue AS VARCHAR(max)), '') AS VariableValue ,isnull(cast(tblvar.ExpressionValue AS VARCHAR(max)), '') ExpressionValue ,VariableHexValue FROM CTE_PkgVarLevel Tblvar;
The parameters are fetched using the following query:
PRINT '------------------------Get all parameter details-------------------------------------'; WITH CTE_PkgLevel AS ( SELECT pkg.RowID ,pkg.PackagePath ,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:ObjectName[1]', 'varchar(max)') AS ParameterName ,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:DTSID[1]', 'varchar(max)') AS ParameterHexValue ,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:Expression[1]', 'varchar(max)') AS ExpressionValue ,cfnodes.x.query('.').query('declare namespace p1="www.microsoft.com/SqlServer/Dts";./p1:PackageParameter/p1:Property') AS ParameterQry ,PackageCreatorName FROM pkgStats pkg CROSS APPLY pkg.PackageXML.nodes('declare namespace DTS="www.microsoft.com/SqlServer/Dts";(//DTS:Executable/DTS:PackageParameters/*)') AS cfnodes(x) ) --select * from CTE_PkgLevel ,CTE_PkgVarLevel AS ( SELECT pkg.RowID ,pkg.PackagePath ,pkg.ParameterName ,pkg.ExpressionValue ,ParameterHexValue ,( SELECT TOP 1 cfnodes.x.value('.', 'nvarchar(max)') AS ParameterValue FROM pkg.ParameterQry.nodes('./*') AS cfnodes(x) ) AS ParameterValue --,pkg.ConnectionStringQry ,( SELECT TOP 1 cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";@p1:DataType[1]', 'nvarchar(max)') AS DataType FROM pkg.ParameterQry.nodes('./*') AS cfnodes(x) ) AS ParameterDataType ,PackageCreatorName FROM CTE_PkgLevel pkg ) INSERT INTO tblParameterDetails( RowID ,PackagePath ,PackageName ,Category ,PackageCreatorName ,ObjectName ,ObjectType ,ObjectValue ,ExpressionValue ,ParameterHexValue ) SELECT DISTINCT Tblvar.RowID ,Tblvar.PackagePath ,REPLACE(Tblvar.PackagePath, REPLACE(@Path, '*.dtsx', ''), '') PackageName ,'Variable' AS Category ,Tblvar.PackageCreatorName ,cast(Tblvar.ParameterName AS VARCHAR(max)) AS TaskName ,cast(Tblvar.ParameterDataType AS VARCHAR(max)) AS TaskType ,isnull(cast(Tblvar.ParameterValue AS VARCHAR(max)), '') AS ParameterValue ,isnull(cast(tblvar.ExpressionValue AS VARCHAR(max)), '') ExpressionValue ,ParameterHexValue FROM CTE_PkgVarLevel Tblvar;
The following XQueries are used to fetch Control Flow and Data flow task details:
WITH CTE_CFLevel AS ( SELECT pkg.RowID ,cfnodes.x.query('.').query('declare namespace DTS="www.microsoft.com/SqlServer/Dts"; ./DTS:Executable[@DTS:ExecutableType=''SSIS.Pipeline.3'']/DTS:ObjectData/pipeline/components/component') DFTQuery ,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:refId[1]', 'varchar(max)') AS TaskPath ,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:ObjectName[1]', 'nvarchar(max)') AS TaskName ,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:Description', 'nvarchar(max)') AS TaskTypeDescription ,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:ExecutableType', 'nvarchar(max)') AS TaskType ,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:DelayValidation', 'nvarchar(max)') AS DelayValidationPropertyValue ,ISNULL(cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:Disabled', 'nvarchar(max)'), 'False') AS IsDisabled ,'EvalExpression = ' + cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:EvalExpression', 'nvarchar(max)') AS ForloopEvalExpression ,cfnodes.x.query('.').query('declare namespace DTS="www.microsoft.com/SqlServer/Dts"; ./DTS:Executable[@DTS:ExecutableType=''Microsoft.SqlServer.Dts.Tasks.ExecuteSQLTask.ExecuteSQLTask, Microsoft.SqlServer.SQLTask, Version=11.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91'']/DTS:ObjectData/*') AS SqlTaskQry ,cfnodes.x.query('.').query('declare namespace DTS="www.microsoft.com/SqlServer/Dts"; ./DTS:Executable[@DTS:ExecutableType=''SSIS.ExecutePackageTask.3'']/*') ExecPkgTaskQry ,cfnodes.x.query('.').query('declare namespace DTS="www.microsoft.com/SqlServer/Dts"; ./DTS:Executable[@DTS:ExecutableType=''Microsoft.SqlServer.Dts.Tasks.ScriptTask.ScriptTask, Microsoft.SqlServer.ScriptTask, Version=11.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91'']/DTS:ObjectData/ScriptProject/*') ScriptTaskQry FROM pkgStats pkg CROSS APPLY pkg.PackageXML.nodes('declare namespace DTS="www.microsoft.com/SqlServer/Dts";//DTS:Executable/DTS:Executables/*') AS cfnodes(x) ) INSERT INTO TblControlFlowDetails ( RowID ,TaskPath ,TaskName ,TaskTypeDescription ,TaskType ,DelayValidationPropertyValue ,DFTQuery ,SqlTaskQry ,ExecPkgTaskQry ,ScriptTaskQry ,IsDisabled ,ExpressionValue ) SELECT RowID ,TaskPath ,TaskName ,TaskTypeDescription ,TaskType ,DelayValidationPropertyValue ,DFTQuery ,SqlTaskQry ,ExecPkgTaskQry ,ScriptTaskQry ,IsDisabled ,ForloopEvalExpression FROM CTE_CFLevel; PRINT '---------- Insert Script task details-----------------------'; INSERT INTO TblScriptTaskdetails ( RowID ,ControlFlowDetailsRowID ,Script ) SELECT RowID ,CF.ControlFlowDetailsRowID ,cfnodes1.x.value('./ProjectItem[@Name=''ScriptMain.cs''][1]', 'varchar(max)') Script FROM TblControlFlowDetails cf CROSS APPLY Cf.ScriptTaskQry.nodes('.') AS cfnodes1(x); PRINT '---------- Insert Execute package task details-----------------------'; INSERT INTO TblExecutePackageTaskDetails ( RowID ,ControlFlowDetailsRowID ,ExecutePackageExpression ,ExecutedPackageName ,ExecutePackageConnection ) SELECT RowID ,cf.ControlFlowDetailsRowID ,cfnodes1.x.value('declare namespace DTS="www.microsoft.com/SqlServer/Dts"; ./DTS:PropertyExpression[1]', 'varchar(1000)') ExecutePackageExpression ,cfnodes1.x.value('declare namespace DTS="www.microsoft.com/SqlServer/Dts"; (./DTS:ObjectData/ExecutePackageTask/PackageName)[1]', 'varchar(1000)') ExecutedPackageName ,cfnodes1.x.value('declare namespace DTS="www.microsoft.com/SqlServer/Dts"; (./DTS:ObjectData/ExecutePackageTask/Connection)[1]', 'varchar(1000)') ExecutePackageConnection FROM TblControlFlowDetails cf CROSS APPLY Cf.ExecPkgTaskQry.nodes('.') AS cfnodes1(x); PRINT '---------- Insert DFT details-----------------------'; INSERT INTO TblDFTTaskDetails ( RowID ,DFTTasksPath ,DFTTaskName ,DFTTaskType ,DFTTaskType1 ,DFTRowSet ,ParameterBindingParameterName ,DFTSQLCommand ,DFTConnectionManager ,Variable ,IsSortedProperty ,InputQry ,OutputQry ,MultihashcolumnSortPosition ) SELECT RowID ,dftnodes.x.value('@refId[1]', 'varchar(max)') AS DFTTasksPath ,dftnodes.x.value('@name[1]', 'varchar(max)') AS DFTTaskName ,dftnodes.x.value('@description[1]', 'varchar(max)') AS DFTTaskType ,dftnodes.x.value('@contactInfo[1]', 'varchar(max)') AS DFTTaskType1 ,dftnodes.x.value('data(./properties/property[@name=''OpenRowset''])[1]', 'varchar(max)') DFTRowSet ,dftnodes.x.value('data(./properties/property[@name=''ParameterMapping''])[1]', 'varchar(max)') ParameterBindingParameterName ,CASE WHEN isnull(dftnodes.x.value('data(./properties/property[@name=''SqlCommand''])[1]', 'varchar(max)'), '') = '' THEN isnull(dftnodes.x.value('data(./properties/property[@name=''SqlCommandVariable''])[1]', 'varchar(max)'), '') ELSE isnull(dftnodes.x.value('data(./properties/property[@name=''SqlCommand''])[1]', 'varchar(max)'), '') END DFTSQLCommand ,dftnodes.x.value('data(./connections/connection/@connectionManagerID)[1]', 'varchar(max)') DFTConnectionManager ,dftnodes.x.value('data(./properties/property[@name=''VariableName''])[1]', 'varchar(max)') Variable ,dftnodes.x.value('data(./outputs/output/@isSorted)[1]', 'varchar(10)') IsSortedProperty ,lineage.x.query('.') InputQry ,outputvalue.x.query('.') OutputQry ,outputvalue.x.value('data(./properties/property[@name=''InputColumnLineageIDs''])[1]', 'varchar(max)') as MultihashcolumnSortPosition FROM TblControlFlowDetails pkglvl CROSS APPLY pkglvl.DFTQuery.nodes('./*') AS dftnodes(x) OUTER APPLY dftnodes.x.nodes('./inputs/input/inputColumns/*') AS lineage(x) OUTER APPLY dftnodes.x.nodes('./outputs/output/outputColumns/*') AS outputvalue(x); PRINT '---------- Insert DFT details-----------------------'; PRINT '---------- Insert DFT Source and destination details-----------------------'; INSERT INTO TblSrcDestDetails ( RowID ,DataFlowDetailsRowID ,SourceColumn ,DestinationColumn ,SortKeyPosition ) SELECT RowID ,DataFlowDetailsRowID ,lineage.x.value('./@lineageId[1]', 'varchar(max)') AS SourceColumn ,COALESCE(lineage.x.value('./@name[1]', 'varchar(max)'),lineage.x.value('./@externalMetadataColumnId[1]', 'varchar(max)'), lineage.x.value('./@refId[1]', 'varchar(max)')) DestinationColumn ,lineage.x.value('./@sortKeyPosition[1]', 'varchar(200)') SortKeyPosition FROM TblDFTTaskDetails pkglvl CROSS APPLY pkglvl.InputQry.nodes('./*') AS lineage(x); INSERT INTO TblSrcDestDetails ( RowID ,DataFlowDetailsRowID ,SourceColumn ,DestinationColumn ,SortKeyPosition ) SELECT RowID ,DataFlowDetailsRowID ,lineage.x.value('./@lineageId[1]', 'varchar(max)') SourceColumn ,COALESCE(lineage.x.value('./@name[1]', 'varchar(max)'),lineage.x.value('./@externalMetadataColumnId[1]', 'varchar(max)'), lineage.x.value('./@refId[1]', 'varchar(max)')) DestinationColumn ,lineage.x.value('./@sortKeyPosition[1]', 'varchar(200)') SortKeyPosition FROM TblDFTTaskDetails pkglvl CROSS APPLY pkglvl.OutputQry.nodes('./*') AS lineage(x); PRINT '---------- Insert DFT Source and destination details-----------------------';
The attached stored procedure script (see the Resources section) queries the complete package XML and generates an output report. The procedure accepts the input path for the package files and iterates over the files to document the package code. The code uses a connection to a SQL Server database, which creates temporary tables for the processing.
The output of the procedure returns the complete package details in the form of a grid report.
Report Column Headers | Description |
Row ID, Table Row ID | These are all columns used to sequence the report output. RowID is unique for a single package. TableRowID is an incremental sequence number for all the rows of a package. |
Package Path, Package Name | Package Name and Package Path on the drive. |
Category | Describes Package categories into four types - Connection, Variable, Control Flow Task, Data Flow Task. |
Task Path | Path of the Task within a package. |
Object Name, Object Type | Object Name is Name of the task or object as given in package. Object Type is type of task, transformation or variable data type. |
ObjectValue | This is the most critical column showing the value of the Object such as queries, connections. |
Connection Manager | Describes the connection details used inside any task or transformation. |
Source Column, Destination Column | Gives the source and destination column name used inside all the Data flow transformations. |
Derived Value | Gives the derived column transformation expressions used in Derived Column Transformation. |
Is Disabled | Shows True / False for all the Control flow tasks if the task is disabled. |
Parameter Binding Sequence, Parameter Binding Parameter Name, Result Set Parameter Name | Describes all the property values used in Execute SQL Task or OLEDB Source. |
Execute Package Expression, Execute Package Name | Shows the name or the execute package expression used in Execute Package Task. |
Script | Shows the script written in a Script Task. |
Lookup Joins | Details of Lookup Joins used in Lookup Transformation. |
Sample Output
I have created a sample SSIS package with a Data Flow Task with a Source and Destination.
Here is the Data Flow tab with a simple data transformation.
After executing the procedure with the package path, it returns the following output. There are a number of columns returned, so the output is broken into the following two images.
We can copy the output from SQL Server Management Studio into Excel and use a Pivot Report to document the details of the package. Here is the Sample package Pivot report.
Summary
This post and the sql procedure to document the SSIS package can be helpful to document multiple SSIS package at the same time. This can help to do Code Review and Impact analysis for any change in the SSIS package by verifying the following:
- Task names consistency
- Confirming the Variable expressions used in the package
- Delay Validation property check
- Disable objects
- Package Protection Level propery
- Source queries
- Connection Manager setup with exprerssions
- Package Creator names
- Validation of the precedence constraint to connect two tasks
We can enhance the procedure functionality based on further requirements. Please do let me know the feedback on the code documentor.