Introduction
If you have an SSIS project that's matured over time and makes good use of variables and parameters, you're probably nervous about deleting any of them, even if you think that they are no longer in use. Wouldn't it be great to be sure and then tidy things up with confidence?
I had one such project and decided to tackle the problem in code, after being unable to find anything already written which covers the same ground. BI Developer Extensions looked promising, but I found that it was buggy in this area. It suggested that some of my variables were unused, yet they were definitely being used.
This also gave me the chance to improve my almost non-existent T-SQL XPath skills, with some much-needed help from the community here at SQLServerCentral.
Sample Output
I created a simple SSIS project containing a single package and added a few parameters and variables. The variable, SourceText, is referenced by five other variables. The variable, Splitter3, is referenced in a Script Task, and the project parameter, StrangeProjectParam, is referenced in a Connection Manager. In SSDT, the declarations of these variables and parameters appear as follows:
When I run my T-SQL script against the folder containing the package, I see the following results:
The script returns a list of all of the variables and parameters in the project, together with their design-time values, and adds a calculated 'NumberOfReferences' column.
If NumberOfReferences is zero, no references to the parameter or variable have been found, and it should be safe to delete it. I say 'should' because, despite my testing, SSIS projects can be very complicated things and there may be cases which my code does not detect. If you find such a case, please let me know.
Can I Use It Now?
If you want to jump straight in and try it for yourself, here's what to do.
- Determine the user running your SQL Server service (by default, this may be NT Service/MSSQLSERVER)
- Make sure that this user has at least Read access to the place where your SSIS project files are stored (because the code analyses files in the file system, not in SSISDB)
- Grab a copy of the code (see the later section for the source code)
- Modify the row towards the beginning of the script which declares the @FolderPath parameter to be the path to the SSIS project of interest in your own environment. This should be the folder which contains the packages and Project.params files you wish to analyse.
- Execute in SSMS
Read on if you are interested in how it works. Otherwise, jump straight to the code section.
Solution Logic
This is a description of how the code works.
- Import the XML for all SSIS packages (*.DTSX), project parameters (Project.params) and connection managers (*.conmgr) into a temporary table.
- Create a temporary table to hold the results of performing the analysis.
- Query the table from (1) above to obtain a list of package variables and parameters, along with their design-time values and expressions and insert the results in the results table.
- Update the NumberOfReferences column in the results table, for the package variables and parameters.
- Query the table from (1) above to obtain a list of project parameters, along with their design-time values and expressions and insert the results in the results table.
- Update the NumberOfReferences column in the results table, for the project parameters, by scanning all of the imported packages and connection managers.
- Perform a corrective update on the NumberOfReferences column, to fix the counts in cases where there are multiple variables or parameters which start with the same characters (eg, 'path', 'paths', 'path1', 'path11').
- Return the results by selecting from the results table
Techniques Used
Here are a few of the techniques that the code uses.
Importing an XML File to a SQL Server Table
The (unsupported) xp_DirTree extended stored procedure is used to get a list of files to be imported and these are put into a temporary table. A cursor loops round the entries in the temporary table and imports the files' contents (all of which are XML) into another temporary table. using OPENROWSET().
Extracting Variable and Parameter Declaration Information from XML
For me, this was the most challenging part of the process – the extraction of tabular-format data from XML documents. The code relies on two things: a knowledge of XPath and an understanding of the structure of the XML in the document you are pulling data from.
For illustration, I created a very simple package containing a single variable, then edited its XML and stripped it down to include only those parts relevant to the variable itself. It looks like this:
<?xml version="1.0"?> <DTS:Executable xmlns:DTS="www.microsoft.com/SqlServer/Dts" DTS:refId="Package"> <DTS:Property DTS:Name="PackageFormatVersion">8</DTS:Property> <DTS:Variables> <DTS:Variable DTS:Namespace="User" DTS:ObjectName="Var1"> <DTS:VariableValue DTS:DataType="3">99</DTS:VariableValue> </DTS:Variable> </DTS:Variables> <DTS:Executables /> </DTS:Executable>
This remains a valid XML document. Here is the code which extracts the variables' declaration information. It's the CROSS APPLY that provides the way to burrow into the XML and extract information from any path containing the Executable/Variables/Variable hierarchy. Looking at the XML above, it should be clear how the variable name, value and namespace are obtained. The variable in my sample package did not have an Expression associated with it.
SELECT PackageName = fl.FileName ,ItemType = 'PackageVariable' ,ItemNamespace = v.n1.value('(@*:Namespace)[1]', 'varchar(30)') ,ItemName = v.n1.value('(@*:ObjectName)[1]', 'varchar(30)') ,ItemValue = v.n1.value('(*:VariableValue)[1]', 'varchar(4000)') ,ItemExpression = v.n1.value('(@*:Expression)[1]', 'varchar(4000)') FROM #FileList2 fl CROSS APPLY fl.FileXML.nodes('//*:Executable/*:Variables/*:Variable') v(n1)
#FileList2 is a temporary table containing an XML column (containing the XML of packages) called FileXML.
Counting the Number of References to Variables and Parameters
Once you have a list of all of the variables and parameters, along with all of the file contents, as strings in a table, it's easy enough to count the number of occurrences. We do this by replacing the qualified variable/parameter name with an empty string and comparing the length of the string containing the replacements with the length of the original string. The original string is the file's XML, converted to NVARCHAR(MAX).
One complication is where there are multiple variables that start with the same text ('path' and 'path1', for example), which leads to over-counting. There is a separate section towards the end of the code which corrects any over-counting.
Code to Perform Variable and Parameter Analysis
This code was developed on SQL Server 2017 and tested against packages developed in Visual Studio 2017.
A Note on Collapsible Regions
I use an SSMS add-on product called SSMSBoost. One of my favourite features of this product is the ability to define collapsible code regions. I also used the 'hidden' dark theme. In SSMS, with the regions all collapsed, the code for the entire script looks like this:
This is why my code contains --#Region and --#Endregion comment lines – these are interpreted by SSMSBoost.
Source Code
USE tempdb; GO /* SSIS Parameter and Variable Analysis ------------------------------------ Author: Phil Parkin Date: 5 June 2020 Possible Future Refinements --------------------------- 1) Allow breakdown of project parameters to show in which package/connection they were found and number of occurrences in each Limitations and Notes --------------------- 1) If parameter values are passed from parent package to child, and those parameters are not subsequently used in the child, they will be detected as 'unused' within the child package 2) Tested only on VS2017 3) If there are multiple instances of a package variable with the same name, but multiple scopes, this routine will fail. I could probably code round this, if anyone can convince me that this practice makes any sense! (Violation of PRIMARY KEY constraint ... Cannot insert duplicate key in object 'dbo.#SSISResults') 4) The SQL Server service user (eg, NT SERVICE\MSSQLSERVER) must have read access to the folder containing the package and project parameter files 5) Set the @FolderPath to be path to the folder which contains the packages and project parameters 6) If you have added comments or descriptive text anywhere in your package which includes the qualified name of a variable or parameter (eg, User::VariableName), it will be counted as an occurrence 7) Manipulate the final WHERE clause as desired. */SET NOCOUNT ON; --*** Change the path which appears here to suit your environment *** DECLARE @FolderPath VARCHAR(250) = 'C:\Temp\SSISTest'; DECLARE @SQL VARCHAR(MAX); DECLARE @FileName NVARCHAR(260); DROP TABLE IF EXISTS #FileList; DROP TABLE IF EXISTS #FileList2; DROP TABLE IF EXISTS #FileXML; BEGIN TRY --__________________________________________________________________________________________________________________________________ --#region Import files of interest to temp tables #FileList/#FileList2 CREATE TABLE #FileList ( FileName NVARCHAR(260) NOT NULL PRIMARY KEY CLUSTERED ,depth INT NOT NULL ,[file] INT NOT NULL ); CREATE TABLE #FileList2 ( FileName NVARCHAR(260) NOT NULL PRIMARY KEY CLUSTERED ,Extension VARCHAR(10) ,FileXML XML ,FileNVarchar NVARCHAR(MAX) ); INSERT #FileList ( FileName ,depth ,[file] ) EXEC master.sys.xp_dirtree @FolderPath, 1, 1; INSERT #FileList2 ( FileName ) SELECT fl.FileName FROM #FileList fl; DROP TABLE IF EXISTS #FileList; UPDATE fl SET fl.Extension = (CASE WHEN fl.FileName LIKE '' THEN REVERSE(LEFT(rev.NameFile, CHARINDEX('.', rev.NameFile) - 1)) ELSE '' END ) FROM #FileList2 fl CROSS APPLY (SELECT NameFile = REVERSE(fl.FileName)) rev; DELETE #FileList2 WHERE Extension NOT IN ('dtsx', 'params', 'conmgr'); DECLARE @NumFiles INT = ( SELECT COUNT(*) as Computed FROM #FileList2 fl2 ); IF @NumFiles = 0 THROW 52000, 'No SSIS files found. Please verify the value of @FolderPath and that the SQL Server service account has access to that folder path.', 1; CREATE TABLE #FileXML ( FileXML XML NOT NULL ); DECLARE files CURSOR LOCAL FAST_FORWARD FOR SELECT fl.FileName FROM #FileList2 fl; OPEN files; FETCH NEXT FROM files INTO @FileName; WHILE @@FETCH_STATUS = 0 BEGIN DELETE #FileXML; SET @SQL = CONCAT( 'SELECT BulkColumn = CONVERT(XML, BulkColumn) FROM OPENROWSET(BULK ''' ,@FolderPath ,'/' ,@FileName ,''' ,SINGLE_BLOB ) x;' ); INSERT #FileXML ( FileXML ) EXEC (@SQL); UPDATE fl SET fl.FileXML = fx.FileXML ,fl.FileNVarchar = CAST(fx.FileXML AS NVARCHAR(MAX)) FROM #FileList2 fl CROSS JOIN #FileXML fx WHERE fl.FileName = @FileName; FETCH NEXT FROM files INTO @FileName; END; CLOSE files; DEALLOCATE files; --#endregion Import files of interest to temp tables #FileList/#FileList2 --__________________________________________________________________________________________________________________________________ --#region Create temp table to hold the results of the analysis DROP TABLE IF EXISTS #SSISResults; CREATE TABLE #SSISResults ( PackageName VARCHAR(100) NOT NULL ,ItemType VARCHAR(50) NOT NULL ,ItemNamespace VARCHAR(50) NOT NULL ,ItemName VARCHAR(100) NOT NULL ,ItemValue VARCHAR(4000) NULL ,ItemExpression VARCHAR(4000) NULL ,NumberOfReferences INT NULL PRIMARY KEY CLUSTERED ( PackageName ,ItemType ,ItemNamespace ,ItemName ) ); --#endregion Create temp table to hold the results of the analysis --__________________________________________________________________________________________________________________________________ --#region Extract the package variables and parameters first INSERT #SSISResults ( PackageName ,ItemType ,ItemNamespace ,ItemName ,ItemValue ,ItemExpression ) SELECT PackageName = fl.FileName ,ItemType = 'Package Parameter' ,ItemNamespace = '$Package' ,ItemName = p.n1.value('(@*:ObjectName)[1]', 'varchar(30)') ,ItemValue = p.n1.value('(*:Property)[1]', 'varchar(4000)') ,ItemExpression = NULL FROM #FileList2 fl CROSS APPLY fl.FileXML.nodes('//*:Executable/*:PackageParameters/*:PackageParameter') p(n1) WHERE fl.Extension = 'dtsx' UNION ALL SELECT PackageName = fl.FileName ,ItemType = 'PackageVariable' ,ItemNamespace = v.n1.value('(@*:Namespace)[1]', 'varchar(30)') ,ItemName = v.n1.value('(@*:ObjectName)[1]', 'varchar(30)') ,ItemValue = v.n1.value('(*:VariableValue)[1]', 'varchar(4000)') ,ItemExpression = v.n1.value('(@*:Expression)[1]', 'varchar(4000)') FROM #FileList2 fl CROSS APPLY fl.FileXML.nodes('//*:Executable/*:Variables/*:Variable') v(n1) WHERE fl.Extension = 'dtsx'; --#endregion Extract the package variables and parameters first --__________________________________________________________________________________________________________________________________ --#region Calculate the number of references to the package variables and parameters UPDATE res SET res.NumberOfReferences = (LEN(fl.FileNVarchar) - LEN(REPLACE(cast(cast(cast(fl.FileNVarchar as nvarchar(max)) as nvarchar(max)) as nvarchar(max)),cast(cast(cast( calc1.ReplaceString as nvarchar(max)) as nvarchar(max)) as nvarchar(max)),cast(cast(cast( '' as nvarchar(max as nvarchar(max as nvarchar(max))))))))) / LEN(calc1.ReplaceString) FROM #FileList2 fl JOIN #SSISResults res ON fl.FileName = res.PackageName CROSS APPLY ( SELECT ReplaceString = CONCAT(res.ItemNamespace, '::', res.ItemName) ) calc1; --#endregion Calculate the number of references to the package variables and parameters --__________________________________________________________________________________________________________________________________ --#region Now add the project parameters INSERT #SSISResults ( PackageName ,ItemType ,ItemNamespace ,ItemName ,ItemValue ) SELECT PackageName = 'Project' ,ItemType = 'Project Parameter' ,ItemNamespace = '$Project' ,ItemName = p.n1.value('(@*:Name)[1]', 'varchar(30)') ,ItemValue = q.n1.value('(.)[1]', 'varchar(4000)') FROM #FileList2 fl CROSS APPLY fl.FileXML.nodes('//*:Parameters/*:Parameter') p(n1) CROSS APPLY p.n1.nodes('*:Properties/*:Property[@*:Name="Value"]') q(n1) WHERE fl.Extension = 'params'; --#endregion Now add the project parameters --__________________________________________________________________________________________________________________________________ --#region Calculate the total number of references to the project parameters, across all packages WITH ParamTotals AS (SELECT sr.PackageName ,sr.ItemType ,sr.ItemNamespace ,sr.ItemName ,TotalNumberOfReferences = SUM(NumRefs.NumberOfReferences) FROM #SSISResults sr CROSS APPLY ( SELECT ReplaceString = CONCAT(sr.ItemNamespace, '::', sr.ItemName) ) calc1 CROSS APPLY ( SELECT NumberOfReferences = (LEN(fl.FileNVarchar) - LEN(REPLACE(cast(cast(cast(fl.FileNVarchar as nvarchar(max)) as nvarchar(max)) as nvarchar(max)),cast(cast(cast( calc1.ReplaceString as nvarchar(max)) as nvarchar(max)) as nvarchar(max)),cast(cast(cast( '' as nvarchar(max as nvarchar(max as nvarchar(max))))))))) / LEN(calc1.ReplaceString) FROM #FileList2 fl WHERE fl.Extension = 'dtsx' OR fl.Extension = 'conmgr' ) NumRefs WHERE sr.ItemType = 'Project Parameter' GROUP BY sr.PackageName ,sr.ItemType ,sr.ItemNamespace ,sr.ItemName) UPDATE sr SET sr.NumberOfReferences = t.TotalNumberOfReferences FROM #SSISResults sr JOIN ParamTotals t ON t.PackageName = sr.PackageName AND t.ItemType = sr.ItemType AND t.ItemNamespace = sr.ItemNamespace AND t.ItemName = sr.ItemName; --#endregion Calculate the total number of references to the project parameters, across all packages --__________________________________________________________________________________________________________________________________ --#region Correct the calculated number of references, as needed --Where there are multiple items which begin with the same text (eg, path, paths, path1, path11, ...), the item-counting technique above --does not work correctly (every occurrence of user::path1 would also be counted as an occurrence of user::path, for example) WITH OverCounted AS (SELECT sr.PackageName ,sr.ItemType ,sr.ItemNamespace ,sr.ItemName ,sr.NumberOfReferences FROM #SSISResults sr WHERE EXISTS ( SELECT 1 FROM #SSISResults sr2 WHERE sr.PackageName = sr2.PackageName AND sr.ItemType = sr2.ItemType AND sr.ItemNamespace = sr2.ItemNamespace AND sr.ItemName <> sr2.ItemName AND sr.ItemName = LEFT(sr2.ItemName, LEN(sr.ItemName)) )) ,Deductions AS (SELECT OverCounted.PackageName ,OverCounted.ItemType ,OverCounted.ItemNamespace ,OverCounted.ItemName ,Deduction = SUM(sr3.NumberOfReferences) FROM OverCounted JOIN #SSISResults sr3 ON sr3.PackageName = OverCounted.PackageName AND sr3.ItemType = OverCounted.ItemType AND sr3.ItemNamespace = OverCounted.ItemNamespace AND LEFT(sr3.ItemName, LEN(OverCounted.ItemName)) = OverCounted.ItemName AND sr3.ItemName <> OverCounted.ItemName GROUP BY OverCounted.PackageName ,OverCounted.ItemType ,OverCounted.ItemNamespace ,OverCounted.ItemName) UPDATE sr4 SET sr4.NumberOfReferences = sr4.NumberOfReferences - d.Deduction FROM #SSISResults sr4 JOIN Deductions d ON d.PackageName = sr4.PackageName AND d.ItemType = sr4.ItemType AND d.ItemNamespace = sr4.ItemNamespace AND d.ItemName = sr4.ItemName; --#endregion Correct the calculated number of references, as needed --__________________________________________________________________________________________________________________________________ SELECT * FROM #SSISResults sr --WHERE sr.NumberOfReferences = 0 ; END TRY BEGIN CATCH THROW; END CATCH;