The following SSIS package can be modified to transform and shred any XML document, provided you create the appropriate XSL stylesheet(s) for the transformation(s), and construct the required XQuery SQL code to shred the desired records from the transformed file for insertion into SQL Server.
The XmlTransformation.dtsx SSIS package in the attached demo file archive contains two versions of the Script Task, one written in C# and the other written in VB.NET. The VB.NET version is disabled by default.
Both versions are included so that developers can choose their preferred programming language.
For the purpose of simplifying the following tutorial, the SSIS package we are building will only have one Script Task. The code for both C# and VB.NET is provided so you can create the Script Task using either programming language.
Create and Execute the SSIS Package
The attached xml_transform_demo.zip file contains four demo files:
- XmlTransformation.dtsx - The completed SSIS package containing both C# and VB.NET Script Tasks.
(If you already know how to write SSIS packages, you might want to skip most of this tutorial and go straight to opening the XmlTransformation.dtsx file in Visual Studio or BIDS.) - StockMarketDataAsAttributes.xml - The source XML document to be transformed. This document contains all its data as attributes.
- TransformToElements.xsl - A stylesheet that will convert all the attributes in an XML document to elements.
- FormatXml.xsl - A stylesheet that will justify and indent all the elements in an XML document.
Extract the XML and XSL files into the 'C:\xml\' directory.
If you are addressing an instance of SQL Server on a remote computer, you must copy the files to a shared folder on that computer and use the fully-qualified UNC paths, ie. '\\ComputerName\SharedFolder\FileName.'
Create an SSIS package with a Script Task ('Transform and format xml file'), an Execute SQL Task ('Shred xml file to table'), a Precedence Constraint connecting the two tasks, and five global package variables as shown below.
Double-click on the 'Transform and format xml file' Script Task, select Microsoft Visual C# 2008 or Microsoft Visual Basic 2008 as the ScriptLanguage,...
...then click on the button that appears at the end of the ReadOnly Variables line when you click on the line.
Select the check boxes of the five user variables we previously created, then click the OK button.
Confirm that the ReadOnlyVariables have been added, then click the Edit Script button.
Strip out all the comments and replace the '// TODO: Add your code here' comment in the Microsoft Visual C# 2008 editor window...
...with the following C# code...
string TransformTemplate = Dts.Variables["transform_template"].Value.ToString(); string FormatTemplate = Dts.Variables["format_template"].Value.ToString(); string SourceXml = Dts.Variables["source_xml"].Value.ToString(); string IntermediateXml = Dts.Variables["intermediate_xml"].Value.ToString(); string DestinationXml = Dts.Variables["destination_xml"].Value.ToString(); XslCompiledTransform myXslTransform = default(XslCompiledTransform); myXslTransform = new XslCompiledTransform(); myXslTransform.Load(TransformTemplate); myXslTransform.Transform(SourceXml, IntermediateXml); myXslTransform = new XslCompiledTransform(); myXslTransform.Load(FormatTemplate); myXslTransform.Transform(IntermediateXml, DestinationXml); if (System.IO.File.Exists(IntermediateXml) == true) { // System.IO.File.Delete(IntermediateXml); }
...and add the following two lines to the C# namespace declarations at the top of the file.
using System.Xml; using System.Xml.Xsl;
For a VB.NET Script Task, strip out all the comments and replace the ' Add your code here' comment in the Microsoft Visual Basic 2008 editor window...
...with the following VB.NET code...
Dim TransformTemplate As String = Dts.Variables("transform_template").Value.ToString Dim FormatTemplate As String = Dts.Variables("format_template").Value.ToString Dim SourceXml As String = Dts.Variables("source_xml").Value.ToString Dim IntermediateXml As String = Dts.Variables("intermediate_xml").Value.ToString Dim DestinationXml As String = Dts.Variables("destination_xml").Value.ToString Dim myXslTransform As XslCompiledTransform myXslTransform = New XslCompiledTransform() myXslTransform.Load(TransformTemplate) myXslTransform.Transform(SourceXml, IntermediateXml) myXslTransform = New XslCompiledTransform() myXslTransform.Load(FormatTemplate) myXslTransform.Transform(IntermediateXml, DestinationXml) If System.IO.File.Exists(IntermediateXml) = True Then 'System.IO.File.Delete(IntermediateXml) End If
...and add these two statements to the namespace Imports section.
Imports System.Xml Imports System.Xml.Xsl
Save and close the Integration Services Script Task editor window, then close the Script Task Editor by clicking the OK button.
Right-click on the Connection Managers panel and choose New ADO.NET Connection...
...then click on the New button of the Configure ADO.NET Connection Manager screen.
Enter the server and database names and click OK.
Confirm that the desired ADO.NET connection manager has been added to the Data connections list and click the OK button.
Confirm that the Connection Manager has been added to the Connection Managers panel of Visual Studio, then double-click on the 'Shred xml file to table' Execute SQL Task...
...to bring up the Execute SQL Task Editor.
Change ConnectionType to ADO.NET and the Connection to TestDB, then click on the button that appears at the end of the SQLStatement line when you click on the line...
...to bring up the Enter SQL Query window.
Enter the following SQL code into the Enter SQL Query text box.
DECLARE @DynamicSql VARCHAR(MAX) SET @DynamicSql = N' DECLARE @Doc XML SET @Doc = (SELECT * FROM OPENROWSET(BULK ''' + @destination_xml + ''',SINGLE_BLOB) AS x) BEGIN TRY DROP TABLE [TestDB].[dbo].[Stocks] END TRY BEGIN CATCH END CATCH SELECT * INTO [TestDB].[dbo].[Stocks] FROM ( SELECT X.StockData.query(''Date'').value(''.'',''VARCHAR(10)'') AS ''Date'', X.StockData.query(''Ticker'').value(''.'',''VARCHAR(10)'') AS ''Ticker'', X.StockData.query(''Open'').value(''.'',''VARCHAR(10)'') AS ''Open'', X.StockData.query(''High'').value(''.'',''VARCHAR(10)'') AS ''High'', X.StockData.query(''Low'').value(''.'',''VARCHAR(10)'') AS ''Low'', X.StockData.query(''Close'').value(''.'',''VARCHAR(10)'') AS ''Close'', X.StockData.query(''Volume'').value(''.'',''VARCHAR(10)'') AS ''Volume'' FROM @Doc.nodes(''StockMarketData/StockData'') AS X(StockData) ) sub' EXEC (@DynamicSql)
Then click the OK button.
Click on the Paramater Mapping node.
Add the destination_xml parameter as shown below.
Click the OK button in the Execute SQL Task Editor, save the SSIS package, then execute it.
After the package completes execution, there will be data in the newly-created [TestDB].[dbo].[Stocks] table...
...and two new files in the C:\xml\ folder, Intermediate.xml and StockMarketDataAsElements.xml.
The Transformation Trail
The original StockMarketDataAsAttributes.xml file has, as its name implies, all of its data in the form of attributes embedded inside <StockData> element tags.
The TransformToElements.xsl stylesheet is used to transform StockMarketDataAsAttributes.xml...
...into the file Intermediate.xml, which has all of its data contained within beginning and ending element tags.
The FormatXml.xsl stylesheet used to reformat Intermediate.xml...
...into a document that has all of its elements nested canonically.
Analysis of the Script Task Code
The screen shot below is of the code in the C# Script Task.
The first block of statements in the Main() function...
string TransformTemplate = Dts.Variables["transform_template"].Value.ToString(); string FormatTemplate = Dts.Variables["format_template"].Value.ToString(); string SourceXml = Dts.Variables["source_xml"].Value.ToString(); string IntermediateXml = Dts.Variables["intermediate_xml"].Value.ToString(); string DestinationXml = Dts.Variables["destination_xml"].Value.ToString();
...declares five string variables and loads them with the paths of the demo files contained by the SSIS package variables.
The second block of statements...
XslCompiledTransform myXslTransform = default(XslCompiledTransform); // (1) myXslTransform = new XslCompiledTransform(); // (2) myXslTransform.Load(TransformTemplate); // (3) myXslTransform.Transform(SourceXml, IntermediateXml); // (4)
...(1) declares the variable myXslTransform to be of object type XslCompiledTransform, (2) instantiates the XslCompiledTransform object myXslTransform, (3) loads the XslCompiledTransform instance with the stylesheet to be used for the XSL transformation and (4) executes the transformation on the source file SourceXml and outputs the transformed IntermediateXml file.
The third block of statements...
myXslTransform = new XslCompiledTransform(); // (2) myXslTransform.Load(FormatTemplate); // (3) myXslTransform.Transform(IntermediateXml, DestinationXml); // (4)
...does the same thing, reinstantiating the XslCompiledTransform object and loading the FormatTemplate stylesheet to transform the IntermediateXml file and output the DestinationXml file.
If you want to perform more transformations, you can clone this code and enter the paths for the stylesheets and the input/output XML documents.
The final block of code...
if (System.IO.File.Exists(IntermediateXml) == true) { // System.IO.File.Delete(IntermediateXml); }
...would delete the Intermediate.xml file if it weren't commented-out. If you wish to have the Script Task delete the intermediate file, uncomment this line.
Analysis of the Execute SQL Task Code
The screen shot below is of the same code as in Execute SQL Task, but executed in SQL Server Management Studio so a SELECT statement can be used at the end to illustrate that the data was inserted into the table.
The @destination_xml variable is declared and its value set in the first two lines because the SSIS package variable is not available in Management Studio.
The OPENROWSET(BULK...) procedure can only accept a quoted string constant (ie., 'C:\xml\StockMarketDataAsElements.xml') for the filepath.
It cannot accept a variable or expression (ie., @destination_xml). This is a big problem because we want to pass the Execute SQL Task parameter '@destination_xml' to it.
The workaround for this problem is to use dynamic SQL, which allows us to put a SQL statement inside a variable and execute that statement.
That makes the following two lines from the SQL script above very important.
DECLARE @DynamicSql VARCHAR(MAX) SET @DynamicSql = N'
They tell us that the SQL statement is being stored in a VARCHAR(MAX) parameter.
The maximum size of a VARCHAR(MAX) variable is 2^31-1 bytes, or 2,147,483,647 characters.
VARCHAR(MAX) requires a Unicode value. The characters between N' (N-tilde) and the final tilde designate the contents of a Unicode string.
We can execute the SQL code stored in the VARCHAR(MAX) variable '@DynamicSql' with the EXEC() procedure.
EXEC(@DynamicSql)
The code below illustrates how the @destination_xml parameter is stored in the @DynamicSql variable.
As you can see, the '@destination_xml' variable is replaced with the string 'C:\xml\StockMarketDataAsElements.xml' when the SQL code is stored in the @DynamicSql variable, similar to how a text-substitution macro in C or assembly language works.
In other words, the path is hard-coded into the @DynamicSql variable before it is ever executed with EXEC().
Conclusion
This SSIS package gives you a viable framework for transforming and shredding XML files.
Future installments in this series will cover writing XSL code for the transformation operations and XQuery code for the shredding operations.
In the meantime, the following links can get you started.