Introduction
Every now and then we need to export data as XML. As we know, SSIS does not have out of the box component or task to accomplish this. Now, there are some very usefully components that we can install, but as a consultant I do not always have the freedom to do so. So, usually our only option is to use .NET to generate the required XML.
A quick Google search will return some code you could use in the form of a destination script component, but generally I find the code overly complex and in some cases the code does not generate the correct XML representation for the exported data. Other solutions require you to load your data first in a temporary SQL Server table, and then generate the xml data by using the FOR XML clause. Now this solution works well enough, but for me it always feels awkward to use temporary tables for tasks as this one.
A Simpler Solution
First, I have to confess that I lie in the title. This solution is not a Destination Component. It's a Script Task. But it is really, really simple. So, what solution do I propose to you?
DataTable.WriteXml
That simple line is what we need to generate a correct XML representation for any datatype that can be stored in a DataTable. So first we have to insert our data into a DataTable. How do we accomplish this?
DataTables in SSIS
In SSIS we have two tasks that can deliver DataTables (or DataSets): The SQL Task and the Data Flow Task.
DataTable from SQL Task
To produce a DataTable from a SQL Task we have to configure the task to return a full result set into an object type variable. Depending on the connection type the variable will contain a Recordset object (OLEDB, EXCEL) which we'll use later to create a DataTable, or in the case of ADO.NET connections the task will return a DataSet which contains a DataTable already.
For connections of type ODBC the SQL task would not be useful, as the task will give you an error stating that ODBC does not support disconnected recordsets. So if you have to use ODBC, you have to use a Data Flow Task.
DataTable from Data Flow Task
To get a DataTable object from a Data Flow Task you have to configure a Recordset Destination. Beware that not every column type can be mapped to this destination type, so you'll have to convert some column types to string first.
The Script
Now that we have configured the temporary memory object to hold our rows, it's time to generate the XML file. This script will generate an XML file with format akin to the RAW method of SQL Server.
DataTable's WriteXml method expects a DataSet name to generate the root node label. The DataTable's name will be used to label the rows. By default the document is based on elements, so to generate an attribute based document we have to tweak the code a little bit. Also, by default the columns of type datetime get the time zone appended which might not be what you want.
The final script is longer than the single line I write before, but it is still short and simple enough. In a Script Task, map the object variable as read only variable and replace the Main method with the following code:
public void Main()
{
DataTable dt = null;
var objRows = Dts.Variables["Public::RowSet"].Value;
var ds = objRows as DataSet;
/* The DataSetName property determines the root node name,
while the TableName property determines the row name. */ if (ds != null) //Ado.Net DataSet
{
ds.DataSetName = "rows";
(dt = ds.Tables[0]).TableName = "row";
}
else //OleDb Connections and Recordset Destinations
{
dt = (new DataSet("rows")).Tables.Add("row");
(new OleDbDataAdapter()).Fill(dt, objRows);
}
foreach (DataColumn c in dt.Columns)
{
//Comment the following line to get a documente based on elements
c.ColumnMapping = MappingType.Attribute;
if (c.DataType == typeof(DateTime))
c.DateTimeMode = DataSetDateTime.Unspecified; //No time zone
}
dt.WriteXml(@"C:\Temp\result.xml");
Dts.TaskResult = (int)ScriptResults.Success;
}
Pros
Simple code. You can save it as a snippet and paste it as is whenever you need it. It can format correctly any datatype allowed in a DataTable (try with binary data, it would be serialized as Base64 by default). It can handle correctly any weird name you have in your columns, and produce a valid xml identifier.
It works with all versions since 2008 without changes.
Gotchas
Since DataTables are memory objects, if you have to export huge row sets to XML, the memory consumption of the package will increase notoriously. I have tested and implemented this solution for relatively big XML files (hundreds of MB), but I would not dare to export huge data sets with this technique.
According to the MSDN, the DataTable object has a hard limit of 16,777,216 for the maximum number of rows it can hold, but I'm sure you'll hit some memory limit way before that, depending on the number of columns in your set.
As with every piece of code you find in the Internet, test it and then test it again.
References
1.- DataTable Class
2.- Out of memory exception ado.net dataset
3.- The first time I realize I could read datatables in a script task was by reading a blog from Jamie Thompson many, many years ago, but I cannot find the article / blog anywhere now. So I think it's fare to mention him here.