Overview
Over the years, XML has grown widely in popularity and has become a popular format for exchanging data. Regardless of our preferences, most of us have been in situations where we had to import data from XML files. SQL Server provides options like OpenXML to do this task.
I work in both the Microsoft and the Unix worlds and once came across a powerful open source XML manipulation utility called XMLStarlet. In this article let us explore how we can use XMLStarlet to import XML data from files into SQL Server tables.
XMLStarlet is an open source command line utility available on Sourceforge at http://xmlstar.sourceforge.net/. It can be used to perform a variety of operations on XML data, all from the command line. One of them is selecting data. This feature has been utilized here to import data from XML files into a table of our choice by building a wrapper T-SQL stored procedure.
Getting XMLStarlet
The XMLStarlet toolkit is an open source project hosted on Sourceforge.net. The download page shows versions for multiple operating systems. We need to download the file xmlstarlet-1.0.1-win32.zip from the download page on the site http://xmlstar.sourceforge.net/. Extract the Zip file to a folder, say C:\XML\xmlstarlet-1.0.1, and we are ready to start. No additional steps are required. The executable of our interest is the xml.exe. The zip package includes a detailed PDF help file which shows how to perform various operations using the toolkit. Of the various functions it is capable of, we will only use the Select functionality.
The sample input files and the code for the stored procedures is attached in the Resource Files section of this article.
XML File:
<WildLife>
<Creature type="Mammal">
<Name>Cheetah</Name>
<Name>Big Fast Cat</Name>
<Color>Yellow with black spots</Color>
</Creature>
<Creature type="Mammal">
<Name>Brown Bear</Name>
<Color>Brown</Color>
</Creature>
<Creature type="Reptile">
<Name>Lizard</Name>
<Name>Gecko</Name>
<Color>Green</Color>
</Creature>
<Creature type="Insect">
<Name>Spider</Name>
<Name>Arachnid</Name>
<Color>Black</Color>
</Creature>
<Creature type="Amphibian">
<Name>Turtle</Name>
<Name>Terrapin</Name>
<Name>Turtle</Name>
<Color>Grey</Color>
</Creature>
</WildLife>
Command Syntax:
xml sel -T -t -m "/WildLife/Creature" -v "position()" -o "," -v "Name[position()=1]" -o "," -v "@type" -o "," -c "Color" -n WildLife.xml
When we run the above command line on the WIndows prompt we get the following output:
Output:
1,Cheetah,Mammal,Yellow with black spots
2,Brown Bear,Mammal,Brown
3,Lizard,Reptile,Green
4,Spider,Insect,Black
5,Turtle,Amphibian,Grey
The above CSV output can now be imported into a table through standard SQL Server utilities like BCP or BULK INSERT statement.
The trick is to flatten the XML file to a delimited file and then bulk insert the required table with standard utilities. For this purpose we use the two stored procedures attached in the Resources section of this article. In order to import an XML file into a table we can use the stored procedure usp_ImportXML.
Solution
Let us say we need to upload the WildLife.xml file into a table called WildLife. The structure of the WildLife table is:
CREATE TABLE WildLife (
Serial_No INT,
Creature_Name VARCHAR(100),
Creature_Type VARCHAR(100),
Creature_Color VARCHAR(100)
)
The steps below describe how we can import the XML file into this table using the usp_ImportXML stored procedure.
Step 1: Prepare a parameter table
The usp_ImportXML stored procedure looks for a temporary table called #tblXMLParams. This temp table specifies the record-defining node, the delimiter to be used to prepare the intermediate delimited file, the nodes, attributes or XPath expressions to be selected from the XML file and their order. In the above example, to get the CSV output as shown we create the temp table as follows:
IF NOT object_id('tempdb..#tblXMLParams') IS NULL
BEGIN
DROP Table #tblXMLParams
END
CREATE TABLE #tblXMLParams (XPathOrder int, NodeType VARCHAR(20), XPathExpression VARCHAR(500))
INSERT INTO #tblXMLParams VALUES (0,'Record_Node','/WildLife/Creature')
INSERT INTO #tblXMLParams VALUES (0,'Delimiter','|')
INSERT INTO #tblXMLParams VALUES (1,'XPATH','position()')
INSERT INTO #tblXMLParams VALUES (2,'node','Name[position()=1]')
INSERT INTO #tblXMLParams VALUES (3,'Attribute','@type') INSERT INTO #tblXMLParams VALUES (4,'node','Color')
The following values for NodeType are supported:
1. Record_Node - This is the tag that demarcates the start of a record and would repeat throughout the XML file. It may contain any number of child tags, attributes and nested child tags.
2. Delimiter - This is the delimiter to use instead of the default delimiter - comma. Specifying a delimiter other than comma is useful in situations where we know that the text we are importing itself contains commas. A good delimiter is the pipe - '|'.
3. XPATH - This could be any XPath expression. Here you can use the full power of XPATH
4. Node - This type specifies the name of the node from which the text is to be selected. XPath can be combined here. For example let us say our XML file had Creature tags as shown below:
<Creature type="Mammal">
<Name>Cheetah</Name>
<Name>Big Cheetah</Name>
<Color>Yellow with black spots</Color>
</Creature>
If we want to select only the last Name tag from each Creature, we can use:
INSERT INTO #tblXMLParams VALUES (2,'node','Name[position()=last()]')
Hence we can use the full power of XPATH to select the exact data needed from the file.
5. Attribute - This type specifies the attribute whose value is to be selected
The temp table #tblXMLParams is used by the stored procedure usp_ReadXMLParams to construct the command line parameters to be supplied to the XMLStarlet utility. The procedure usp_ReadXMLParams is internally used by the usp_ImportXML procedure and we don't need to use it directly for the task of importing XML data into tables. Still, if you would like to the output of usp_ReadXMLParams, use the code snippet below:
Declare @strXMLStarletCmd VARCHAR(2000) exec usp_ReadXMLParams @strXMLStarletCmd Print @strXMLStarletCmd
Why do we use this temp table? Why not a string supplied as an input parameter? Because being a generic solution, we do not know beforehand the structure of the file we need to import. If strings are used, we may run into a situation where the XPATH expressions are too long. Some people may prefer to keep the data selection parameters in INI files or in permanent tables. They would have to write T-SQL code to create the input string in order to use the usp_ImportXML SP. The temp table seems a good solution for all the above situations. We can populate the temp table with a minimal amount of code. That's a short note about the rationale behind using this temp table. Lets go ahead now.....
Step 2: Execute the stored procedure
Once the temp table is created we are ready to run our stored procedure. The usp_ImportXML procedure takes the following parameters:
1. @strXMLFileName This is the path of the XML file we want to import
2. @strTableName This is the table to which we want to import the XML data into
3. @blnReplace This is an optional Bit variable used to control whether we want to append data or delete existing data and import the new data into the table. Default behavior is to append the data to the existing table.
4. @strXMLStarletPath This optional variable contains the path to the XMLStarlet executable. If it is not supplied, the procedure uses the path which is coded in the usp_ImportXML stored procedure. I have coded the path as 'C:\XML\xmlstarlet-1.0.1\' as that's the place I unzipped the XMLStarlet package.
5. @strDelimiter We can use this optional variable to specify a different delimiter than a comma. If it is not supplied, the delimiter is assumed to be a comma (,).
To import our WildLife.xml XML file into the WildLife table, we use the stored procedure in one of the required formats as shown below:
Case I
Uses default text delimiter (comma)
Data to be appended to the existing table
Declare @strXMLFile Varchar(50), @strTable Varchar(50) Select @strXMLFile = 'C:\XML\WildLife.xml', @strTable = 'WildLife' exec usp_ImportXML @strXMLFileName=@strXMLFile, @strTableName=@strTable
Case II
Uses default text delimiter (comma)
Existing data to be deleted and new data added to the table
Declare @strXMLFile Varchar(50), @strTable Varchar(50), @blnOverwrite Bit Select @strXMLFile = 'C:\XML\WildLife.xml', @strTable = 'WildLife', @blnOverwrite = 1 exec usp_ImportXML @strXMLFileName=@strXMLFile, @strTableName=@strTable, @blnReplace=@blnOverwrite
Case III
Uses pipe delimiter to internally split the text. This feature is useful when the data itself contains commas as part of the text.
Existing data to be deleted and new data added to the table
Declare @strXMLFile Varchar(50), @strTable Varchar(50), @blnOverwrite Bit, @strTextDelimiter Varchar(1) Select @strXMLFile = 'C:\XML\WildLife.xml', @strTable = 'WildLife', @blnOverwrite = 1, @strTextDelimiter = '|' exec usp_ImportXML @strXMLFileName=@strXMLFile, @strTableName=@strTable, @blnReplace=@blnOverwrite, @strDelimiter=@strTextDelimiter
One thing to note here is that we need to use the same delimiter as was used while creating the #tblXMLParams temp table in Step 1. So if we plan to use a text delimiter other than the comma, we need to specify it in both places:
1. We should add the below statement while creating the parameter temp table
INSERT INTO #tblXMLParams VALUES (0,'Delimiter','|')
2. Pass the same delimiter as the @strDelimiter parameter to the usp_ImportXML stored procedure.
Case IV
Uses pipe delimiter to internally split the text. This feature is useful when the data itself contains commas as part of the text.
Path to the XMLStarlet exe specified as an input parameter
Existing data to be deleted and new data added to the table
Declare
@strXMLFile Varchar(50),
@strTable Varchar(50),
@blnOverwrite Bit, @strTextDelimiter Varchar(1), @strXMLExePath Varchar(100)
Select @strXMLFile = 'C:\XML\WildLife.xml', @strTable = 'WildLife', @blnOverwrite = 1, @strTextDelimiter = '|', @strXMLExePath='C:\XML\xmlstarlet-1.0.1'
exec usp_ImportXML @strXMLFileName=@strXMLFile, @strTableName=@strTable, @blnReplace=@blnOverwrite, @strDelimiter=@strTextDelimiter, @strXMLStarletPath=@strXMLExePath
The file XML_Import_Usage.txt attached in the Resource Files section shows the complete usage code for Case IV.
Conclusion
The XML Bulk Import solution presented in this article can be used to easily import XML data from files into database tables. The tables could be real tables or temporary tables. In addition to making the import process simple, this solution also allows us to use XPath expressions to import just the exact data required. So add this to your tool belt and make the task of importing XML files easier.