For today’s XML workshop, we will be reading and importing datasets from multiple XML files containing relational data. Traditional methods of transferring large datasets are still prevalent in today’s connected world. Every database administrator will inevitably be required to script out or build an ad hoc extract, transform and load process to migrate one or multiple datasets from formatted files to their managed environment. Some will even have the luxury of dealing with industry-standard file formats, such as XML. Let us explore a quick and easy method of importing XML files to a similarly structured user table.
Importing XML Data Sources
The following are two datasets formatted in XML. Each includes three patients, and all six patients have a unique identification number that will be loaded into a user table. For the purpose of this example, the identification numbers have a one-to-one relationship with the primary key of our destination. You may copy and paste these examples into files and store them in a working directory ( C:\XML\ ) on the server hosting your SQL Server instance.
[xml-example_01.xml]
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <root xmlns:xs="http://www.w3.org/2001/XMLSchema"> <patient> <id>1</id> <name>Bob</name> <birthdate>1999-01-10</birthdate> </patient> <patient> <id>2</id> <name>Sam</name> <birthdate>1989-05-11</birthdate> </patient> <patient> <id>3</id> <name>Richard</name> <birthdate>1990-12-01</birthdate> </patient> </root>
[xml-example_02.xml]
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <root xmlns:xs="http://www.w3.org/2001/XMLSchema"> <patient> <id>4</id> <name>Matt</name> <birthdate>1991-01-13</birthdate> </patient> <patient> <id>5</id> <name>Joe</name> <birthdate>1982-06-14</birthdate> </patient> <patient> <id>6</id> <name>Test Patient</name> <birthdate>1993-12-09</birthdate> </patient> </root>
Before importing from the XML sources, verify that the contents of each file is in a valid format. Execute the following T-SQL query to extract the XML contents and convert the binary result into a readable XML string.
SELECT CONVERT(XML, BulkColumn) [XmlSource] FROM OPENROWSET(BULK 'C:\XML\xml-example_01.xml', SINGLE_BLOB) t
The OPENROWSET() table function may be used as an ad hoc method of creating an OLE DB data connection to another database or, in our case, an external file. For today’s example, we are accessing the file using a BULK operation in order to efficiently pull the contents of the file into memory. Should the XML be malformed or contain an invalid character, the conversion will fail with an error message like the two examples below :
Msg 9436, Level 16, State 1, Line 2
XML parsing: line 17, character 7, end tag does not match start tag
or
Msg 9455, Level 16, State 1, Line 2
XML parsing: line 5, character 4, illegal qualified name character
When working with source files in a production environment I suggest storing the XML string in an archive for auditing purposes. This will allow for easy retrieval in the future. The additional step is especially useful should you be applying any data transformations during the load process.
Next, read the parsed XML string.
DECLARE @idoc INT DECLARE @result XML SELECT @result = CONVERT(XML, BulkColumn) FROM OPENROWSET(BULK 'C:\XML\xml-example_01.xml', SINGLE_BLOB) t EXEC sys.sp_xml_preparedocument @idoc OUTPUT, @result; SELECT * FROM OPENXML(@idoc, 'root/patient', 2) WITH ( id INT, name VARCHAR(200), birthdate DATETIME ) t
The result set ought to look familiar with a defined schema :
id name birthdate
---- ---------- -----------------------
1 Bob 1999-01-10 00:00:00.000
2 Sam 1989-05-11 00:00:00.000
3 Richard 1990-12-01 00:00:00.000
Let us examine the code. First, an XML variable ( @result
) is declared to store the contents of the XML source file and then parsed into memory using the sp_xml_preparedocument
system stored procedure. An integer variable ( @idoc
) is declared as well to store a session-specific identity that points to the internally parsed XML document. The pointer is used by the OPENXML() table function that allows the user to easily define a schema for the XML.
Notice that in our example we use the “element-centric” option ( 2 ) when parsing the XML at the ‘root/patient’ path with the OPENXML()
table function. Our example source files use only text node values. Should the examples use only attribute values, we would need to use the “attribute-centric” option ( 1 ). Mixed mode ( 8 ) is beyond the scope of this workshop, but I suggest reading more about schema declaration with x-path definitions on MSDN.
With either element- or attribute-centric parsing we need to define an output schema. Our example source files follow a simple and balanced XML schema. In other words, each parent node ( [patient]
) has the same child nodes ( [id]
, [name]
, and [birthdate]
). The script above translates each child node as a field, which we are going to import into a table with a matching schema.
CREATE TABLE [dbo].[Test] ( [Id] [int] NOT NULL , [Name] [varchar](200) NULL , [BirthDate] [datetime] NULL , PRIMARY KEY CLUSTERED ( [Id] ASC ) )
Run the query above to create a user table that matches the XML source, and then execute the script below to import the contents of the XML file.
DECLARE @idoc INT DECLARE @result XML SELECT @result = CONVERT(XML, BulkColumn) FROM OPENROWSET(BULK 'C:\XML\xml-example_01.xml', SINGLE_BLOB) t EXEC sys.sp_xml_preparedocument @idoc OUTPUT, @result; INSERT INTO [dbo].[Test] ( [Id] , [Name] , [BirthDate] ) SELECT [id] , [name] , [birthdate] FROM OPENXML(@idoc, 'root/patient', 2) WITH ( id INT, name VARCHAR(200), birthdate DATETIME ) t EXEC sys.sp_xml_removedocument @idoc;
The query will read the results into the user table. Notice that sp_xml_removedocument
is executed at the end of the transaction. The system stored procedure will remove the parsed XML data from memory and prevent resource starvation when using multiple large XML files in a single session. Once complete, execute the following script to import the second file :
DECLARE @idoc INT DECLARE @result XML SELECT @result = CONVERT(XML, BulkColumn) FROM OPENROWSET(BULK 'C:\XML\xml-example_02.xml', SINGLE_BLOB) t EXEC sys.sp_xml_preparedocument @idoc OUTPUT, @result; INSERT INTO [dbo].[Test] ( [Id] , [Name] , [BirthDate] ) SELECT [id] , [name] , [birthdate] FROM OPENXML(@idoc, 'root/patient', 2) WITH ( id INT, name VARCHAR(200), birthdate DATETIME ) t EXEC sp_xml_removedocument @idoc;
Next, run a select query on the user table.
SELECT [Id] , [Name] , [BirthDate] FROM [dbo].[Test]
Results :
Id Name BirthDate
---- ----------------- -----------------------
1 Bob 1999-01-10 00:00:00.000
2 Sam 1989-05-11 00:00:00.000
3 Richard 1990-12-01 00:00:00.000
4 Matt 1991-01-13 00:00:00.000
5 Joe 1982-06-14 00:00:00.000
6 Test Patient 1993-12-09 00:00:00.000
All patients were successfully loaded into the new user table with unique primary key values. If the table had an identity column, setting the IDENTITY_INSERT
option ON
prior to the load will allow a full import from the source files, identities and all, should no other record exist with the same primary key value. Do note that there are limitations to this method of importing an XML source. The script above will only import one file at a time, and the OPENXML()
table function will only read up to 250 elements. Overcoming these limitations will require moving into SSIS ETL builds, PowerShell scripting, or application development.