Introduction
One of the many useful features gained by SQL Server 2000 is the ability to handle XML documents.
This means that any computer language capable of opening an XML files and calling SQL Server stored procedures can make use of this new ability.
Data types used
SQL Server can use any of the data types used to hold character information.
TEXT and NTEXT cannot be used as local variables and can only be passed as arguments to a stored procedure.
Bare in mind that the Unicode data types consume twice as much memory as the ASCII equivalents.
ASCII Type | Unicode Type | Max Size | Comment |
---|---|---|---|
CHAR | NCHAR | 8000 | Fixed length |
VARCHAR | NVARCHAR | 8000 characters | Variable length |
TEXT | NTEXT | 2Gb | May require you to use SET TEXTSIZE as the @@TEXTSIZE parameter determines how much data is returned from a TEXT/NTEXT field by a SELECT statement. |
SQL Commands Used.
SQL Command | Comment |
---|---|
sp_xml_preparedocument | Parses the XML document using MSXML 2.0. WARNING: - This will use 1/8th of the memory available to SQL Server. If the procedure is successful it outputs a handle to the XML document as stored in the SQL Server cache. |
sp_xml_removedocument | Removes the XML document from the cache and destroys the handle. |
OPENXML | Returns the rowset from the prepared XML document. |
In the same way that any C++ object created should always be explicitly destroyed and any pointer to a VB object should always have the equivalent SET <obj>=Nothing so you should never use sp_xml_preparedocument in a procedure without using sp_xml_removedocument.
Example document
The example document contains an XML schema used within a VB program to draw a form for maintaining web page metadata.
The XML document is written out as a file by a content management system (CMS) in order to allow off-line editing.
The off-line editing process outputs the amended XML file that can then be read and the salient portions written back out into the CMS.
<?xml version="1.0"?>
<Page>
<AuthorsForm>
<MetaDataItem MaxSize="100">
<Name>Class Number</Name>
<Value><![CDATA[null]]></Value>
<Type>text</Type>
<CTRLType>textbox</CTRLType>
<CTRLFont>MS Sans Serif</CTRLFont>
<Mandatory>True</Mandatory>
<Editable>False</Editable>
<ToolTip>Please enter a Class Number value.</ToolTip>
</MetaDataItem>
<MetaDataItem MaxSize="100">
<Name>Title</Name>
<Value><![CDATA[Prisoners: getting further help]]></Value>
<Type>text</Type>
<CTRLType>textbox</CTRLType>
<CTRLFont>MS Sans Serif</CTRLFont>
<Mandatory>True</Mandatory>
<Editable>True</Editable>
<ToolTip>Please enter a Title value.</ToolTip>
</MetaDataItem>
<MetaDataItem MaxSize="50">
<Name>Author</Name>
<Value><![CDATA[David Poole]]></Value>
<Type>text</Type>
<CTRLType>textbox</CTRLType>
<CTRLFont>MS Sans Serif</CTRLFont>
<Mandatory>True</Mandatory>
<Editable>False</Editable>
<ToolTip>Please enter a Author value.</ToolTip>
</MetaDataItem>
<MetaDataItem MaxSize="50">
<Name>Translator</Name>
<Value><![CDATA[Renee Arnoux]]></Value>
<Type>text</Type>
<CTRLType>textbox</CTRLType>
<CTRLFont>MS Sans Serif</CTRLFont>
<Mandatory>True</Mandatory>
<Editable>True</Editable>
<ToolTip>Please enter a Translator value.</ToolTip>
</MetaDataItem>
<MetaDataItem MaxSize="100">
<Name>Subject</Name>
<Value><![CDATA[Civil Rights]]></Value>
<Type>text</Type>
<CTRLType>textbox</CTRLType>
<CTRLFont>MS Sans Serif</CTRLFont>
<Mandatory>True</Mandatory>
<Editable>False</Editable>
<ToolTip>Please enter a Subject value.</ToolTip>
</MetaDataItem>
<MetaDataItem MaxSize="200">
<Name>Description</Name>
<Value><![CDATA[Prisoners: getting further help]]></Value>
<Type>text</Type>
<CTRLType>textbox</CTRLType>
<CTRLFont>MS Sans Serif</CTRLFont>
<Mandatory>True</Mandatory>
<Editable>False</Editable>
<ToolTip>Please enter a Description value.</ToolTip>
</MetaDataItem>
<MetaDataItem MaxSize="100">
<Name>Publisher</Name>
<Value><![CDATA[HMSO]]></Value>
<Type>text</Type>
<CTRLType>textbox</CTRLType>
<CTRLFont>MS Sans Serif</CTRLFont>
<Mandatory>True</Mandatory>
<Editable>False</Editable>
<ToolTip>Please enter a Publisher value.</ToolTip>
</MetaDataItem>
<MetaDataItem MaxSize="10">
<Name>Date</Name>
<Value><![CDATA[2002-1-09]]></Value>
<Type>text</Type>
<CTRLType>textbox</CTRLType>
<CTRLFont>MS Sans Serif</CTRLFont>
<Mandatory>True</Mandatory>
<Editable>False</Editable>
<ToolTip>Please enter a Date value.</ToolTip>
</MetaDataItem>
<MetaDataItem MaxSize="150">
<Name>Resource Type</Name>
<Value><![CDATA[Document]]></Value>
<Type>text</Type>
<CTRLType>combobox</CTRLType>
<CTRLFont>MS Sans Serif</CTRLFont>
<Mandatory>True</Mandatory>
<Editable>False</Editable>
<ToolTip>Please enter a Resource Type value.</ToolTip>
</MetaDataItem>
<MetaDataItem MaxSize="100">
<Name>Resource Identifier</Name>
<Value><![CDATA[prisoners_getting_further_help]]></Value>
<Type>text</Type>
<CTRLType>textbox</CTRLType>
<CTRLFont>MS Sans Serif</CTRLFont>
<Mandatory>True</Mandatory>
<Editable>False</Editable>
<ToolTip>Please enter a Resource Identifier value.</ToolTip>
</MetaDataItem>
</AuthorsForm>
</Page>
Querying the document
Case sensitivity
The most important point to note is that as OPENXML is querying an XML document then regardless of the settings within SQL Server the query is case sensitive.
Replacing Page/AuthorsForm/MetaDataItem with page/authorsform/metadataitem will not work. Neither will changing the case of the field names within the WITH statement.
Using a stored procedure
Let us suppose that you have a stored procedure defined as follows.
CREATE PROC usp_ParseXML @stXML NTEXT AS
DECLARE @iDocumentHandle Int
exec sp_xml_preparedocument @iDocumentHandle OUTPUT, @stdoc
SELECT *
FROM OPENXML (@iDocumentHandle , 'Page/AuthorsForm/MetaDataItem',3)
WITH (Name VARCHAR(50) , Value VARCHAR(50), MaxSize CHAR(3) )
exec sp_xml_removedocument @iDocumentHandle
1. We declare the variable @iDocumentHandle to hold the handle for our XML document.
2. We tell SQL Server to prepare our document.
3. Our OPENXML statement accepts three arguments.
Ø @iDocumentHandle containing our handle to our document.
Ø The XPATH to our metadata items.
Ø An argument to say whether or not we are interested in Attributes or Elements.
4. Our WITH clause says which elements and/or attributes we are interested in.
5. We tell SQL Server to remove our document from the cache and destroy the pointer.
Just for the sake of argument, to call our stored procedure we will have done something like the following
DECLARE @stXML VARCHAR(8000)
SET @stDoc=
'<?xml version="1.0"?>
<Page> …..etc
exec usp_ParseXML @stDoc
In this case we would get the results set as follows
Name | Value | MaxSize |
---|---|---|
Class Number | null | 100 |
Title | Prisoners: getting further help | 100 |
Author | David Poole | 50 |
Translator | Renee Arnoux | 50 |
Subject | Civil Rights | 100 |
Description | Prisoners: getting further help | 200 |
Publisher | HMSO | 100 |
Date | 2002-1-09 | 10 |
Resource Type | Document | 150 |
Resource Identifier | prisoners_getting_further_help | 100 |
Arguments of the OpenXML clause
The OPENXML statement allows all of its arguments to be passed as parameters.
This means that the xpath and flags themselves can be passed as parameters. This gives you considerable flexibility however as the WITH statement cannot accept parameters this means that you will probably have to accept output as an EDGE table (see Books Online).
xpath
As the querying of the document is not handled by SQL Server but via MS XML the xpath argument can be any valid xpath supported by the MS XML API. Remember that this is MSXML2.
Page/AuthorsForm/MetaDataItem[*="null"] will return any metadata item where any child element has a value of null.
Page/AuthorsForm/MetaDataItem[Name="Title"] will return any metadata item where the Name element is set to Title.
Flags
In our example XML document Name and Value are elements where as MaxSize is an attribute.
The 3rd parameter of OPENXML tells SQL Server what we are interested in.
The parameter is a bit flag so we can combine flags to tell SQL Server to return both, which is what we have done in the preceding example.
The flag settings are as follows.
Value | Meaning |
---|---|
0, 1 or omitted | Return only the attributes. |
2 | Return only the elements. |
3 (combination of 1 and 2) | Return both attributes and elements |
This has the following affect on our results.
Flags set to 0, 1 or omitted
Here we can see that only the MaxSize attribute has been returned.
Name | Value | MaxSize |
---|---|---|
NULL | NULL | 100 |
NULL | NULL | 100 |
NULL | NULL | 50 |
NULL | NULL | 50 |
NULL | NULL | 100 |
NULL | NULL | 200 |
NULL | NULL | 100 |
NULL | NULL | 10 |
NULL | NULL | 150 |
NULL | NULL | 100 |
Flags set to 2
Here we can see that the Name and Value elements have been returned but the MaxSize element has been ignored.
Name | Value | MaxSize |
---|---|---|
Class Number | NULL | NULL |
Title | Prisoners: getting further help | NULL |
Author | David Poole | NULL |
Translator | Renee Arnoux | NULL |
Subject | Civil Rights | NULL |
Description | Prisoners: getting further help | NULL |
Publisher | HMSO | NULL |
Date | 2002-1-09 | NULL |
Resource Type | Document | NULL |
Resource Identifier | prisoners_getting_further_help | NULL |
The WITH statement
The WITH statement can be omitted, in which case you will receive what is known as an EDGE table (see books online).
Where it is used the WITH statement can contain field definitions or the name of a table containing the field definitions.
As mentioned earlier the names of the fields must match the names of the elements and are case sensitive.
If you want the fieldnames retrieved to be something other than those in the XML document then the field definitions themselves have an xpath portion.
For example, let us suppose that we want to return the MaxSize attribute as TextBoxSize, then we would amend our WITH statement as follows.
WITH(Name VARCHAR(50) , Value VARCHAR(50), TextBoxSize CHAR(3) 'MaxSize')
Different levels of the XML schema
Within the WITH clause the xpath portion of the field definition also allows you to reference the ancestors or parents of the elements you are referencing.
The example given in books online shows a hierarchy of
Ø Customers
o Order Headers
§ Order Details
We would query the Order Details line but our xpath arguments would allow us to retrieve information from the Order Header and Customer level of the document.
As a live example let us look at an XML document used to update content within a CMS system.
<?xml version="1.0"?>
<Page>
<SlotBlock>
<SlotBlockStyle>heading3</SlotBlockStyle>
<SlotBlockContent>
<PreObject>This is my heading</PreObject>
<PostObject />
<ObjectID>-1</ObjectID>
<ObjectType>B</ObjectType>
</SlotBlockContent>
</SlotBlock>
<SlotBlock>
<SlotBlockStyle>Paragraph</SlotBlockStyle>
<SlotBlockContent>
<PreObject><![CDATA[This is my paragraph]]></PreObject>
<PostObject />
<ObjectID>0</ObjectID>
<ObjectType />
</SlotBlockContent>
<SlotBlockContent>
<PreObject><![CDATA[This is another paragraph]]></PreObject>
<PostObject />
<ObjectID>0</ObjectID>
<ObjectType>T</ObjectType>
</SlotBlockContent>
</SlotBlock>
</Page>
A <SlotBlock> is a paragraph.
<SlotBlockStyle> is the paragraph style.
<SlotBlockContent> identifies a piece of text within the paragraph.
We want to query the elements within the SlotBlockContent elements so we would write the following.
DECLARE @stDoc VARCHAR(8000)
DECLARE @stxPath VARCHAR(50)
DECLARE @iFlags Int
DECLARE @iDocumentHandle Int
SET @iFlags=2
SET @stxPath='Page/SlotBlock/SlotBlockContent'
SET @stDoc=……
Our example document
exec sp_xml_preparedocument @iDocumentHandle OUTPUT, @stdoc
SELECT *
FROM OPENXML (@iDocumentHandle , @stxpath,@iFlags)
WITH( PreObject VARCHAR(50) ,
PostObject VARCHAR(50),
ObjectType CHAR(1) )
exec sp_xml_removedocument @iDocumentHandle
This would return the following data
PreObject | PostObject | ObjectType |
This is my heading | B | |
This is my paragraph | ||
This is another paragraph | T |
If we want to grab the <SlotBlockStyle> element from the level above then we would have to add the xpath for the element as shown by the code as marked in bold below.
DECLARE @stDoc VARCHAR(8000)
DECLARE @stxPath VARCHAR(50)
DECLARE @iFlags Int
DECLARE @iDocumentHandle Int
SET @iFlags=2
SET @stxPath='Page/SlotBlock/SlotBlockContent'
SET @stDoc=……
Our example document
exec sp_xml_preparedocument @iDocumentHandle OUTPUT, @stdoc
SELECT *
FROM OPENXML (@iDocumentHandle , @stxpath,@iFlags)
WITH( ParagraphStyle VARCHAR(50) '../SlotBlockStyle'
PreObject VARCHAR(50) ,
PostObject VARCHAR(50),
ObjectType CHAR(1) )
exec sp_xml_removedocument @iDocumentHandle
This would return the following data
ParagraphStyle | PreObject | PostObject | ObjectType |
---|---|---|---|
heading3 | This is my heading | B | |
Paragraph | This is my paragraph | ||
Paragraph | This is another paragraph | T |
The "../" in the xpath definition for ParagraphStyle tells SQL Server to jump up a level to retrieve SlotBlockStyle.
If we added "../../" this would mean jump up two levels to read a specified element.
Additional Information
Finally, let us suppose that there were other elements within <SlotBlockContent> that we didn't know anything about.
We could return information on this using an overflow directive.
WITH( ParagraphStyle VARCHAR(50) '../SlotBlockStyle'
PreObject VARCHAR(50) ,
PostObject VARCHAR(50),
ObjectType CHAR(1) ,
Overflow VARCHAR(250) '@mp:xmltext' )
The @mp:xmltext field is a special field that will return the parent element and any child elements that have not been explicitly declared in the WITH statement.
If there are no additional child elements then @mp:xmltext will simply return an empty parent element, in this case <SlotBlockContent/>