This is the second part of a 3 part blog which will attempt to show some different ways of how you can use the xml modify() method to update the contents of an XML document. This part, starting at the absolute basics, will cover the ability to update data in an existing XML document
XML DML is an extension of the XQuery language that allows us to do in place updates of the contents of an XML document. XML DML is very flexible and most scenarios can be dealt with quite nicely with the use of the modify() function. We can however only use the modify() function as part of an UPDATE or SET statement.
Here is a very simple example XML document that we will be using with all the example queries during this article which can downloaded with all of the example queries here.
--Declare/assign our example xml blob DECLARE @XMLData XML SET @XMLData = ' <Person Id="1234"> <Fullname/> <Surname>Smith</Surname> <Forenames>John Peter</Forenames> <Address HasPostcode=""> <AddressLine>1 Church Lane</AddressLine> <AddressLine>Littlewood</AddressLine> <AddressLine>Upper Westshire</AddressLine> <AddressLine>England</AddressLine> </Address> <HasDrivingLicense/> <HasPostcode>Unknown</HasPostcode> </Person>'
The Basics
The replace value of keyword is fairly simplistic in functionality. It allows for a single element or attribute node defined by an expression to be updated with an explicit value or value derived from an expression. It is not possible to update a PI node or a comment node. MSDN detail the syntax as:
replace value of Expression1 with Expression2
This first example will update the <Surname> element with a new static value. Because this is untyped XML, and we are updating the value of a simple type node, we need to explicitly specify that we want to update the text node of /Person/Surname. If you didn't you would see something like:
The target of 'replace value of' must be a non-metadata attribute or an element with simple typed content
Also because a singleton is expected, we need to use the [1] to denote the first text node occurrence, even though there is only a single text node to update. If you omitted this, you would see an error similar to:
The target of 'replace' must be at most one node, found 'text *'
SET @XMLData.modify(' replace value of (/Person[1]/Surname[1]/text()[1]) with ("Jones") ') SELECT @XMLData
Returns:
<Person Id="1234"> <Fullname /> <Surname>Jones</Surname> <Forenames>John Peter</Forenames> <Address HasPostcode=""> <AddressLine>1 Church Lane</AddressLine> <AddressLine>Littlewood</AddressLine> <AddressLine>Upper Westshire</AddressLine> <AddressLine>England</AddressLine> </Address> <HasDrivingLicense /> <HasPostcode>Unknown</HasPostcode> </Person>
The example updates the first and only text node within <Surname> with the new value of "Jones" by specifying text()[1], but you could have multiple text nodes. Consider this untyped XML document:
<Person Id="1234"> <Forenames>John <b>The Unknown</b> Peter</Forenames> </Person>
Here the forename element contains two text nodes, "John " and " Peter". To update the second text node, " Peter", you would need to specify the expression /Person[1]/Forenames[1]/text()[2]. There is a section later on which details some of the differences when updating typed XML documents.
Updating the values of attributes is very similar however you do not need to specify the text() function. You also do not need to specify the attribute ordinal position as duplicate attribute names are not allowed in XML.
SET @XMLData.modify(' replace value of (/Person[1]/@Id) with ("4321") ') SELECT @XMLData
Returns (truncated):
<Person Id="4321"> .... </Person>
Variables
Normally you would want to update the XML document with the contents of a variable or column and SQL provides two functions for accessing data outside of the XML document via the modify() function. They are sql:variable and sql:column. They both work in similar ways and when referenced within the XQuery, the values can be used as part of the DML statement. For example the very first example could have been written as the following by using a variable
DECLARE @Surname VARCHAR(50) SET @Surname = 'West' SET @XMLData.modify(' replace value of (/Person[1]/Surname[1]/text()[1]) with ( sql:variable("@Surname") ) ') SELECT @XMLData
Returns (truncated):
<Person Id="1234"> <Fullname /> <Surname>West</Surname> ........ </Person>
Notice that the function sql:variable() is not encased with curly braces like it needs to be when you use the insert method with functions. This is because anything after the with term is evaluated and computed.
Empty Nodes
In our untyped XML example, the node /Person/Fullname is empty and has no value so you would be forgiven for thinking that to update the node with a value, you would execute something like this:
SET @XMLData.modify(' replace value of (/Person[1]/Fullname[1]/text()[1]) with ( /Person[1]/Forenames[1], /Person[1]/Surname[1] ) ')
But if you was to run this statement, nothing would happen and no error would be raised. Why is this? Well it is because the node returned by /Person/Fullname/text() is nothing as there is no text node. The query is trying to update a node that doesn't exist and as such the query doesn't work as intended. If you supply an XPath expression that returns an empty sequence, then nothing can be updated and importantly, no error will be raised. This is something to be aware of as at face value, nothing has gone wrong until you inspect the XML.
In our untyped example, what you would need to do is use insert instead and insert a text node into /Person/Fullname. For typed XML, the text() function is not used on simple nodes so this behaviour wouldn't happen as the expression /Person/Fullname would exist and would return a node that could be updated.
Sequences
For untyped XML documents, it is possible to specify a sequence of values as the expression to update a node. When specifying a sequence of values, they are concatenated into a single string separated by a space and then the resulting value is then used to replace the existing node value. The following example retrieves the values from the <Forenames> and <Surname> as a sequence and replaces the <Fullname> node with the new value. For typed XML documents, I don't believe that you can directly use a sequence of values to update a simple type node or attributes. For string based values you could get around this by using the concat function.
A sequence can be a list of static values, variables or XPath expressions or even a mixture of any of them as this next example demonstrates appending another forename "Kevin" to the existing <Forenames> node.
SET @XMLData.modify(' replace value of (/Person[1]/Forenames[1]/text()[1]) with ( string(/Person[1]/Forenames[1]), "Kevin" ) ') SELECT @XMLData
Returns (truncated):
<Person Id="1234"> <Fullname /> <Surname>West</Surname> <Forenames>John Peter Kevin</Forenames> ....... </Person>
Conditional Statements
Another feature within the DML is the ability to use conditional statements, namely if...then...else. The syntax of if...then...else requires you to always specify all three parts and depending on the evaluation of the expression, either the then expression or the else expression will be used to update the node. For example the following will update the existing node <HasPostcode> with either "True" or "False" depending on the existence of a <PostCode> node:
SET @XMLData.modify('
replace value of
( /Person[1]/HasPostcode[1]/text()[1] )
with
(
if ( /Person[1]/Address[1]/PostCode[1] ) then
"True"
else
"False"
)
')
SELECT @XMLData
Returns (truncated):
<Person Id="1234"> ........ <HasPostcode>False</HasPostcode> </Person>
If you do not have an expression for the else part of the statement i.e. you only have an if...then and you leave the else expression empty by specifying parentheses () then you will see an error message similar to the following:
XQuery: Replacing the value of a node with an empty sequence is allowed only if '()' is used as the new value expression. The new value expression evaluated to an empty sequence but it is not '()'.
To solve this scenario, I recommend that you use a predicate and write out the if...then...else clause, but if you do want to use if...then...else then one way of achieving this is by specifying the existing value from the node that you are updating as the expression in the else expression as per the next example:
SET @XMLData.modify('
replace value of
( /Person[1]/HasPostcode[1]/text()[1] )
with
(
if ( /Person[1]/Address[1]/PostCode[1] ) then
"True"
else
( data(/Person[1]/HasPostcode[1]) )
)
')
SELECT @XMLData
Returns (truncated):
<Person Id="1234"> ........ <HasPostcode>Unknown</HasPostcode> </Person>
But as I mentioned, a predicate may be a better option if you only really want an if...then statement. For example the next query is functionally the same as the one above but it uses a predicate instead of an if...then...else statement. When the predicate is evaluated, as there are no matching nodes, no update is performed. If ever the target expression evaluates to an empty sequence then nothing will be updated and no error will be raised. You should also be aware though that if the target expression evaluates to more than one node, then an error will be raised.
SET @XMLData.modify(' replace value of ( /Person[1][ Address[1]/PostCode[1] ]/HasPostcode[1]/text()[1] ) with ( "True" ) ')
Typed XML
There are a number of differences between typed and untyped XML documents that must be considered when updating XML documents. This blog has focused on an untyped XML document example, but here are some things that would need to be changed if the XML document was strongly typed.
1. text()
For simple type nodes, the use of text() is not allowed and an error similar to the following would be raised:
'text()' is not supported on simple typed or 'http://www.w3.org/2001/XMLSchema#anyType' elements
To update the values of simple type nodes for typed XML, you just specify the XPath without the text() function. e.g.
SET @XMLData.modify(' replace value of (/Person[1]/Surname[1]) with ("Jones") ') SELECT @XMLData
2. Ordinal positions.
If the schema explicitly defines a node as being a singleton then you do not need to specify the ordinal position of the nodes in the XPath query. For example, the above example would look like:
SET @XMLData.modify(' replace value of (/Person/Surname) with ("Jones") ') SELECT @XMLData
3. You cannot use a sequence of values in the update expression like you can for untyped XML.
4. You can update the value of simple type nodes that are empty. Untyped XML documents require a text node to be inserted instead.
That concludes this part which I hope has shown how to use the replace value of keyword in a variety of ways. I haven't used predicates that much in the examples as I wanted to keep them as easy to read as possible, but predicates provide the ability for some really complex XPath expressions.
As mentioned earlier, XML DML is quite flexible and although it does have a few limitations and not implemented all of the XQuery/XPath functions, it can deal with most requirements to modify XML documents and I hope this introduction has shown that.
Enjoy!