In the last two Levels of the Stairway to XML series, you learned how to use the XML
data type when defining columns, variables, and parameters. You also learned the difference between typed and untyped XML
objects and how you can associate typed objects with an XML schema collection. As you have seen, the XML
data type makes it relatively easy to store XML documents and fragments.
However, in those Levels, we merely scratched the surface when it comes to understanding the full power of the XML
data type. It turns out that the type supports five methods—query(
)
, value()
, exist
()
, nodes
()
, and modify
()
—that let you query and manipulate the elements and attributes within the instances stored in the XML
objects. In this Level, we’ll cover the two most common methods used to query XML data: query(
)
and value()
. In the Levels that follow, we’ll cover the others.
Each XML
method requires, at a minimum, one argument that is an XQuery expression. XQuery is a powerful scripting language used to access XML data. The language contains the functions, operators, variables, values, and other elements necessary to create complex expressions. SQL Server supports a subset of the XQuery language that you use to create the expressions you pass into the XML
methods. With these expressions, you can identify very specifically the components in the XML instances you want to retrieve or modify.
Note
Because XQuery is such a complex language, we can touch upon only some of its components in this Level. For a more thorough understanding of XQuery and how it’s implemented in SQL Server, see the MSDN XQuery language reference.
Now let’s get started with the query(
)
and value()
methods. In this Level, we look at a number of examples that use them to access XML data. The examples are based on a database and table I created on a local instance of SQL Server 2008 R2. The Transact-SQL in Listing 1 creates the test environment necessary to run these examples. The environment includes the ClientDB
database, the ClientInfoCollection
XML schema collection, and the ClientInfo
table.
USE master; GO IF DB_ID('ClientDB') IS NOT NULL DROP DATABASE ClientDB; GO CREATE DATABASE ClientDB; GO USE ClientDB; GO IF OBJECT_ID('ClientInfoCollection') IS NOT NULL DROP XML SCHEMA COLLECTION ClientInfoCollection; GO CREATE XML SCHEMA COLLECTION ClientInfoCollection AS '<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="urn:ClientInfoNamespace" targetNamespace="urn:ClientInfoNamespace" elementFormDefault="qualified"> <xsd:element name="People"> <xsd:complexType> <xsd:sequence> <xsd:element name="Person" minOccurs="1" maxOccurs="unbounded"> <xsd:complexType> <xsd:sequence> <xsd:element name="FirstName" type="xsd:string" minOccurs="1" maxOccurs="1" /> <xsd:element name="LastName" type="xsd:string" minOccurs="1" maxOccurs="1" /> <xsd:element name="FavoriteBook" type="xsd:string" minOccurs="0" maxOccurs="5" /> </xsd:sequence> <xsd:attribute name="id" type="xsd:integer" use="required"/> </xsd:complexType> </xsd:element> </xsd:sequence> </xsd:complexType> </xsd:element> </xsd:schema>'; GO IF OBJECT_ID('ClientInfo') IS NOT NULL DROP TABLE ClientInfo; GO CREATE TABLE ClientInfo ( ClientID INT PRIMARY KEY IDENTITY, Info_untyped XML, Info_typed XML(ClientInfoCollection) ); INSERT INTO ClientInfo (Info_untyped, Info_typed) VALUES ( '<?xml version="1.0" encoding="UTF-8"?> <People> <Person id="1234"> <FirstName>John</FirstName> <LastName>Doe</LastName> </Person> <Person id="5678"> <FirstName>Jane</FirstName> <LastName>Doe</LastName> </Person> </People>', '<?xml version="1.0" encoding="UTF-8"?> <People xmlns="urn:ClientInfoNamespace"> <Person id="1234"> <FirstName>John</FirstName> <LastName>Doe</LastName> </Person> <Person id="5678"> <FirstName>Jane</FirstName> <LastName>Doe</LastName> </Person> </People>' );
As Listing 1 shows, the ClientInfo
table includes a typed XML
column (Info_typed
) and an untyped XML
column (Info_untyped
). The typed column is associated with the ClientInfoCollection
XML schema collection, which includes only one XSD schema. The schema’s namespace is urn
:ClientInfoNamespace
. For details about working with XML schema collections and typed database objects, refer back to Level 2.
The XML query() Method
Perhaps the simplest of the XML
methods to understand and use is the query(
)
method. The method is most often used to return an instance of untyped XML that is a subset of the targeted XML data. To use the method, you specify the XML
database object, the method itself, and the XQuery expression enclosed in parentheses and single quotes, as shown in the following syntax:
db
_object
.query('
xquery_exp
')
When calling the query(
)
method, you replace the db_object
placeholder with the name of the actual database object and replace the xquery_exp
placeholder with the XQuery expression.
Let’s look at an example that demonstrates how this work. In the SELECT
statement shown in Listing 2, I use the query(
)
method to retrieve data from the Info_untyped
column. I first specify the column name, a period, and the method name. Then, enclosed in parentheses and single quotes, I add the XQuery expression (/People
).
SELECT Info_untyped.query('/People') AS People_untyped FROM ClientInfo;
In this case, the XQuery expression is as about simple as it can get. I am essentially telling the query(
)
method to return the <People>
element and all its contents (the child elements, attributes, and their values). To do so, as you can see, I needed only to specify the word People
, preceded by a forward slash. Listing 3 shows the results returned by the SELECT
statement.
<People> <Person id="1234"> <FirstName>John</FirstName> <LastName>Doe</LastName> </Person> <Person id="5678"> <FirstName>Jane</FirstName> <LastName>Doe</LastName> </Person> </People>
If I want to instead retrieve the <People>
element from the typed column, I need to modify XQuery expression to include a reference to the namespace specified in the XSD schema defined in the XML schema collection. Listing 4 shows how I modified the statement to retrieve data from the typed XML
column.
SELECT Info_typed.query( 'declare namespace ns="urn:ClientInfoNamespace"; /ns:People') AS People_typed FROM ClientInfo;
Notice that the query expression is broken into two sections, separated with a semicolon. The first section declares the namespace. It begins with the declare namespace
keywords, followed by a namespace alias, in this case ns
. The alias is followed with an equal sign and then the namespace itself (urn
:ClientInfoNamespace
), enclosed in double quotes.
The second section of the XQuery expression is similar to the expression used for an untyped column, except that the element name is first proceeded by the namespace alias and a colon (ns:
). For typed columns, your element names must be fully qualified, that is, they must reflect the schema they’re associated with, and that is done by specifying the associated namespace. However, instead of typing the entire namespace for each element, you can instead use an alias, which is what I’ve done.
When you run the SELECT
statement, it returns the results shown in Listing 5.
<People xmlns="urn:ClientInfoNamespace"> <Person id="1234"> <FirstName>John</FirstName> <LastName>Doe</LastName> </Person> <Person id="5678"> <FirstName>Jane</FirstName> <LastName>Doe</LastName> </Person> </People>
As you can see, the results returned from the typed column are nearly identical to those returned by the untyped column, except for now the results include a reference to the namespace.
You may have noticed that the last two examples return what is essentially the entire XML instance saved to the each column. Although the examples are useful for demonstrating how the query(
)
method works, using the method to return the entire instance is not very useful because you can do that without using the method at all. The key, of course, is to refine the XQuery expression to return more specific data.
Suppose for example, instead of retrieving the entire XML document, we instead want to return each instance of the <Person>
element, along with its child elements and attributes. For the untyped column, we would need to modify our XQuery expression by adding /Person
to our path name so that our expression reads /
People/
Person
, as shown in Listing 6.
SELECT Info_untyped.query( '/People/Person') AS People_untyped, Info_typed.query( 'declare namespace ns="urn:ClientInfoNamespace"; /ns:People/ns:Person') AS People_typed FROM ClientInfo;
What I’ve done here is to further qualify the path name within the expression by adding the second element. Now only the <Person>
elements are returned and not the <Pe
ople
>
element. Listing 7 shows the results returned for the untyped column. Notice that only the two instances of the <Person>
element have been returned.
<Person id="1234"> <FirstName>John</FirstName> <LastName>Doe</LastName> </Person> <Person id="5678"> <FirstName>Jane</FirstName> <LastName>Doe</LastName> </Person>
If you refer back to Listing 6, you’ll see that for the typed column, the expression includes the ns:
alias prefix before each element within the path name. Listing 8 shows the results returned for that column. This time, the namespace reference is included with each instance of the <Person>
element.
<ns:Person xmlns:ns="urn:ClientInfoNamespace" id="1234"> <ns:FirstName>John</ns:FirstName> <ns:LastName>Doe</ns:LastName> </ns:Person> <ns:Person xmlns:ns="urn:ClientInfoNamespace" id="5678"> <ns:FirstName>Jane</ns:FirstName> <ns:LastName>Doe</ns:LastName> </ns:Person>
Now suppose we want to return a specific instance of the <Person>
element. One way we can do this is to further qualify the XQuery expression by adding a reference to the id
attribute and a specific attribute value. In the example shown in Listing 9, I’ve added a reference to the id
attribute for both the typed and untyped columns.
SELECT Info_untyped.query( '/People/Person[@id=1234]') AS People_untyped, Info_typed.query( 'declare namespace ns="urn:ClientInfoNamespace"; /ns:People/ns:Person[@id=5678]') AS People_typed FROM ClientInfo;
The first thing to notice is that I’ve enclosed the attribute reference in brackets. In addition, I preceded the attribute name with an at (@
) sign and followed it with an equal and then provided the attribute value. For the untyped column, I used the value 1234
. As a result, the XML returned from that column includes only the <Person>
element whose id
value equals 1234
, as shown in Listing 10.
<Person id="1234"> <FirstName>John</FirstName> <LastName>Doe</LastName> </Person>
For the typed column, I used the value 5678
in the attribute reference. Notice, however, that I do not need to include the namespace alias prefix along with the attribute name. Referencing the namespace in the element name is enough.
Note
When specifying a specific value in your XQuery expression, as I do for the id
attribute value, string values should be enclosed in double quotes. However, the rules for numeric values are somewhat different. For untyped columns, you can also specify numeric values in double quotes or without the quotes, but when working with typed columns, you must conform to the schema, which in this case specifies the id
attribute as an INT
value. Consequently, you cannot enclose the value in quotes. If you do, you’ll receive an error when running your statement.
Not surprisingly, the XML returned from the typed column includes only the <Person>
element whose id
value equals 5678
, as shown in Listing 11.
<ns:Person xmlns:ns="urn:ClientInfoNamespace" id="5678"> <ns:FirstName>Jane</ns:FirstName> <ns:LastName>Doe</ns:LastName> </ns:Person>
We can even further refine our XQuery expression by adding another child element to the path name, in this case the <FirstName>
element. For the untyped column, we simply add /FirstName
to the expression, and for the typed column, we add /ns
:FirstName
. Listing 12 shows what our SELECT
statement now looks like.
SELECT Info_untyped.query( '/People/Person[@id=1234]/FirstName') AS People_untyped, Info_typed.query( 'declare namespace ns="urn:ClientInfoNamespace"; /ns:People/ns:Person[@id=5678]/ns:FirstName') AS People_typed FROM ClientInfo;
The XML returned from the untyped column now includes only the <FirstName>
child element of the <Person>
element whose id
value is 1234
, as shown in Listing 13.
<FirstName>John</FirstName>
The XML returned from the typed column now includes only the <FirstName>
child element of the <Person>
element whose id
value is 5678
, as shown in Listing 14. Notice that, even at this level, the namespace is included in the returned value.
<ns:FirstName xmlns:ns="urn:ClientInfoNamespace">Jane</ns:FirstName>
Another way you can reference a specific element within an XQuery expression is to specify the element’s position number, relative to other instances of that element. For example, our source data includes two instances of the <Person>
element. The first instance of that element is implicitly assigned the number 1 and the second instance the number 2. In the SELECT
statement in Listing 15, I use the numbers 1 and 2 to reference those instances.
SELECT Info_untyped.query( '/People/Person[1]/FirstName') AS People_untyped, Info_typed.query( 'declare namespace ns="urn:ClientInfoNamespace"; /ns:People/ns:Person[2]/ns:FirstName') AS People_typed FROM ClientInfo;
Notice that instead of specifying an attribute reference, I specify the value [
1
]
for the untyped column and the value [
2
]
for the typed column. That means, in the case of the untyped column, the [1]
indicates that the first instance of the <Person>
element should be returned, or rather, the <FirstName>
element of that instance, and for the typed column, the [2
]
indicates that the second instance of the <Person>
element should be returned. Although I’ve used the numerical references, the SELECT
statement in Listing 15 returns the same results as the statement in Listing 12.
The XML value() Method
As handy as the query(
)
method can be, there might be times that you want to retrieve a specific element or attribute value, rather than returning an XML element. That’s where the value(
)
method comes in. The method not only retrieves a specific value, but does so as a specified data type. For this reason, when you call the value(
)
method, you must pass in two arguments—the XQuery expression and the Transact-SQL data type—as shown in the following syntax:
db
_object
.value('
xquery_exp
', '
sql_type
')
Notice that you call the value(
)
method in much the same way you call the query()
method. The only difference is the second argument, in which you specify the data type. For example, the SELECT
statement shown in Listing 16 retrieves the <FirstName>
value from the XML and returns it with the VARCHAR
data type.
SELECT Info_untyped.value( '(/People/Person[1]/FirstName)[1]', 'varchar(20)') AS Name_untyped, Info_typed.value( 'declare namespace ns="urn:ClientInfoNamespace"; (/ns:People/ns:Person[2]/ns:FirstName)[1]', 'varchar(20)') AS Name_typed FROM ClientInfo;
As the listing shows, you first specify the XQuery expression, followed by a comma, and then the data type. Like the XQuery expression, the data type must be enclosed in single quotes. That part should be fairly straightforward. What is not so straightforward is the XQuery expression itself. Although for both the typed and untyped columns the expressions are much the same as their counterparts in Listing 15, there is a significant difference. Each expression is enclosed in parentheses and following by [1]
. The parentheses ensure that the expression is treated as a single unit to which the [1]
can be applied.
The [1]
means that the first instance of the returned instances is the instance that the expression should use. For example, suppose your XQuery expression returns multiple <Person>
elements. Surrounding the expression with parentheses and adding the [1]
indicates that the first instance of <Person>
should be used. Note, however, that even if your expression returns only one instance, you must still include the [1]
because a singleton value is required by the value(
)
method, and the [1]
ensures that only one value can be returned.
In Listing 16, I specify [1]
after the XQuery expression for both the untyped and typed columns. And because I’ve used the value(
)
method in both cases, the SELECT
statement returns only the first names of the two people listed in the XML documents, as shown in Listing 17.
Name_untyped Name_typed -------------------- -------------------- John Jane
In some cases, you can eliminate the internal numerical identifier after a specific element and use only the outer one to identify the XML element. If you do this, however, you must make sure your outer reference identifies the correct instance. For example, in Listing 18, I removed the numerical references associated with the <Person>
element and then modified the expression for the typed column by changing the final [1]
to [2]
.
SELECT Info_untyped.value( '(/People/Person/FirstName)[1]', 'varchar(20)') AS Name_untyped, Info_typed.value( 'declare namespace ns="urn:ClientInfoNamespace"; (/ns:People/ns:Person/ns:FirstName)[2]', 'varchar(20)') AS Name_typed FROM ClientInfo;
Because there are two instances of the <FirstName>
element, you can use the final numeric qualifier to distinguish which instance you want to return. The statement returns the same results as the statement in Listing 16.
You can also use the same strategy to return an attribute value. For example, the SELECT
statement in Listing 19 retrieves the values of each instance of the id
attribute and assigns the INT
data type to the returned values.
SELECT Info_untyped.value( '(/People/Person/@id)[1]', 'int') AS Name_untyped, Info_typed.value( 'declare namespace ns="urn:ClientInfoNamespace"; (/ns:People/ns:Person/@id)[2]', 'int') AS Name_typed FROM ClientInfo;
As you would expect, the SELECT
statement returns only the attribute values, as shown in Listing 20.
Name_untyped Name_typed -------------------- -------------------- 1234 5678
In addition to defining path names in your XQuery expressions, you can incorporate XQuery functions that let you further refine your query and manipulate data. For instance, XQuery supports the count(
)
function, which provides a count of the number of instances returned by an expression. In Listing 21, I use the count(
)
function to return the number of <Person>
elements in the XML document in each XML
column.
SELECT Info_untyped.value( 'count(/People/Person)', 'int') AS Number_untyped, Info_typed.value( 'declare namespace ns="urn:ClientInfoNamespace"; count(/ns:People/ns:Person)', 'int') AS Number_typed FROM ClientInfo;
For each XQuery expression, I specify the count(
)
function, followed by the path name, which is enclosed in parenthesis. Because the count(
)
function itself returns a singleton value, I do not have to tag the [1]
onto the data path, even though I’m using the value()
method. Listing 22 shows the results returned for each XML
column. As you would expect, the value 2
is returned in both cases.
Number_untyped Number_typed -------------- ------------ 2 2
Another example of an XQuery function is concat(
)
, which lets you concatenate two or more values from an XML document. To use the function, you specify each segment that you want to concatenate as an argument to the function, as demonstrated in Listing 23.
SELECT Info_untyped.value( 'concat((/People/Person/FirstName)[2], " ", (/People/Person/LastName)[2])', 'varchar(25)') AS FullName FROM ClientInfo;
In this case, I’m passing three arguments into the concat
(
)
function, which I enclose in parentheses and separate with commas. The first and third arguments are basic XQuery expressions that are themselves qualified with parentheses and a numerical tag to indicate which element instance to return, exactly the sort of expression you would expect to pass to the value(
)
method. The second argument is merely a blank space, enclosed in double quotes. The space will be inserted between the first and last names. Listing 24 shows the results that the statement returns.
FullName ------------------------- Jane Doe
The first and last names have been concatenated into a single value. Both names come from the second instance of the <Person>
element.
Conclusion
As you’ve seen in this Level, you can use the query(
)
method to retrieve a subset of data from an XML instance, and you can use the value()
method to retrieve individual element and attribute values from an XML instance. In the next Level, we’ll cover the exist(
)
and nodes()
methods. Although these methods are also used to query XML data, the results they return are not simple XML instances or values. In fact, the methods are often used in conjunction with the query(
)
and value()
methods because of the type of data they return.
In the Level that follows the next one, we’ll review the modify(
)
method, the only XML
method that lets you manipulate XML data. But keep in mind that, as stated earlier, XQuery expressions can get far more complicated than what I’ve demonstrated so far or will be demonstrating, so I recommend you review the XQuery Language Reference if you plan to write many XQuery expressions. Also note that we’ll be using the same test environment in the next Level, so you might want to keep that around, if you’re so inclined.