Simplify the Creation of XML from SQL Server Data
by Hugh Lynch
CTO, Restricted Stock Systems, Inc.
Introduction
Like many developers, my team and I were enchanted by the
possibilities of XML and XSL/XSLT when they emerged a few years ago. It seemed
we could describe anything using XML and morph it to our hearts’ content using
XSL. We wove the new technology deep into our designs, building substantial
web applications in which the data is rendered as XML then transformed to HTML
and other formats.
While I am content with the decision to embrace XML, there
were significant obstacles. Chiefly, how to create XML from our relational
data in SQL Server? Existing techniques held early promise, but proved either too
simplistic (FOR XML AUTO) or too cumbersome (FOR XML EXPLICIT, XML Views, etc)
for widespread adoption.
We wanted something simple and efficient to leverage our existing
tools, experience and the capabilities of SQL Server. We invented a simple
XML-assembly grammar that we embed in the results returned from stored
procedures, utilizing SQL Server’s ability to return multiple “chained” result
sets from a single command invocation.
Next, we created a general-purpose interpreter (we
implemented COM and .NET versions) to parse the embedded XML-assembly
instructions and render the returned data accordingly as XML.
Finally, we developed two general-purpose invocation
utilities, one for the command line and one web-based (implemented in .NET and ASP.NET,
respectively). These tools provide simple mechanisms to invoke parameterized stored
procedures, process the results using the interpreter, and output the resulting
XML, either raw or transformed.
The benefits of this approach, which we call “data shapes”
or simply “shapes”, include:
a. Simplicity.
With the grammar defined and the interpreter written, developers simply write
conforming stored procedures to create arbitrarily complex XML documents.
Unlike some alternatives, complex documents do not require complex procedures –
just select the data to assemble and accompany it with simple text instructions
like “RELATE Employee.CompanyID to Company.CompanyID” (more on the grammar
later).
b. Efficiency.
Returning all of the data for a document (sometimes dozens of recordsets) from
a stored procedure keeps database trips to an absolute minimum. The technique
also delegates the assembly of the XML from the database server to the
interpreter, which can be distributed across middle- or presentation-layer
machines. Writing XML involves conversion and formatting of native data types
as text, character escaping, and filtering of child recordsets, all of which
consume valuable CPU cycles.
c. Performance
and Control. Writing shapes as stored procedures gives the developer all
the benefits of that language: pre-compiled execution plans, caching, parameterization,
conditional logic, temporary storage, permission controls, nested calls, etc.
d. Familiarity.
While the grammar of shapes is no doubt novel, the tools used to create, edit,
test and analyze them are not. Shapes can be developed in Query Analyzer and
Visual Studio or any other T-SQL tool. Their performance can be evaluated
using SQL Profiler, and dissected by examining execution plans. Because a
shape encapsulates both data selection and assembly instruction in a single
scriptable procedure, it can be easily migrated between development, test and
production environments.
This article provides a complete definition of the shape
grammar as well as functional explanations of the shape interpreter and shape
invoker utilities. The accompanying download includes the binary and
configuration files necessary to experiment with shapes on your own.
Installation instructions are listed at the end of the article.
You can also see shapes in action at http://tech.rssgroup.com/shapes.
Let’s begin with some examples.
Example 1 – Hello, World!
Tradition dictates that all new programming technologies
introduce themselves, so here is a very simple shape listing:
create procedure dbo.shpHello as select 'dataset root.Hello' _shp select 'Hello, World!' Msg go
which can be invoked through ASP.NET, using the URL:
And here are the results it produces:
<?xml version="1.0" encoding="utf-8"?> <Hello Msg="Hello, World!"/>
While trivial, there are a couple of things to note. The
shape instruction ('dataset root.Hello') tells the interpreter to name the following result set “root” and to represent each record within it as a <Hello> element. By default, the interpreter maps the single column of the data set (Msg) to an attribute of the same name in the <Hello> element.
Example 2 – A Realistic Report
Using Microsoft’s standard Northwind database, we will create a shape to return all of the orders created by a specified employee during a specified date range. The data will be organized by customer.
The following URL invokes the stored procedure dbo.shpEmployeeActivity with three parameters: employeeID, startDate and endDate.
Here’s the listing of the stored procedure. The embedded instruction statements are again highlighted in blue. The procedure uses a table variable to store orderIDs.
create procedure dbo.shpEmployeeActivity ( @employeeID int , @startDate smalldatetime , @endDate smalldatetime as set nocount on -- create a table variable to hold the set of filtered orderIDs declare @nOrders int declare @tblOrders table ( OrderID int primary key) insert @tblOrders select orderID from orders where employeeID = @employeeID and orderDate between @startDate and @endDate set @nOrders = @@rowcount set nocount off select 'dataset root.Employee' _shp select EmployeeID, LastName, FirstName, Title, ... from Employees where employeeID=@employeeID if @nOrders>0 begin select 'dataset Customers.Customer relate to root' _shp select * from customers where customerID in (select customerID from orders where orderID in (select orderID from @tblOrders)) select 'dataset Orders.Order relate customerID to Customers.customerID' _shp select * from orders where orderID in (select orderID from @tblOrders) select 'dataset OrderDetails.Item relate orderID to Orders.orderID' _shp select * from [Order Details] where orderid in (select orderID from @tblOrders) end go
And here are the results (wider date ranges produce a lot more data):
<?xml version="1.0" encoding="utf-8"?> <Employee EmployeeID="6" LastName="Suyama" FirstName="Michael"...> <Notes><![CDATA[Michael is a graduate of...]]></Notes> <Customer CustomerID="GOURL" CompanyName="Gourmet Lanchonetes"...> <Order OrderID="10423" EmployeeID="6" OrderDate="1/23/1997"...> <Item ProductID="31" UnitPrice="10.0000" Quantity="14" Discount="0" /> <Item ProductID="59" UnitPrice="44.0000" Quantity="20" Discount="0" /> </Order> </Customer> <Customer CustomerID="LAMAI" CompanyName="La maison d'Asie"...> <Order OrderID="10425" EmployeeID="6" OrderDate="1/24/1997"...> <Item ProductID="55" UnitPrice="19.2000" Quantity="10" Discount="0.25" /> <Item ProductID="76" UnitPrice="14.4000" Quantity="20" Discount="0.25" /> </Order> </Customer> </Employee>
As in the prior example, records were mapped to elements and columns were mapped to attributes of the same name, except for the SQL text-typed column “Notes”, which was mapped to a CDATA block within a child element in order to preserve whitespace and minimize character escaping.
Let’s take a closer look at the embedded instructions.
select 'dataset root.Employee' _shpselect ... from employees where ...
select 'dataset Customers.Customer relate to root' _shp
select * from customers where ...
select 'dataset Orders.Order relate customerID to Customers.customerID' _shp
select * from orders where ...
select 'dataset OrderDetails.Item relate orderID to Orders.orderID' _shp
select * from [Order Details] where ...
Notice how the selection of each data set (SELECT [fields] FROM )
is preceded by an instruction (SELECT
‘dataset...’ _shp). These instructions tell the shape interpreter what
is coming next and where to put it.
Here’s what the instructions mean:
select
'dataset root.Employee' _shp
Name the following result set “root” and represent each
record within it as an <Employee> element.
select
'dataset Customers.Customer relate to root' _shp
Name the following result set “Customers” and represent each
record within it a <Customer> element. Make the <Customer>
elements appear as child elements of the <Employee> elements produced by
the “root” result set. In this case, there is only one <Employee>
element because we are selecting by primary key. If there were more than one,
the <Customer> elements would be repeated as child elements of each.
select
'dataset Orders.Order relate customerID to Customers.customerID' _shp
Name the following result set
“Orders” and represent each record within it as an <Order> element. Make
the <Order> elements appear as child elements of the <Customer>
elements produced by the “Customers” result set. The <Order> elements
will be filtered such that only those orders with matching customerID fields
appear as children of each <Customer> element. When rendering to XML,
the shape interpreter will not render the customerID attribute of the
<Order> element, since it is implicit in the XML structure.
select
'dataset OrderDetails.Item relate orderID to Orders.orderID' _shp
Name the following result set “OrderDetails” and represent each
record within it as an <Item> element. Make the <Item> elements
appear as child elements of the <Order> elements produced by the “Orders”
result set, grouped by orderID.
Note that any Order records that do not match a selected
Customer record will not appear in the XML. Nor will any OrderDetails records
appear that do not match a selected Order. In other words, if the where
clauses (in orange) were omitted, the XML produced would be identical, but the
amount of data loaded by SQL Server and passed to the interpreter would
increase dramatically.
Because this error is not detectable in the output, except
through degraded performance, it can be difficult to diagnose. It is always a
good idea to test your shapes in Query Analyzer to verify that the data sets
are appropriately filtered. It is also possible that a future version of the
interpreter could detect this condition and produce a warning.
While this shape is intended to be representative of actual,
useful shapes, it does not fully exercise the shape grammar, which provides
fine controls over the mapping and formatting of records and columns. Simple
options are available to:
a. wrap
record elements in a grouping element. For example, <Customer> elements
could be wrapped by a <Customers> element.
b. render
all columns as child elements
c. render
an individual column as an attribute, child element, free text, CDATA block or
nested XML
d. format
numeric and date columns using Excel-like formatting strings
e. omit
a column from the XML output
To get a better sense of the possibilities, let’s examine
the full grammar.
Shape Grammar
Syntax
DATASET datasetName.elementName[
WITH options][ RELATE clause[ RELATE clause...]]
RELATE clause:
RELATE [TO parentDataset
| childField1[,childField2...]
TO parentDataset.parentField1[,parentField2]]
[ WITH options]
options:
option[
option...]
option:
setOption[,setOption...]|[fieldName:fieldOption[,fieldOption]
setOption:
elements|attributes|showkeys|group[(groupName)]|xmlns(prefix=URI)
fieldOption:
fieldRepresentation|format("formatString")|omit
fieldRepresentation:
attrib|element|text|cdata|raw[(indent)]
Arguments
datasetName
Specifies
a name for the subsequent result set that must be unique within all result sets
returned by the stored procedure. If a grouping option is specified, this name
is the default element name for the grouping element.
elementName
Specifies
a name for the elements generated for each record in the subsequent result set.
RELATE TO parentDataset
Indicates
that the elements generated for each record in the subsequent result set should
be nested beneath each element generated from the result set name parentDataset.
If more than one element is produced from the parent result set, all child
elements will be repeated beneath each parent element.
RELATE childField1[,childField2...]
TO parentDataset.parentField1[,parentField2]]
Indicates
that the elements generated for each record in the subsequent result set should
be nested beneath the elements generated from the result set name parentDataset,
and filtered such that only those child records where the values of the
specified child fields match the values of the specified parent fields will
produce child elements beneath a given parent element. If a child record does
not match any parent record, it will not produce an XML element.
WITH options
Specifies
options that affect the appearance of the elements generated for each record in
the subsequent result set. Options set on the dataset are overridden by
options set in a RELATE clause. While it is unusual to relate a result set to
multiple parents, it is possible, and in such cases each RELATE clause may have
separate and distinct options.
setOption:
attribs
Indicates
that fields should produce XML attributes (this is the default)
elements
Indicates
that fields should produce XML elements, rather than XML attributes by default
showkeys
Indicates
that child fields related to parent fields should not be omitted from the child
element as they are by default.
group[(groupName)]
Indicates
that child elements should be wrapped within a grouping element and optionally
specifies the name of that grouping element. The default group element name is
the datasetName.
xmlns(namespace)
Specifies
and XML namespace to appear in each child element.
fieldOption:
fieldRepresentation
Specifies
the representation of a field within the resulting XML. The default
representation for all fields other than text fields is as an XML attribute.
The default representation for text fields is as a CDATA block within a child
element.
attrib
represent
the field as an XML attribute of the record element:
<elementName ... fieldName=”value” …/>
value will be appropriately escaped for attribute representation
element
represent
the field as a child element of the record element:
<elementName ...>
…
<fieldName>value</fieldName>
…
</elementName>
value will be appropriately escaped for text representation