Simplify the Creation of XML from SQL Server Data

by Hugh Lynch

CTO, Restricted Stock Systems, Inc.

Introduction

Like many developers, my team and I were enchanted by the

possibilities of XML and XSL/XSLT when they emerged a few years ago. It seemed

we could describe anything using XML and morph it to our hearts’ content using

XSL. We wove the new technology deep into our designs, building substantial

web applications in which the data is rendered as XML then transformed to HTML

and other formats.

While I am content with the decision to embrace XML, there

were significant obstacles. Chiefly, how to create XML from our relational

data in SQL Server? Existing techniques held early promise, but proved either too

simplistic (FOR XML AUTO) or too cumbersome (FOR XML EXPLICIT, XML Views, etc)

for widespread adoption.

We wanted something simple and efficient to leverage our existing

tools, experience and the capabilities of SQL Server. We invented a simple

XML-assembly grammar that we embed in the results returned from stored

procedures, utilizing SQL Server’s ability to return multiple “chained” result

sets from a single command invocation.

Next, we created a general-purpose interpreter (we

implemented COM and .NET versions) to parse the embedded XML-assembly

instructions and render the returned data accordingly as XML.

Finally, we developed two general-purpose invocation

utilities, one for the command line and one web-based (implemented in .NET and ASP.NET,

respectively). These tools provide simple mechanisms to invoke parameterized stored

procedures, process the results using the interpreter, and output the resulting

XML, either raw or transformed.

The benefits of this approach, which we call “data shapes”

or simply “shapes”, include:

a. Simplicity.

With the grammar defined and the interpreter written, developers simply write

conforming stored procedures to create arbitrarily complex XML documents.

Unlike some alternatives, complex documents do not require complex procedures –

just select the data to assemble and accompany it with simple text instructions

like “RELATE Employee.CompanyID to Company.CompanyID” (more on the grammar

later).

b. Efficiency.

Returning all of the data for a document (sometimes dozens of recordsets) from

a stored procedure keeps database trips to an absolute minimum. The technique

also delegates the assembly of the XML from the database server to the

interpreter, which can be distributed across middle- or presentation-layer

machines. Writing XML involves conversion and formatting of native data types

as text, character escaping, and filtering of child recordsets, all of which

consume valuable CPU cycles.

c. Performance

and Control. Writing shapes as stored procedures gives the developer all

the benefits of that language: pre-compiled execution plans, caching, parameterization,

conditional logic, temporary storage, permission controls, nested calls, etc.

d. Familiarity.

While the grammar of shapes is no doubt novel, the tools used to create, edit,

test and analyze them are not. Shapes can be developed in Query Analyzer and

Visual Studio or any other T-SQL tool. Their performance can be evaluated

using SQL Profiler, and dissected by examining execution plans. Because a

shape encapsulates both data selection and assembly instruction in a single

scriptable procedure, it can be easily migrated between development, test and

production environments.

This article provides a complete definition of the shape

grammar as well as functional explanations of the shape interpreter and shape

invoker utilities. The accompanying download includes the binary and

configuration files necessary to experiment with shapes on your own.

Installation instructions are listed at the end of the article.

You can also see shapes in action at http://tech.rssgroup.com/shapes.

Let’s begin with some examples.

Example 1 – Hello, World!

Tradition dictates that all new programming technologies

introduce themselves, so here is a very simple shape listing:

create procedure dbo.shpHello
as
   select 'dataset root.Hello' _shp
   select 'Hello, World!' Msg
go

which can be invoked through ASP.NET, using the URL:

http://.../shpHello.aspx

And here are the results it produces:

<?xml version="1.0" encoding="utf-8"?>
<Hello Msg="Hello, World!"/>

While trivial, there are a couple of things to note. The

shape instruction ('dataset root.Hello') tells the interpreter to name the following result set “root” and to represent each record within it as a <Hello> element. By default, the interpreter maps the single column of the data set (Msg) to an attribute of the same name in the <Hello> element.

Example 2 – A Realistic Report

Using Microsoft’s standard Northwind database, we will create a shape to return all of the orders created by a specified employee during a specified date range. The data will be organized by customer.

The following URL invokes the stored procedure dbo.shpEmployeeActivity with three parameters: employeeID, startDate and endDate.

http://.../shpEmployeeActivity.aspx?ID_Employee=6&startDate=1/1/97&endDate=1/31/97

Here’s the listing of the stored procedure. The embedded instruction statements are again highlighted in blue. The procedure uses a table variable to store orderIDs.

create procedure dbo.shpEmployeeActivity (
  @employeeID  int
, @startDate   smalldatetime
, @endDate     smalldatetime
as
   set nocount on
   -- create a table variable to hold the set of filtered orderIDs
   declare @nOrders int
   declare @tblOrders table (
      OrderID int primary key)
   insert @tblOrders
   select orderID
    from orders 
    where employeeID = @employeeID
    and orderDate between @startDate and @endDate
   set @nOrders = @@rowcount
   set nocount off
   select 'dataset root.Employee' _shp
   select EmployeeID, LastName, FirstName, Title, ...
    from Employees
    where employeeID=@employeeID
   if @nOrders>0
    begin
      select 'dataset Customers.Customer relate to root' _shp
      select * 
       from customers
       where customerID in 
         (select customerID from orders where orderID in 
            (select orderID from @tblOrders))
      select 'dataset Orders.Order relate customerID to Customers.customerID' _shp
      select *
       from orders
       where orderID in (select orderID from @tblOrders)

      select 'dataset OrderDetails.Item relate orderID to Orders.orderID' _shp
      select *
       from [Order Details]
       where orderid in (select orderID from @tblOrders)
   end
go

And here are the results (wider date ranges produce a lot more data):

<?xml version="1.0" encoding="utf-8"?>
<Employee EmployeeID="6" LastName="Suyama" FirstName="Michael"...>
   <Notes><![CDATA[Michael is a graduate of...]]></Notes>
   <Customer CustomerID="GOURL" CompanyName="Gourmet Lanchonetes"...>
      <Order OrderID="10423" EmployeeID="6" OrderDate="1/23/1997"...>
         <Item ProductID="31" UnitPrice="10.0000" Quantity="14" Discount="0" />
         <Item ProductID="59" UnitPrice="44.0000" Quantity="20" Discount="0" />
      </Order>
   </Customer>
   <Customer CustomerID="LAMAI" CompanyName="La maison d'Asie"...>
      <Order OrderID="10425" EmployeeID="6" OrderDate="1/24/1997"...>
         <Item ProductID="55" UnitPrice="19.2000" Quantity="10" Discount="0.25" />
         <Item ProductID="76" UnitPrice="14.4000" Quantity="20" Discount="0.25" />
      </Order>
   </Customer>
</Employee>

As in the prior example, records were mapped to elements and columns were mapped to attributes of the same name, except for the SQL text-typed column “Notes”, which was mapped to a CDATA block within a child element in order to preserve whitespace and minimize character escaping.

Let’s take a closer look at the embedded instructions.

select 'dataset root.Employee' _shp

select ... from employees where ...

select 'dataset Customers.Customer relate to root' _shp

select * from customers where ...

select 'dataset Orders.Order relate customerID to Customers.customerID' _shp

select * from orders where ...

select 'dataset OrderDetails.Item relate orderID to Orders.orderID' _shp

select * from [Order Details] where ...

Notice how the selection of each data set (SELECT [fields] FROM

)

is preceded by an instruction (SELECT

‘dataset...’ _shp). These instructions tell the shape interpreter what

is coming next and where to put it.

Here’s what the instructions mean:

select

'dataset root.Employee' _shp

Name the following result set “root” and represent each

record within it as an <Employee> element.

select

'dataset Customers.Customer relate to root' _shp

Name the following result set “Customers” and represent each

record within it a <Customer> element. Make the <Customer>

elements appear as child elements of the <Employee> elements produced by

the “root” result set. In this case, there is only one <Employee>

element because we are selecting by primary key. If there were more than one,

the <Customer> elements would be repeated as child elements of each.

select

'dataset Orders.Order relate customerID to Customers.customerID' _shp

Name the following result set

“Orders” and represent each record within it as an <Order> element. Make

the <Order> elements appear as child elements of the <Customer>

elements produced by the “Customers” result set. The <Order> elements

will be filtered such that only those orders with matching customerID fields

appear as children of each <Customer> element. When rendering to XML,

the shape interpreter will not render the customerID attribute of the

<Order> element, since it is implicit in the XML structure.

select

'dataset OrderDetails.Item relate orderID to Orders.orderID' _shp

Name the following result set “OrderDetails” and represent each

record within it as an <Item> element. Make the <Item> elements

appear as child elements of the <Order> elements produced by the “Orders”

result set, grouped by orderID.

Note that any Order records that do not match a selected

Customer record will not appear in the XML. Nor will any OrderDetails records

appear that do not match a selected Order. In other words, if the where

clauses (in orange) were omitted, the XML produced would be identical, but the

amount of data loaded by SQL Server and passed to the interpreter would

increase dramatically.

Because this error is not detectable in the output, except

through degraded performance, it can be difficult to diagnose. It is always a

good idea to test your shapes in Query Analyzer to verify that the data sets

are appropriately filtered. It is also possible that a future version of the

interpreter could detect this condition and produce a warning.

While this shape is intended to be representative of actual,

useful shapes, it does not fully exercise the shape grammar, which provides

fine controls over the mapping and formatting of records and columns. Simple

options are available to:

a. wrap

record elements in a grouping element. For example, <Customer> elements

could be wrapped by a <Customers> element.

b. render

all columns as child elements

c. render

an individual column as an attribute, child element, free text, CDATA block or

nested XML

d. format

numeric and date columns using Excel-like formatting strings

e. omit

a column from the XML output

To get a better sense of the possibilities, let’s examine

the full grammar.

Shape Grammar

Syntax

DATASET datasetName.elementName[

WITH options][ RELATE clause[ RELATE clause...]]

RELATE clause:

RELATE [TO parentDataset

| childField1[,childField2...]

TO parentDataset.parentField1[,parentField2]]

[ WITH options]

options:

option[

option...]

option:

setOption[,setOption...]|[fieldName:fieldOption[,fieldOption]

setOption:

elements|attributes|showkeys|group[(groupName)]|xmlns(prefix=URI)

fieldOption:

fieldRepresentation|format("formatString")|omit

fieldRepresentation:

attrib|element|text|cdata|raw[(indent)]

Arguments

datasetName

Specifies

a name for the subsequent result set that must be unique within all result sets

returned by the stored procedure. If a grouping option is specified, this name

is the default element name for the grouping element.

elementName

Specifies

a name for the elements generated for each record in the subsequent result set.

RELATE TO parentDataset

Indicates

that the elements generated for each record in the subsequent result set should

be nested beneath each element generated from the result set name parentDataset.

If more than one element is produced from the parent result set, all child

elements will be repeated beneath each parent element.

RELATE childField1[,childField2...]

TO parentDataset.parentField1[,parentField2]]

Indicates

that the elements generated for each record in the subsequent result set should

be nested beneath the elements generated from the result set name parentDataset,

and filtered such that only those child records where the values of the

specified child fields match the values of the specified parent fields will

produce child elements beneath a given parent element. If a child record does

not match any parent record, it will not produce an XML element.

WITH options

Specifies

options that affect the appearance of the elements generated for each record in

the subsequent result set. Options set on the dataset are overridden by

options set in a RELATE clause. While it is unusual to relate a result set to

multiple parents, it is possible, and in such cases each RELATE clause may have

separate and distinct options.

setOption:

attribs

Indicates

that fields should produce XML attributes (this is the default)

elements

Indicates

that fields should produce XML elements, rather than XML attributes by default

showkeys

Indicates

that child fields related to parent fields should not be omitted from the child

element as they are by default.

group[(groupName)]

Indicates

that child elements should be wrapped within a grouping element and optionally

specifies the name of that grouping element. The default group element name is

the datasetName.

xmlns(namespace)

Specifies

and XML namespace to appear in each child element.

fieldOption:

fieldRepresentation

Specifies

the representation of a field within the resulting XML. The default

representation for all fields other than text fields is as an XML attribute.

The default representation for text fields is as a CDATA block within a child

element.

attrib

represent

the field as an XML attribute of the record element:

value will be appropriately escaped for attribute representation

element

represent

the field as a child element of the record element:

…

<fieldName>value</fieldName>

…

</elementName>

value will be appropriately escaped for text representation

The Shape Interpreter

grammar defined above, and renders the data accordingly as XML. The implementation

suitable output (filename, IO.Stream, TextWriter or XmlWriter) to its WriteShapeXml()

The Shape Invokers

The first thing to note is that there is actually no .aspx file necessary to process these requests. By adding an entry to the httpHandlers section of the web.config file, all .aspx requests are routed to the ShapeHandler class of the RSSWebShapes assembly, which knows how to interpret the URL.

If shapes need to co-exist with standard .aspx pages, it is a simple matter to register a different extension for shapes and substitute that extension for .aspx in shape URLs. Typically, we use .shp. Changing the extension does require custom configuration of the virtual directory in IIS.

The ShapeHandler class creates a SqlCommand object for executing a stored procedure named shapeName. Using a database connection defined in the web.config, the ShapeHandler retrieves the parameter list from the database and populates any command parameters that were provided in the URL.

When passing parameters, note that parameter values should be URL escaped and that parameter names should omit the @ prefix.

Simplify the Creation of XML from SQL Server Data

Introduction

Example 1 – Hello, World!

Example 2 – A Realistic Report

Shape Grammar

The Shape Interpreter

The Shape Invokers

Shape Testing

Conclusion

Installation

Rate

Share

Categories

Share

Rate

Simplify the Creation of XML from SQL Server Data

Introduction

Example 1 – Hello, World!

Example 2 – A Realistic Report

Shape Grammar

The Shape Interpreter

The Shape Invokers

Shape Testing

Conclusion

Installation

Rate

Share

Categories

Share

Rate

Related content

XML Options in Microsoft SQL Server 2005

Reduce Database RoundTrips Using XML

An Alternative XML Solution

An Introduction to InfoPath 2003

For SQL Server, XML Is One Answer