There are many articles on XML and how to work with XML structures. SQLServerCentral.com is one of many great sources with an abundance of articles. In this article I would like to show how XML can be used in real life scenarios and how XML can be validated by the process against XML schema registered in SQL Server.
My company collects a lot of statistical data from various
applications. The data elements are often either changed or added, and sometimes
both. When an adjustment is required, it's required to be modified in many
stored procedures because usually the number of parameters is changed.
Recently we decided to create a generic solution that will require applications to pass only few standard parameters: stats data in XML, XML schema name, and application code. The third parameter is irrelevant for this article and will be omitted in further discussion.
Below I have drawn out a general architectural idea. As you can see, the application(s) output data into a general staging XML table. It can be one table per application or one table per server, or table(s) for any other grouping. The decision should be made based on the number of records, number of applications, and practicality. The number of records should be the last concern because application(s) will only insert statistical data and never consume it.
Let start with a simple example of an XML schema and pretend that the statistical
data consists of three fields: companyid integer, prodno varchar, and userid
integer. To be able to validate the XML value against the registered XML schema, the custom schema has to be created and registered in SQL Server. Remember that the schema registration is per database.
To avoid the schema registration in every database, I created a special staging database per server to hold table(s) with staging statistical data. The same database can be used to hold tables with extracted "raw" statistical data. But I would recommend create another independent database. I will not explain in detail the XML schema because it is not a purpose of the article. Instead I will touch on the necessary parts of the XML syntax's.
drop
XML SCHEMA COLLECTION
stat
go
create
XML SCHEMA COLLECTION
stat AS
'<?xml
version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://ns" xmlns="http://ns">
<xs:element name="statroot">
<xs:complexType>
<xs:sequence>
<xs:element name="statrecord" minOccurs="0">
<xs:complexType>
<xs:sequence>
<xs:element
name="companyid" type="xs:integer" />
<xs:element name="prodno" type="xs:string" />
<xs:element name="userid" type="xs:integer" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema> '
To verify the proper schema registration we can declare the XML variable
type of schema "stat" and assign data.
declare
@xmlvar1 xml (stat);
set
@xmlvar1 = '
<xs:statroot xmlns:xs="http://ns">
<statrecord>
<companyid>6</companyid>
<prodno>prd1234</prodno>
<userid>12345</userid>
</statrecord>
</xs:statroot>
'
As I said before, different applications and different parts of the same
application usually output different statistical data. This means that we will
have data packed into multiple XML schemas that have to be inserted into one
staging table. The following table can be created for such purposes that were
described above.
CREATE
TABLE [dbo].[XML_TestTbl](
[XML_ID] [bigint] IDENTITY(1,1)
NOT NULL primary
key,
[Table_NM] AS
(('XML_TestTbl')),
[XML_Data] [xml] NOT NULL,
[Schema_NM]
[varchar](20) NOT
NULL,
[InsertDate] [datetime]
NOT NULL default ( getdate() ) )
Computed field [Table_NM]
AS (('XML_TestTbl')) is created for the future generic process and needs
to be able to link extracted XML record with staging table in case that multiple
staging tables exist.
As I mentioned before, some additional standard fields can be added to the
table. For example, we can add application name or module code, or both.
The next step before we extract XML data from table XML_TestTbl
into standard table with columns, is to verify that data is matching with
registered schema in SQL Server. The simplest way to do it is to allow SQL
Server to do the job for us by inserting data into another table with a column
of XML data type of the registered schema.
declare @verify table (XML_Data XML(stat) );
Insert
into @verify(XML_Data)
Select XML_Data from
XML_TestTbl where schema_nm =
'stat';
If there are no errors, our verification for this schema is done, and it is possible to load data into regular raw data tables with regular columns by using the following statement as an example.
SELECT
STATEMENT 1
select
[XML_Data].query('//customerid').value('.','integer')
as customerid,
[XML_Data].query('//prodno').value('.','varchar(max)')
as prodno,
[XML_Data].query('//userid').value('.','integer')
as userid
from dbo.XML_TestTbl
where schema_nm =
'stat';
As you can see from the statement, each row in XML_TestTbl in
field XML_Data has to keep only one row for output to do the easiest XML to
column data conversion. To do this, XML inserted into table XML_TestTbl (XML_Data),
has to be split to keep only one to one row. For example, it can be done by the
code below. However, you can do it by many other ways
declare
@xmlvar1 xml ;
set
@xmlvar1 = '
<root>
<statrecord>
<companyid>6</companyid>
<prodno>prd06</prodno>
<userid>12345</userid>
</statrecord>
</root>
<root>
<statrecord>
<companyid>3</companyid>
<prodno>prd03</prodno>
<userid>12345</userid>
</statrecord>
</root> '
select
T1.rt.query('.')
from @xmlvar1.nodes('/*/*')
T1(rt)
As
you can see, if this is the case and multiple rows can be sent by application(s)
with one XML record, the XML structure has to have another tag <root>
between <statroot> and <statrecord> tags.
One of the issues with SELECT
STATEMENT 1 is that the string value does not have a limit.
Although in most cases we do know the size of the inserted data. Lets say,
prodno has to be varchar(6). Let us limit the size of the prodno in the
registration schema by adding schema's own data type.
drop
XML SCHEMA COLLECTION
stat
go
create
XML SCHEMA COLLECTION
stat AS
'<?xml
version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://ns" xmlns="http://ns">
<xs:simpleType
name="myST">
<xs:restriction base="xs:string">
<xs:maxLength value="6"/>
</xs:restriction>
</xs:simpleType>
<xs:element name="statroot">
<xs:complexType>
<xs:sequence>
<xs:element name="statrecord" minOccurs="0">
<xs:complexType>
<xs:sequence>
<xs:element name="companyid" type="xs:integer"
/>
<xs:element name="prodno" type="myST" />
<xs:element name="userid" type="xs:integer" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema> '
declare
@xmlvar1 xml (stat);
set
@xmlvar1 = '
<xs:statroot xmlns:xs="http://ns">
<statrecord>
<companyid>6</companyid>
<prodno>prd12</prodno>
<userid>12345</userid>
</statrecord>
</xs:statroot> '
If we are trying to set the value for prodno =
‘prd1234' where data length is more than 6 characters we will have an error:
Msg
6926, Level 16, State 1, Line 4
XML Validation: Invalid simple type value:
'prd1234'. Location: /*:statroot[1]/*:statrecord[1]/*:prodno[1]
Now we know how to add schema custom data type from the
base one defined in SQL Server.
As you can see from the example, the first and last line
for XML requires a name space of registered XML schema. A namespace is required
for the custom data type declaration.
To minimize the amount of data passed by application, add
some security, and provide generic decoding solution, which incidentally will be
the topic of my next article; it is possible to save XML data in table XML_TestTbl
without first and last tags like this:
<statrecord>
<companyid>6</companyid>
<prodno>prd12</prodno>
<userid>12345</userid>
</statrecord>
First and last tags will be added by the generic
process to verify data and they are not required for the XML data extraction.
However, the schema must be registered with root element tag. SQL Server XML
meta data tables will be used in the verification process. Let us see how
to get a namespace for the first tag
<xs:statroot xmlns:xs="http://ns">
We need to use sys.xml_schema_collections
and sys.xml_schema_namespaces
select sn.name as
namespace_nm, sc.name
as schema_nm
from
sys.xml_schema_collections sc
inner join
sys.xml_schema_namespaces sn
on sc.xml_collection_id
= sn.xml_collection_id
where
sc.name = 'stat'
Table
sys.xml_schema_collections © 2008 Microsoft
Returns a row per XML schema collection. An XML
schema collection is a named set of XSD definitions. The XML schema collection
itself is contained in a relational schema, and it is identified by a
schema-scoped Transact-SQL name. The following tuples are unique: xml_collection_id,
namespace_id, and schema_id, name.
Column name
| Data type
| Description
|
xml_collection_id
| int
| ID of the XML schema collection. Unique within the database.
|
schema_id
| int
| ID of the relational schema that contains this XML schema collection.
|
principal_id
| int
| ID of the individual owner if different from the schema owner. By default, schema-contained objects are owned by the schema owner. However, an alternate owner may be specified by using the ALTER AUTHORIZATION statement to change ownership.
NULL = No alternate individual owner.
|
name
| sysname
| Name of the XML schema collection.
|
create_date
| datetime
| Date the XML schema collection was created.
|
modify_date
| datetime
| Date the XML schema collection was last altered.
|
Table sys.xml_schema_namespaces ©
2008 Microsoft
Returns a row per XSD-defined XML namespace. The
following tuples are unique: collection_id, namespace_id, and collection_id,
and name.
Column name
| Data type
| Description
|
xml_collection_id
| int
| ID of the XML schema collection that contains this namespace.
|
name
| nvarchar (4000)
| Name of XML namespace. Blank name indicates no target namespace.
|
xml_namespace_id
| int
| 1-based ordinal that uniquely identifies the XML namespace in the database.
|
It means that the next code can be utilized to verify XML
data with root element constructed by process.
declare
@schemanm varchar(20),
@namespacenm varchar(20),
@xmlvar1 varchar(1000);
declare
@verify table (XML_Data
XML(stat)
);
set
@schemanm = 'stat';
select
@namespacenm = sn.name
from
sys.xml_schema_collections sc
inner join
sys.xml_schema_namespaces sn
on sc.xml_collection_id
= sn.xml_collection_id
where
sc.name = @schemanm;
set
@xmlvar1 = ' <statrecord>
<companyid>6</companyid>
<prodno>prd12</prodno>
<userid>12345</userid>
</statrecord> '
Insert
into @verify(XML_Data)
select
' <xs:statroot xmlns:xs="' +
@namespacenm + '">'
+
cast(@xmlvar1
as varchar(max)
) +
'</xs:statroot>'
select xml_data from
@verify;
Conclusion
This
article describes the theory of generic process architecture that allows store
data with different input XML schemas into one staging location and then creates
a process that will output data into designated standard tables. It is important
for such data that is not required for the immediate consumption and that will
be used by applications or reports later with some form of aggregation or the
other data processed form. In our case, I used this idea to gather statistical
data. Different type of statistical "raw" data is gathered from the
application(s) into one table and extracted daily. Then, once a week staging
table clean up by the process which deleting processed records. In my next
article I will show you how to create generic code for the XML data verification
and extraction and I will explain some of the many additional options of the
process.