A Generic Process to Convert XML Data - Part 1

There are many articles on XML and how to work with XML structures. SQLServerCentral.com is one of many great sources with an abundance of articles. In this article I would like to show how XML can be used in real life scenarios and how XML can be validated by the process against XML schema registered in SQL Server.

My company collects a lot of statistical data from various

applications. The data elements are often either changed or added, and sometimes

both. When an adjustment is required, it's required to be modified in many

stored procedures because usually the number of parameters is changed.

Recently we decided to create a generic solution that will require applications to pass only few standard parameters: stats data in XML, XML schema name, and application code. The third parameter is irrelevant for this article and will be omitted in further discussion.

Below I have drawn out a general architectural idea. As you can see, the application(s) output data into a general staging XML table. It can be one table per application or one table per server, or table(s) for any other grouping. The decision should be made based on the number of records, number of applications, and practicality. The number of records should be the last concern because application(s) will only insert statistical data and never consume it.

Let start with a simple example of an XML schema and pretend that the statistical

data consists of three fields: companyid integer, prodno varchar, and userid

integer. To be able to validate the XML value against the registered XML schema, the custom schema has to be created and registered in SQL Server. Remember that the schema registration is per database.

To avoid the schema registration in every database, I created a special staging database per server to hold table(s) with staging statistical data. The same database can be used to hold tables with extracted "raw" statistical data. But I would recommend create another independent database. I will not explain in detail the XML schema because it is not a purpose of the article. Instead I will touch on the necessary parts of the XML syntax's.

drop

XML SCHEMA COLLECTION

stat

create

XML SCHEMA COLLECTION

stat AS

'<?xml

version="1.0" encoding="ISO-8859-1" ?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"

targetNamespace="http://ns" xmlns="http://ns">

<xs:element name="statroot">

<xs:complexType>

<xs:sequence>

<xs:element name="statrecord" minOccurs="0">

<xs:complexType>

<xs:sequence>

<xs:element

name="companyid" type="xs:integer" />

<xs:element name="prodno" type="xs:string" />

<xs:element name="userid" type="xs:integer" />

</xs:sequence>

</xs:complexType>

</xs:element>

</xs:sequence>

</xs:complexType>

</xs:element>

</xs:schema> '

To verify the proper schema registration we can declare the XML variable

type of schema "stat" and assign data.

declare

@xmlvar1 xml (stat);

set

@xmlvar1 = '

<xs:statroot xmlns:xs="http://ns">

</statrecord>

</xs:statroot>

As I said before, different applications and different parts of the same

application usually output different statistical data. This means that we will

have data packed into multiple XML schemas that have to be inserted into one

staging table. The following table can be created for such purposes that were

described above.

CREATE

TABLE [dbo].[XML_TestTbl](

[XML_ID] [bigint] IDENTITY(1,1)

NOT NULL primary

key,

[Table_NM] AS

(('XML_TestTbl')),

[XML_Data] [xml] NOT NULL,

[Schema_NM]

[varchar](20) NOT

NULL,

[InsertDate] [datetime]

NOT NULL default ( getdate() ) )

Computed field [Table_NM]

AS (('XML_TestTbl')) is created for the future generic process and needs

to be able to link extracted XML record with staging table in case that multiple

staging tables exist.

As I mentioned before, some additional standard fields can be added to the

table. For example, we can add application name or module code, or both.

The next step before we extract XML data from table XML_TestTbl

into standard table with columns, is to verify that data is matching with

registered schema in SQL Server. The simplest way to do it is to allow SQL

Server to do the job for us by inserting data into another table with a column

of XML data type of the registered schema.

declare @verify table (XML_Data XML(stat) );

Insert

into @verify(XML_Data)

Select XML_Data from

XML_TestTbl where schema_nm =

'stat';

If there are no errors, our verification for this schema is done, and it is possible to load data into regular raw data tables with regular columns by using the following statement as an example.

SELECT

STATEMENT 1

select

[XML_Data].query('//customerid').value('.','integer')

as customerid,

[XML_Data].query('//prodno').value('.','varchar(max)')

as prodno,

[XML_Data].query('//userid').value('.','integer')

as userid

from dbo.XML_TestTbl

where schema_nm =

'stat';

As you can see from the statement, each row in XML_TestTbl in

field XML_Data has to keep only one row for output to do the easiest XML to

column data conversion. To do this, XML inserted into table XML_TestTbl (XML_Data),

has to be split to keep only one to one row. For example, it can be done by the

code below. However, you can do it by many other ways

declare

@xmlvar1 xml ;

set

@xmlvar1 = '

<root>

</statrecord>

</root>

<root>

</statrecord>

</root> '

select

T1.rt.query('.')

from @xmlvar1.nodes('/*/*')

T1(rt)

you can see, if this is the case and multiple rows can be sent by application(s)

with one XML record, the XML structure has to have another tag <root>

between <statroot> and <statrecord> tags.

One of the issues with SELECT

STATEMENT 1 is that the string value does not have a limit.

Although in most cases we do know the size of the inserted data. Lets say,

prodno has to be varchar(6). Let us limit the size of the prodno in the

registration schema by adding schema's own data type.

drop

XML SCHEMA COLLECTION

stat

create

XML SCHEMA COLLECTION

stat AS

'<?xml

version="1.0" encoding="ISO-8859-1" ?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"

targetNamespace="http://ns" xmlns="http://ns">

<xs:simpleType

name="myST">

<xs:restriction base="xs:string">

<xs:maxLength value="6"/>

</xs:restriction>

</xs:simpleType>

<xs:element name="statroot">

<xs:complexType>

<xs:sequence>

<xs:element name="statrecord" minOccurs="0">

<xs:complexType>

<xs:sequence>

<xs:element name="companyid" type="xs:integer"

<xs:element name="prodno" type="myST" />

<xs:element name="userid" type="xs:integer" />

</xs:sequence>

</xs:complexType>

</xs:element>

</xs:sequence>

</xs:complexType>

</xs:element>

</xs:schema> '

declare

@xmlvar1 xml (stat);

set

@xmlvar1 = '

<xs:statroot xmlns:xs="http://ns">

</statrecord>

</xs:statroot> '

If we are trying to set the value for prodno =

‘prd1234' where data length is more than 6 characters we will have an error:

Msg

6926, Level 16, State 1, Line 4

XML Validation: Invalid simple type value:

'prd1234'. Location: /*:statroot[1]/*:statrecord[1]/*:prodno[1]

Now we know how to add schema custom data type from the

base one defined in SQL Server.

As you can see from the example, the first and last line

for XML requires a name space of registered XML schema. A namespace is required

for the custom data type declaration.

To minimize the amount of data passed by application, add

some security, and provide generic decoding solution, which incidentally will be

the topic of my next article; it is possible to save XML data in table XML_TestTbl

without first and last tags like this:

</statrecord>

First and last tags will be added by the generic

process to verify data and they are not required for the XML data extraction.

However, the schema must be registered with root element tag. SQL Server XML

meta data tables will be used in the verification process. Let us see how

to get a namespace for the first tag

<xs:statroot xmlns:xs="http://ns">

We need to use sys.xml_schema_collections

and sys.xml_schema_namespaces

select sn.name as

namespace_nm, sc.name

as schema_nm

from

sys.xml_schema_collections sc

inner join

sys.xml_schema_namespaces sn

on sc.xml_collection_id

= sn.xml_collection_id

where

sc.name = 'stat'

Table

Returns a row per XML schema collection. An XML

schema collection is a named set of XSD definitions. The XML schema collection

itself is contained in a relational schema, and it is identified by a

schema-scoped Transact-SQL name. The following tuples are unique: xml_collection_id,

namespace_id, and schema_id, name.

Column

name

Data

type

Description

xml_collection_id

int

ID of the XML schema collection. Unique

within the database.

schema_id

int

ID of the relational schema that contains

this XML schema collection.

principal_id

int

ID of the individual owner if different from

the schema owner. By default, schema-contained objects are owned by the

schema owner. However, an alternate owner may be specified by using the

ALTER AUTHORIZATION statement to change ownership.

NULL = No alternate individual owner.

name

sysname

Name of the XML schema collection.

create_date

datetime

Date the XML schema collection was created.

modify_date

datetime

Date the XML schema collection was last

altered.

2008 Microsoft

Returns a row per XSD-defined XML namespace. The

following tuples are unique: collection_id, namespace_id, and collection_id,

and name.

Column

name

Data

type

Description

xml_collection_id

int

ID of the XML schema collection that contains

this namespace.

name

nvarchar

(4000)

Name of XML namespace. Blank name

indicates no target namespace.

xml_namespace_id

int

1-based ordinal that uniquely identifies the

XML namespace in the database.

It means that the next code can be utilized to verify XML

data with root element constructed by process.

declare

@schemanm varchar(20),

@namespacenm varchar(20),

@xmlvar1 varchar(1000);

declare

@verify table (XML_Data

XML(stat)

);

set

@schemanm = 'stat';

select

@namespacenm = sn.name

from

sys.xml_schema_collections sc

inner join

sys.xml_schema_namespaces sn

on sc.xml_collection_id

= sn.xml_collection_id

where

sc.name = @schemanm;

set

@xmlvar1 = ' <statrecord>

</statrecord> '

Insert

into @verify(XML_Data)

select

' <xs:statroot xmlns:xs="' +

@namespacenm + '">'

cast(@xmlvar1

as varchar(max)

) +

'</xs:statroot>'

select xml_data from

@verify;

Conclusion

This

article describes the theory of generic process architecture that allows store

data with different input XML schemas into one staging location and then creates

a process that will output data into designated standard tables. It is important

for such data that is not required for the immediate consumption and that will

be used by applications or reports later with some form of aggregation or the

other data processed form. In our case, I used this idea to gather statistical

data. Different type of statistical "raw" data is gathered from the

application(s) into one table and extracted daily. Then, once a week staging

table clean up by the process which deleting processed records. In my next

article I will show you how to create generic code for the XML data verification

and extraction and I will explain some of the many additional options of the

process.

A Generic Process to Convert XML Data - Part 1

Conclusion

Resources

Rate

Share

Categories

Share

Rate

A Generic Process to Convert XML Data - Part 1

Conclusion

Resources

Rate

Share

Categories

Share

Rate

Related content

XML Schema validation in SQL Server

Script SQL Table Schema as XML-XSD

A Generic Process to Convert XML Data - Part 2

XML Workshop XIII - XSD And Variable Content Containers

XML Workshop XI - Default Namespaces