My previous article, A Generic Process to Convert XML Data - Part 1, described the theory of generic process architecture that allows store data with different input XML schemas into one staging location and then creates a process that will output data into designated standard tables. It is important for the data that is not required for the immediate consumption and that will be used by
applications or reports later with some form of aggregation or the other in the
processed form. In our case, I am using this idea to gather statistical data.
Different types of statistical RAW data is gathered from the application(s) into
one table and extracted daily. Then once a week the staging table is cleaned up by the
process which deletes processed records.
Let see how we can create
generic process. The point of the process is to extract data based on the
standard parameters from application e.g. from the staging table XML_TestTbl
by the Schema_NM.
CREATE
TABLE [dbo].[XML_TestTbl](
[XML_ID] [bigint] IDENTITY(1,1)
NOT NULL primary
key,
[Table_NM] AS
(('XML_TestTbl')),
[XML_Data] [xml] NOT NULL,
[Schema_NM]
[varchar](20) NOT
NULL,
[InsertDate] [datetime]
NOT NULL default ( getdate() ) )
In our example stat schema, process requires root element,
data elements, and data types to be able to create select statement from the
staging table and root element with namespace name for verification purposes.
select
[XML_Data].query('//customerid').value('.','integer')
as customerid,
[XML_Data].query('//prodno').value('.','varchar(max)')
as prodno,
[XML_Data].query('//userid').value('.','integer')
as userid
from dbo.XML_TestTbl
where schema_nm =
'stat';
To get the process required XML structure we have to select
information from the multiple system views.
select sco.name,
sn.name,
sc.name, sc1.name,
sc2.name, sc1.kind,
sc1.kind_desc, sf.kind,
sf.kind_desc, sf.is_fixed, sf.value
from sys.xml_schema_components sc
inner join sys.xml_schema_component_placements scp
on scp.xml_component_id =
sc.xml_component_id
inner join sys.xml_schema_components sc1
on scp.placed_xml_component_id = sc1.xml_component_id
inner join sys.xml_schema_components sc2
on sc1.base_xml_component_id = sc2.xml_component_id
inner join sys.xml_schema_collections sco
on sco.xml_collection_id = sc.xml_collection_id
inner join sys.xml_schema_facets sf
on sf.xml_component_id = sc1.xml_component_id
inner join sys.xml_schema_namespaces sn
on sn.xml_collection_id = sc.xml_collection_id
where sco.name
= 'stat';
sys.xml_schema_component_placements
© 2008 Microsoft
Returns a row per placement for XML schema
components.
Column name
| Data type
| Description
|
xml_component_id
| int
| ID of the XML schema component that owns this placement.
|
placement_id
| int
| ID of the placement. This is unique within the owning XML schema component.
|
placed_xml_component_id
| int
| ID of the placed XML schema component.
|
is_default_fixed
| bit
| 1 = The default value is a fixed value. This value cannot be overridden in an XML instance.
0 = The value can be overridden.(default)
|
min_occurrences
| int
| Minimum number of placed component occurs.
|
max_occurrences
| int
| Maximum number of placed component occurs.
|
default_value
| nvarchar (4000)
| Default value if one is supplied. Is NULL if a default value is not supplied.
|
sys.xml_schema_collections
© 2008 Microsoft
Returns a row per XML schema collection. An XML
schema collection is a named set of XSD definitions. The XML schema collection
itself is contained in a relational schema, and it is identified by a
schema-scoped Transact-SQL name. The following tuples are unique: xml_collection_id,
namespace_id, and schema_id, name.
Column name
| Data type
| Description
|
xml_collection_id
| int
| ID of the XML schema collection. Unique within the database.
|
schema_id
| int
| ID of the relational schema that contains this XML schema collection.
|
principal_id
| int
| ID of the individual owner if different from the schema owner. By default, schema-contained objects are owned by the schema owner. However, an alternate owner may be specified by using the ALTER AUTHORIZATION statement to change ownership.
NULL = No alternate individual owner.
|
name
| sysname
| Name of the XML schema collection.
|
create_date
| datetime
| Date the XML schema collection was created.
|
modify_date
| datetime
| Date the XML schema collection was last altered
|
sys.xml_schema_facets
© 2008 Microsoft
Returns a row per facet (restriction) of an
xml-type definition (corresponds to sys.xml_types).
Column name
| Data type
| Description
|
xml_component_id
| int
| ID of XML component (type) to which this facet belongs.
|
facet_id
| int
| ID (1-based ordinal) of facet, unique within component-id.
|
kind
| char(2)
| Kind of facet:
LG = Length
LN = Minimum Length
LX = Maximum Length
PT = Pattern (regular expression)
EU = Enumeration
IN = Minimum Inclusive value
IX = Maximum Inclusive value
EN = Minimum Exclusive value
EX = Maximum Exclusive value
DT = Total Digits
DF = Fraction Digits
WS = White Space normalization
|
kind_desc
| nvarchar (60)
| Description of kind of facet:
LENGTH
MINIMUM_LENGTH
MAXIMUM_LENGTH
PATTERN
ENUMERATION
MINIMUM_INCLUSIVE_VALUE
MAXIMUM_INCLUSIVE_VALUE
MINIMUM_EXCLUSIVE_VALUE
MAXIMUM_EXCLUSIVE_VALUE
TOTAL_DIGITS
FRACTION_DIGITS
WHITESPACE_NORMALIZATION
|
is_fixed
| bit
| 1 = Facet has a fixed, prespecified value.
0 = No fixed value. (default)
|
value
| nvarchar (4000)
| Fixed, pre-specified value of the facet.
|
We discussed sys.xml_schema_components
and sys.xml_schema_namespaces in part 1 of this article. View sys.xml_schema_components
keeping not only the references to the registered custom schema elements
but also the references to the SQL Server XML base data types and schema custom
created data types. By joining this table we can find out each of the elements
data type, data type it derived from, and the length of the derived data type.
inner join sys.xml_schema_components sc1
on scp.placed_xml_component_id = sc1.xml_component_id
In this part of the statement placed_xml_component_id
is the id of XML data type for the data element. In our case, companyid and
userid have data type 'integer' and prodno has schema defined data type data
type 'myST'.
inner
join sys.xml_schema_components sc2 on
sc1.base_xml_component_id = sc2.xml_component_id
In this part of the statement base _xml_component_id
is the id of XML data the actual element data type is derived. In our
case, companyid and userid have derived from (parent data type) 'decimal'
and prodno data type 'myST' derived from string. It will help generic
process convert data properly when extracting from XML field.
select
[XML_Data].query('//customerid').value('.','integer')
as customerid,
[XML_Data].query('//prodno').value('.','varchar(6)')
as prodno,
[XML_Data].query('//userid').value('.','integer')
as userid
from dbo.XML_TestTbl
where schema_nm =
'stat';
inner
join sys.xml_schema_facets sf on
sf.xml_component_id = sc1.xml_component_id
This relationship returns the schema custom data type
restrictions. In our case the length of myST data type in bytes.
The main portion of the process is stored in procedure spGenericXML_Step.txt
that takes XML table name, database name, and schema name to validate data and
generate select statement.
declare
@selectCMD varchar(max);
exec dbo.spGenericXML_Step
@XML_ID = 0,
@fromTBLNM = 'dbo.XML_TestTbl',
-- has to be with schema name - dbo.test
@fromDBNM =
'test',
@XMLSchemaNM = 'stat',
@selectCMD = @selectCMD OUTPUT;
@selectCMD;
Result:
select
XML_ID, Table_NM,
[XML_Data].query('//companyid').value('.','integer
') ,
[XML_Data].query('//prodno').value('.','varchar(6)
') ,
[XML_Data].query('//userid').value('.','integer
') from
asiwork.dbo.XML_TestTbl
where
Schema_NM = 'stat'
and XML_ID >
0;
This stored procedure can be used from a wrapper procedure. The wrapper can take the output statement and insert data to the specific destination table with regular columns. The table name and database from which XML data will be extracted can be defined in custom meta data based on the schema. As I mentioned before, the XML data table can hold data for the different XML schemas. The output regular data table can be defined through meta data for the process or both, source XML and destination tables can be hard coded. It is up to the person who is writing the process.
Why do we need XML_ID in the
statement and as parameter to the stored procedure?
The process of extracting XML data will be running daily (weekly, monthly). One of the
parameters that will define the processed rows will be XML_ID. If the rows are processed, the process will insert XML_ID with other data into output table. Output table structure can be like this one:
Create table XML_Output (
Recid int identity(1,1) primary key,
Companyid int,
Userid int,
Prodno varchar(6),
External_XML_ID bigint,
External_Table_NM varchar(50))
Then processed XML_IDs are known and will bypass process.
Another process that will clean up XML data table can find processed XML_IDs and
clean up only the ones that already processed. Field External_table_name gives
the information about source XML data table.
Conclusion
This article described only one of various ways XML can be
used in the database world. This architecture and process can be extended by the
use of custom meta data, adding data validation and data mining layers.
Additional XML feature can be used in XML schema. The article code was
simplified for the sake of clarity.