March 17, 2009 at 2:24 am
How to insert hierarchically formatted XML data into SQL Server database , especially when the tables in the database use IDENTITY columns for primary keys, there by making it difficult to properly store the auto-generated key values in related tables.
Fortunately, OpenXML meta-properties provide a solution for obtaining these key values and correlating them to related entities in the XML
But i have not been able to achieve the same. Can somebody help me out.
March 17, 2009 at 12:38 pm
Hi,
would you please explain a little more in detail what you're trying to achieve?
Basically, you could use the XQuery functionality of SS2K5 to get your data into the table.
If you need more information on how to do it please provide a sample of the xml structure as well as the table definition, and, if the correlation between XML and table is not obvious, please add a conversion, which element should go in what column of a table.
What have you tried so far "to achieve the same"? Would you please show/describe us what you've tried (at least by example)?
To make it easier for the folks that are trying to help you, please see the following link on how to provide sample code: http://www.sqlservercentral.com/articles/Best+Practices/61537/
Also, please note, that there is an issue with displaying XML structure the way it is originally formatted within a post. You may have to replace " " with "<" , and ">". Otherwise the XML sample will not be displayed.
March 17, 2009 at 10:23 pm
This is the sample XML
"root"
"order name="order1" description="order2"
"orderdetails item="item1" state="AL"
"order"
"root"
and there could be around 100 to 200 orders
and table structure is like this
Order (Order_id IDENTITY, Name varchar, description varchar)
Orderdetails (Orderdetails_id IDENTITY, item varchar state varchar(40), Order_id FOREINGN KEY)
I am trying to pass the xml to a stored procedure and use OPEN XML to bulk insert both order as well as order details. It works fine if the primary keys are generated at client end and it can be directly assigned to orderdetails node as well.
But in my case, i am trying to insert order first and then use the generated id from the order table and use it for inserting into orderdetails table. i dont know if there is a way to achieve this using OPENXML
March 18, 2009 at 2:20 am
Hi,
following a sample code that should do what you requested.
-- declaration of sample data
DECLARE @Orders TABLE (
Order_IDNUMERIC(18, 0) IDENTITY(1,1) NOT NULL,
NameVARCHAR(255) NOT NULL,
description VARCHAR(255) NULL
)
DECLARE @Orderdetails TABLE (
Orderdetails_ID NUMERIC(18, 0) IDENTITY(1,1) NOT NULL,
itemVARCHAR(50) NULL,
stateVARCHAR(50) NULL,
Order_IDNUMERIC(18, 0) NOT NULL
)
DECLARE @xml XML
-- Initialization of sample data
SET @xml=
'lt&;rootgt&;
lt&;order name="order1" description="order2"gt&;
lt&;orderdetails item="item1" state="AL"/gt&;
lt&;orderdetails item="item3" state="AG"/gt&;
lt&;/ordergt&;
lt&;order name="order2" description="order3"gt&;
lt&;orderdetails item="item2" state="AF"/gt&;
lt&;/ordergt&;
lt&;/rootgt&;'
-- show sample xml structure
SELECT @xml
-- actual job: step 1: insert order information into table Orders
INSERT INTO @Orders (Name, description)
SELECT CAST(T.c.query('string(./@name)') AS VARCHAR(50)) AS name,
CAST(T.c.query('string(./@description)') AS VARCHAR(50)) AS description
FROM @xml.nodes('/root/order') T(c)
-- actual job: step 2: insert order details information into table Orderdetails
INSERT INTO @Orderdetails (item, state,Order_ID)
SELECT
CAST(T.c.query('string(./@item)') AS VARCHAR(50)) AS name,
CAST(T.c.query('string(./@state)') AS VARCHAR(50)) AS description,
Orders.order_ID AS Orders
FROM @xml.nodes('/root/order/orderdetails') T(c)
INNER JOIN Orders
ON CAST(T.c.query('string(../@name)') AS VARCHAR(50)) = Orders.name
-- show results
SELECT * FROM @Orders
SELECT * FROM @Orderdetails
SELECT * FROM @Orders as Orders
INNER JOIN @Orderdetails AS Orderdetails
ON Orders.Order_ID=Orderdetails.Order_ID
March 18, 2009 at 2:59 am
Thanks a lot. The code snippet provided by you definitely helps to insert data without using OPENXML.
The approach relies on having a unique name across orders. In one of the scenarios that i am working on currently, unique value for a parent node is not guaranteed.
March 18, 2009 at 4:42 am
If you cannot guarantee a unique value for a parent node, how it is determined whether the two order nodes (with the same order name) belong to one physical order vs. being two separate orders?
Is there any other element (or combination) that actually could be used to determine uniqueness?
March 18, 2009 at 5:01 am
Actually we are using an ORM tool (nhibernate) to persist data in multiple tables (order and orderdetails). We pass the object collection to nhibernate methods to persist the data. The orders could be in the range of around 8000 or 10000 in some scenarios. The performance has taken a huge hit because of the way ORM tool persists objects.
If there are 8000 orders, tool internally would make 8000 database trips to insert orders,to retrieve the order id (identity field generated on order insert) and populates the order id back in order detail object before persisting the order detail entity.
So we are looking at moving away from ORM tool, one of the options i found was to pass the object as XML to a stored procedure and use OPENXML or XQUERY (option that you pointed out) to achieve the same.
To answer your question, at least in this particular scenario, there is no way to uniquely identity an order based on the information that we have at present.
March 18, 2009 at 5:11 am
So, what is your requirement when it comes to two or more identical order names? (two or more separate nodes)
Do these have to be inserted as two rows in the orders table or is it just one row?
Is an order strictly represented by one node, regardless of the order name?
March 18, 2009 at 5:44 am
It has to be inserted as two rows in order tables. There is no business validation that happens in the data intake operation. The business validations are done by a separate processor.
I found in one of the forums that OPENXML with tweaking of meta-properties can do the job. But i am yet to find any working sample on the same. i will spend some time in understanding more on meta-properties and check if that helps in any way.
The XQUERY approach provided by you is also pretty cool. i will def look at that for other scenarios.
March 18, 2009 at 7:55 am
What you could do is to first insert the orders into the orders table by keeping the order details attached to it (adding a new colum of XML type).
SELECT
ROW_NUMBER() OVER(ORDER BY CAST(T.c.query('string(.)') AS VARCHAR(10))) AS number,
CAST(T.c.query('string(@name)') AS VARCHAR(50)) AS name,
CAST(T.c.query('string(@description)') AS VARCHAR(50)) AS description,
CAST(T.c.query('./orderdetails') AS XML) AS orderdetails
FROM @xml.nodes('/root/order') T(c)
To extraxt the XML column into relational data together with the number column shouldn't be that hard afterwards...
(I'm personally running out of time at the moment to come up with a way to eliminate the intermediate step... maybe I've got some time later on today).
If you can work forward from the current point please let me know.
March 18, 2009 at 4:56 pm
Hi,
as promised i looked a little deeper into the subject this evening and I think I found a way to deal with it without additional (permanent) tables.
The biggest disadvantage of this solution is that you'd have to add a new column to your original Orders table.
This is required to get both, the real order_id and the ref_number (created with the ROW_NUMBER() function - see below) back into the internal reference table @tmpInsertID.
I've been struggling for a while to get around it, but with no success.
Unlike with the UPDATE statement it seems impossible to get columns included in the OUTPUT list that are not part of the INSERT statement. So I had to insert the internal ref_number into the Orders table just to catch it together with the original Order_ID.
Maybe there's a way around it but I couldn't find it...
If you or someone else who comes across this post find another solution please post it, since I'm always open to learn!
-- create a function to return a list of XML elements in table structure
CREATE FUNCTION [dbo].[fn_XMLSplit]
(
@ItemList XML
)
RETURNS @IDTable TABLE (Item VARCHAR(50),
state VARCHAR(50))
AS
BEGIN
INSERT INTO @IDTable
SELECT
CAST(T.c.query('string(@item)') AS VARCHAR(50)) AS name,
CAST(T.c.query('string(@state)') AS VARCHAR(50)) AS description
FROM @ItemList.nodes('/orderdetails') T(c)
RETURN
END
-- end of function declaration
go
--- declaration of sample data
-- final table Orders
DECLARE @Orders TABLE (
Order_ID NUMERIC(18, 0) IDENTITY(1,1) NOT NULL,
Name VARCHAR(255) NOT NULL,
description VARCHAR(255) NULL,
ref_numberINT null
)
-- final table Orderdetails
DECLARE @Orderdetails TABLE (
Orderdetails_ID NUMERIC(18, 0) IDENTITY(1,1) NOT NULL,
item VARCHAR(50) NULL,
state VARCHAR(50) NULL,
Order_ID NUMERIC(18, 0) NOT NULL
)
-- internal table to catch the ID's from the Orders table during insert
DECLARE @tmpInsertID TABLE (
Order_ID INT ,
ref_numberINT)
-- internal denormalized table to keep connection between ref_number of an order to the details associated with it
DECLARE @tmpDenorm TABLE (
ref_number INT,
name VARCHAR(50),
description VARCHAR(255),
Item VARCHAR(50),
state VARCHAR(50))
DECLARE @xml XML
--- Initialize sample data
SET @xml=
'lt&;rootgt&;
lt&;order name="order1" description="order2"gt&;
lt&;orderdetails item="item1" state="AL"/gt&;
lt&;orderdetails item="item3" state="AG"/gt&;
lt&;/ordergt&;
lt&;order name="order2" description="order3"gt&;
lt&;orderdetails item="item2" state="AF"/gt&;
lt&;/ordergt&;
lt&;order name="order1" description="orderN"gt&;
lt&;orderdetails item="item0" state="AX"/gt&;
lt&;orderdetails item="item9" state="AZ"/gt&;
lt&;/ordergt&;
lt&;/rootgt&;'
--- show sample xml structure
SELECT @xml
--
/*
The function [fn_XMLSplit] returns item and state for each order detail.
The reason not to use pure XQuery functions for this approach is caused by the ROW_NUMBER() function that is used to assign a unique number to each of the order nodes.
Any attempt to use the PARTITION OVER statement in a standard XQuery failed since there is no unique element to partition over.
Note: the offset of +10 is used to show the difference between the "internal" order number and the order number returned from the insert statement
*/
;WITH data(number,name, description,orderdetails) AS
(
SELECT
(ROW_NUMBER() OVER(ORDER BY CAST(T.c.query('string(.)') AS VARCHAR(10))) +10) AS number,
CAST(T.c.query('string(@name)') AS VARCHAR(50)) AS name,
CAST(T.c.query('string(@description)') AS VARCHAR(50)) AS description,
CAST(T.c.query('./orderdetails') AS XML) AS orderdetails
FROM @xml.nodes('/root/order') T(c))
INSERT INTO @tmpDenorm (ref_number,name, description, Item, state)
SELECT number,name, description, Item, state
FROM data
CROSS APPLY
dbo.fn_XMLSplit(data.orderdetails)
/*
Insert data in Orders table and catch the returned order_id in the internal table @tmpInsertID
*/
INSERT INTO @Orders (name,description,ref_number)
OUTPUT inserted.order_id, inserted.ref_number
INTO @tmpInsertID
SELECT name, description, ref_number
FROM @tmpDenorm AS tmpD
GROUP BY ref_number,name,description
/*
Finally, insert the order details in the related Orderdetails table using the Order_ID
as returned into the internal reference table @tmpInsertID with the statement above.
*/
INSERT INTO @Orderdetails (item, state, order_id)
SELECT tmpD.Item, tmpD.state, tmpI.Order_ID
FROM @tmpInsertID AS tmpI
INNER JOIN @tmpDenorm tmpD ON tmpI.ref_number = tmpD.ref_number
SELECT 'values of internal tables'
SELECT * FROM @tmpInsertID
SELECT * FROM @tmpDenorm
SELECT ''
SELECT 'values of final tables'
SELECT * FROM @Orders
SELECT * FROM @Orderdetails
March 19, 2009 at 1:14 am
Yet another thought:
You could also use the max(Order_ID) instead of my +10 offset to generate new primary keys with ROW_NUMBER().
Therewith you could avoid the addtl. column. But instead of adding the new column you'd need to drop the identity constraint of the order_id column (but keep it a primary key).
If you do so, you need to make sure you have a proper error handling just in case you end up with a duplicate key because another process inserted lines in the order table at the same time.
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply