OPENXML meta-properties to store hierarchial data

  • How to insert hierarchically formatted XML data into SQL Server database , especially when the tables in the database use IDENTITY columns for primary keys, there by making it difficult to properly store the auto-generated key values in related tables.

    Fortunately, OpenXML meta-properties provide a solution for obtaining these key values and correlating them to related entities in the XML

    But i have not been able to achieve the same. Can somebody help me out.

  • Hi,

    would you please explain a little more in detail what you're trying to achieve?

    Basically, you could use the XQuery functionality of SS2K5 to get your data into the table.

    If you need more information on how to do it please provide a sample of the xml structure as well as the table definition, and, if the correlation between XML and table is not obvious, please add a conversion, which element should go in what column of a table.

    What have you tried so far "to achieve the same"? Would you please show/describe us what you've tried (at least by example)?

    To make it easier for the folks that are trying to help you, please see the following link on how to provide sample code: http://www.sqlservercentral.com/articles/Best+Practices/61537/

    Also, please note, that there is an issue with displaying XML structure the way it is originally formatted within a post. You may have to replace " " with "<" , and ">". Otherwise the XML sample will not be displayed.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • This is the sample XML

    "root"

    "order name="order1" description="order2"

    "orderdetails item="item1" state="AL"

    "order"

    "root"

    and there could be around 100 to 200 orders

    and table structure is like this

    Order (Order_id IDENTITY, Name varchar, description varchar)

    Orderdetails (Orderdetails_id IDENTITY, item varchar state varchar(40), Order_id FOREINGN KEY)

    I am trying to pass the xml to a stored procedure and use OPEN XML to bulk insert both order as well as order details. It works fine if the primary keys are generated at client end and it can be directly assigned to orderdetails node as well.

    But in my case, i am trying to insert order first and then use the generated id from the order table and use it for inserting into orderdetails table. i dont know if there is a way to achieve this using OPENXML

  • Hi,

    following a sample code that should do what you requested.

    -- declaration of sample data

    DECLARE @Orders TABLE (

    Order_IDNUMERIC(18, 0) IDENTITY(1,1) NOT NULL,

    NameVARCHAR(255) NOT NULL,

    description VARCHAR(255) NULL

    )

    DECLARE @Orderdetails TABLE (

    Orderdetails_ID NUMERIC(18, 0) IDENTITY(1,1) NOT NULL,

    itemVARCHAR(50) NULL,

    stateVARCHAR(50) NULL,

    Order_IDNUMERIC(18, 0) NOT NULL

    )

    DECLARE @xml XML

    -- Initialization of sample data

    SET @xml=

    'lt&;rootgt&;

    lt&;order name="order1" description="order2"gt&;

    lt&;orderdetails item="item1" state="AL"/gt&;

    lt&;orderdetails item="item3" state="AG"/gt&;

    lt&;/ordergt&;

    lt&;order name="order2" description="order3"gt&;

    lt&;orderdetails item="item2" state="AF"/gt&;

    lt&;/ordergt&;

    lt&;/rootgt&;'

    -- show sample xml structure

    SELECT @xml

    -- actual job: step 1: insert order information into table Orders

    INSERT INTO @Orders (Name, description)

    SELECT CAST(T.c.query('string(./@name)') AS VARCHAR(50)) AS name,

    CAST(T.c.query('string(./@description)') AS VARCHAR(50)) AS description

    FROM @xml.nodes('/root/order') T(c)

    -- actual job: step 2: insert order details information into table Orderdetails

    INSERT INTO @Orderdetails (item, state,Order_ID)

    SELECT

    CAST(T.c.query('string(./@item)') AS VARCHAR(50)) AS name,

    CAST(T.c.query('string(./@state)') AS VARCHAR(50)) AS description,

    Orders.order_ID AS Orders

    FROM @xml.nodes('/root/order/orderdetails') T(c)

    INNER JOIN Orders

    ON CAST(T.c.query('string(../@name)') AS VARCHAR(50)) = Orders.name

    -- show results

    SELECT * FROM @Orders

    SELECT * FROM @Orderdetails

    SELECT * FROM @Orders as Orders

    INNER JOIN @Orderdetails AS Orderdetails

    ON Orders.Order_ID=Orderdetails.Order_ID



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Thanks a lot. The code snippet provided by you definitely helps to insert data without using OPENXML.

    The approach relies on having a unique name across orders. In one of the scenarios that i am working on currently, unique value for a parent node is not guaranteed.

  • If you cannot guarantee a unique value for a parent node, how it is determined whether the two order nodes (with the same order name) belong to one physical order vs. being two separate orders?

    Is there any other element (or combination) that actually could be used to determine uniqueness?



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Actually we are using an ORM tool (nhibernate) to persist data in multiple tables (order and orderdetails). We pass the object collection to nhibernate methods to persist the data. The orders could be in the range of around 8000 or 10000 in some scenarios. The performance has taken a huge hit because of the way ORM tool persists objects.

    If there are 8000 orders, tool internally would make 8000 database trips to insert orders,to retrieve the order id (identity field generated on order insert) and populates the order id back in order detail object before persisting the order detail entity.

    So we are looking at moving away from ORM tool, one of the options i found was to pass the object as XML to a stored procedure and use OPENXML or XQUERY (option that you pointed out) to achieve the same.

    To answer your question, at least in this particular scenario, there is no way to uniquely identity an order based on the information that we have at present.

  • So, what is your requirement when it comes to two or more identical order names? (two or more separate nodes)

    Do these have to be inserted as two rows in the orders table or is it just one row?

    Is an order strictly represented by one node, regardless of the order name?



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • It has to be inserted as two rows in order tables. There is no business validation that happens in the data intake operation. The business validations are done by a separate processor.

    I found in one of the forums that OPENXML with tweaking of meta-properties can do the job. But i am yet to find any working sample on the same. i will spend some time in understanding more on meta-properties and check if that helps in any way.

    The XQUERY approach provided by you is also pretty cool. i will def look at that for other scenarios.

  • What you could do is to first insert the orders into the orders table by keeping the order details attached to it (adding a new colum of XML type).

    SELECT

    ROW_NUMBER() OVER(ORDER BY CAST(T.c.query('string(.)') AS VARCHAR(10))) AS number,

    CAST(T.c.query('string(@name)') AS VARCHAR(50)) AS name,

    CAST(T.c.query('string(@description)') AS VARCHAR(50)) AS description,

    CAST(T.c.query('./orderdetails') AS XML) AS orderdetails

    FROM @xml.nodes('/root/order') T(c)

    To extraxt the XML column into relational data together with the number column shouldn't be that hard afterwards...

    (I'm personally running out of time at the moment to come up with a way to eliminate the intermediate step... maybe I've got some time later on today).

    If you can work forward from the current point please let me know.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Hi,

    as promised i looked a little deeper into the subject this evening and I think I found a way to deal with it without additional (permanent) tables.

    The biggest disadvantage of this solution is that you'd have to add a new column to your original Orders table.

    This is required to get both, the real order_id and the ref_number (created with the ROW_NUMBER() function - see below) back into the internal reference table @tmpInsertID.

    I've been struggling for a while to get around it, but with no success.

    Unlike with the UPDATE statement it seems impossible to get columns included in the OUTPUT list that are not part of the INSERT statement. So I had to insert the internal ref_number into the Orders table just to catch it together with the original Order_ID.

    Maybe there's a way around it but I couldn't find it...

    If you or someone else who comes across this post find another solution please post it, since I'm always open to learn!

    -- create a function to return a list of XML elements in table structure

    CREATE FUNCTION [dbo].[fn_XMLSplit]

    (

    @ItemList XML

    )

    RETURNS @IDTable TABLE (Item VARCHAR(50),

    state VARCHAR(50))

    AS

    BEGIN

    INSERT INTO @IDTable

    SELECT

    CAST(T.c.query('string(@item)') AS VARCHAR(50)) AS name,

    CAST(T.c.query('string(@state)') AS VARCHAR(50)) AS description

    FROM @ItemList.nodes('/orderdetails') T(c)

    RETURN

    END

    -- end of function declaration

    go

    --- declaration of sample data

    -- final table Orders

    DECLARE @Orders TABLE (

    Order_ID NUMERIC(18, 0) IDENTITY(1,1) NOT NULL,

    Name VARCHAR(255) NOT NULL,

    description VARCHAR(255) NULL,

    ref_numberINT null

    )

    -- final table Orderdetails

    DECLARE @Orderdetails TABLE (

    Orderdetails_ID NUMERIC(18, 0) IDENTITY(1,1) NOT NULL,

    item VARCHAR(50) NULL,

    state VARCHAR(50) NULL,

    Order_ID NUMERIC(18, 0) NOT NULL

    )

    -- internal table to catch the ID's from the Orders table during insert

    DECLARE @tmpInsertID TABLE (

    Order_ID INT ,

    ref_numberINT)

    -- internal denormalized table to keep connection between ref_number of an order to the details associated with it

    DECLARE @tmpDenorm TABLE (

    ref_number INT,

    name VARCHAR(50),

    description VARCHAR(255),

    Item VARCHAR(50),

    state VARCHAR(50))

    DECLARE @xml XML

    --- Initialize sample data

    SET @xml=

    'lt&;rootgt&;

    lt&;order name="order1" description="order2"gt&;

    lt&;orderdetails item="item1" state="AL"/gt&;

    lt&;orderdetails item="item3" state="AG"/gt&;

    lt&;/ordergt&;

    lt&;order name="order2" description="order3"gt&;

    lt&;orderdetails item="item2" state="AF"/gt&;

    lt&;/ordergt&;

    lt&;order name="order1" description="orderN"gt&;

    lt&;orderdetails item="item0" state="AX"/gt&;

    lt&;orderdetails item="item9" state="AZ"/gt&;

    lt&;/ordergt&;

    lt&;/rootgt&;'

    --- show sample xml structure

    SELECT @xml

    --

    /*

    The function [fn_XMLSplit] returns item and state for each order detail.

    The reason not to use pure XQuery functions for this approach is caused by the ROW_NUMBER() function that is used to assign a unique number to each of the order nodes.

    Any attempt to use the PARTITION OVER statement in a standard XQuery failed since there is no unique element to partition over.

    Note: the offset of +10 is used to show the difference between the "internal" order number and the order number returned from the insert statement

    */

    ;WITH data(number,name, description,orderdetails) AS

    (

    SELECT

    (ROW_NUMBER() OVER(ORDER BY CAST(T.c.query('string(.)') AS VARCHAR(10))) +10) AS number,

    CAST(T.c.query('string(@name)') AS VARCHAR(50)) AS name,

    CAST(T.c.query('string(@description)') AS VARCHAR(50)) AS description,

    CAST(T.c.query('./orderdetails') AS XML) AS orderdetails

    FROM @xml.nodes('/root/order') T(c))

    INSERT INTO @tmpDenorm (ref_number,name, description, Item, state)

    SELECT number,name, description, Item, state

    FROM data

    CROSS APPLY

    dbo.fn_XMLSplit(data.orderdetails)

    /*

    Insert data in Orders table and catch the returned order_id in the internal table @tmpInsertID

    */

    INSERT INTO @Orders (name,description,ref_number)

    OUTPUT inserted.order_id, inserted.ref_number

    INTO @tmpInsertID

    SELECT name, description, ref_number

    FROM @tmpDenorm AS tmpD

    GROUP BY ref_number,name,description

    /*

    Finally, insert the order details in the related Orderdetails table using the Order_ID

    as returned into the internal reference table @tmpInsertID with the statement above.

    */

    INSERT INTO @Orderdetails (item, state, order_id)

    SELECT tmpD.Item, tmpD.state, tmpI.Order_ID

    FROM @tmpInsertID AS tmpI

    INNER JOIN @tmpDenorm tmpD ON tmpI.ref_number = tmpD.ref_number

    SELECT 'values of internal tables'

    SELECT * FROM @tmpInsertID

    SELECT * FROM @tmpDenorm

    SELECT ''

    SELECT 'values of final tables'

    SELECT * FROM @Orders

    SELECT * FROM @Orderdetails



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Yet another thought:

    You could also use the max(Order_ID) instead of my +10 offset to generate new primary keys with ROW_NUMBER().

    Therewith you could avoid the addtl. column. But instead of adding the new column you'd need to drop the identity constraint of the order_id column (but keep it a primary key).

    If you do so, you need to make sure you have a proper error handling just in case you end up with a duplicate key because another process inserted lines in the order table at the same time.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply