XML multiple nodes not working

  • I have some xml that I need to pull data from. Below is a sample of the xml and what I'm trying to do. It's currently returning four records but I need it to return six records. The two missing records are for MerchantReferenceNumber ECM0042359294 LineItem = 1.

    Any help would be greatly appreciated.

    declare @xml_data xml = '

    <Report MerchantID="123456">

    <Requests>

    <Request RequestID="4791333844346002104017" MerchantReferenceNumber="RA00042357770">

    <LineItems>

    <LineItem Number="0">

    <Quantity>1</Quantity>

    <UnitPrice>59.99</UnitPrice>

    </LineItem>

    </LineItems>

    <ApplicationReplies>

    <ApplicationReply Name="ics_auth">

    <RCode>1</RCode>

    </ApplicationReply>

    <ApplicationReply Name="ics_bill">

    <RCode>1</RCode>

    </ApplicationReply>

    </ApplicationReplies>

    </Request>

    <Request RequestID="4791588597036877403069" MerchantReferenceNumber="ECM0042359294">

    <LineItems>

    <LineItem Number="0">

    <Quantity>1</Quantity>

    <UnitPrice>44.99</UnitPrice>

    </LineItem>

    <LineItem Number="1">

    <Quantity>1</Quantity>

    <UnitPrice>9.99</UnitPrice>

    </LineItem>

    </LineItems>

    <ApplicationReplies>

    <ApplicationReply Name="ics_auth">

    <RCode>1</RCode>

    </ApplicationReply>

    <ApplicationReply Name="ics_pay_subscription_create">

    <RCode>1</RCode>

    </ApplicationReply>

    </ApplicationReplies>

    </Request>

    </Requests>

    </Report>'

    select t.c.value('../../../../@MerchantID', 'int') as MerchantID,

    t.c.value('../../@MerchantReferenceNumber', 'varchar(100)') as MerchantReferenceNumber,

    t.c.value('(../../LineItems/LineItem/@Number)[1]', 'int') as LineNumber,

    t.c.value('(../../LineItems/LineItem/UnitPrice)[1]', 'decimal(18,2)') as UnitPrice,

    t.c.value('@Name', 'varchar(100)') as ApplicationReplyName

    from @xml_data.nodes('/Report/Requests/Request/ApplicationReplies/ApplicationReply') t(c)

    order by MerchantReferenceNumber, LineNumber

  • Does this get you any closer?

    WITH ApplicationReplyNames AS

    (

    SELECT u.c.value('@Name', 'varchar(100)') as ApplicationReplyName,u.c.value('../../@MerchantReferenceNumber', 'varchar(100)') as MerchantReferenceNumber

    from @xml_data.nodes('/Report/Requests/Request/ApplicationReplies/ApplicationReply') u(c)

    )

    select t.c.value('../../../../@MerchantID', 'int') as MerchantID,

    t.c.value('../../@MerchantReferenceNumber', 'varchar(100)') as MerchantReferenceNumber,

    t.c.value('(@Number)[1]', 'int') as LineNumber,

    t.c.value('(UnitPrice)[1]', 'decimal(18,2)') as UnitPrice,

    arn.ApplicationReplyName

    from @xml_data.nodes('/Report/Requests/Request/LineItems/LineItem') t(c)

    JOIN ApplicationReplyNames arn ON t.c.value('../../@MerchantReferenceNumber', 'varchar(100)') = arn.MerchantReferenceNumber

    order by MerchantReferenceNumber, LineNumber

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Thanks Phil.

    Was hoping to avoid using CTE as the actual xml contains ~100k requests with a couple more nodes that would require same CTE solution to pull data. Are there any other solutions? Would OPENXML be better?

  • Here is an alternative solution

    😎

    BTW, OPENXML is not better, stick to the XQUERY

    USE TEEST;

    GO

    SET NOCOUNT ON;

    DECLARE @XML_DATA XML = '

    <Report MerchantID="123456">

    <Requests>

    <Request RequestID="4791333844346002104017" MerchantReferenceNumber="RA00042357770">

    <LineItems>

    <LineItem Number="0">

    <Quantity>1</Quantity>

    <UnitPrice>59.99</UnitPrice>

    </LineItem>

    </LineItems>

    <ApplicationReplies>

    <ApplicationReply Name="ics_auth">

    <RCode>1</RCode>

    </ApplicationReply>

    <ApplicationReply Name="ics_bill">

    <RCode>1</RCode>

    </ApplicationReply>

    </ApplicationReplies>

    </Request>

    <Request RequestID="4791588597036877403069" MerchantReferenceNumber="ECM0042359294">

    <LineItems>

    <LineItem Number="0">

    <Quantity>1</Quantity>

    <UnitPrice>44.99</UnitPrice>

    </LineItem>

    <LineItem Number="1">

    <Quantity>1</Quantity>

    <UnitPrice>9.99</UnitPrice>

    </LineItem>

    </LineItems>

    <ApplicationReplies>

    <ApplicationReply Name="ics_auth">

    <RCode>1</RCode>

    </ApplicationReply>

    <ApplicationReply Name="ics_pay_subscription_create">

    <RCode>1</RCode>

    </ApplicationReply>

    </ApplicationReplies>

    </Request>

    </Requests>

    </Report>'

    SELECT

    REQUEST.DATA.value('(//@MerchantID)[1]' ,'INT' ) AS MerchantID

    ,REQUEST.DATA.value('@MerchantReferenceNumber' ,'VARCHAR(100)' ) AS MerchantReferenceNumber

    ,LINEITEM.DATA.value('@Number' ,'INT' ) AS Number

    ,LINEITEM.DATA.value('(Quantity/text())[1]' ,'INT' ) AS Quantity

    ,LINEITEM.DATA.value('(UnitPrice/text())[1]' ,'DECIMAL(18,2)') AS UnitPrice

    ,APPLICATIONREPLY.DATA.value('@Name' ,'VARCHAR(100)' ) AS ApplicationReply_Name

    FROM @XML_DATA.nodes('Report/Requests/Request') REQUEST(DATA)

    CROSS APPLY REQUEST.DATA.nodes('LineItems/LineItem') LINEITEM(DATA)

    CROSS APPLY REQUEST.DATA.nodes('ApplicationReplies/ApplicationReply') APPLICATIONREPLY(DATA);

    Output

    MerchantID MerchantReferenceNumber Number Quantity UnitPrice ApplicationReply_Name

    ----------- ------------------------- ----------- ----------- ---------- ----------------------------

    123456 RA00042357770 0 1 59.99 ics_auth

    123456 RA00042357770 0 1 59.99 ics_bill

    123456 ECM0042359294 0 1 44.99 ics_auth

    123456 ECM0042359294 1 1 9.99 ics_auth

    123456 ECM0042359294 0 1 44.99 ics_pay_subscription_create

    123456 ECM0042359294 1 1 9.99 ics_pay_subscription_create

  • I compared execution plans ... my version seems significantly less expensive, at least on this tiny amount of data.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Thank you Eirikur. That was exactly what I looking for.

    In terms of performance, Eirikur's solution performed better. It had less reads, most likely because it did not use a CTE. After applying the solution to the actual xml, the performance gains were even more significant (~10 sec vs ~2 min)

    Thanks for all the help!

  • Phil Parkin (11/17/2016)


    I compared execution plans ... my version seems significantly less expensive, at least on this tiny amount of data.

    The execution plan for my version is significantly simpler (41 vs 96 operators) although the cost is set much higher within the plan, main reason being that it does a full optimization on my query but aborts on a timeout on the others, hence the cost figures for the those are fully fledged guesstimates.

    😎

    Two points abut the queries:

    1) Backward traversing up the structure is expensive, should be avoided if possible.

    2) Retrieving node values requires an internal reconstruction of the XML unless either the data() or the text() functions are used.

  • bennylu (11/17/2016)


    Thank you Eirikur. That was exactly what I looking for.

    In terms of performance, Eirikur's solution performed better. It had less reads, most likely because it did not use a CTE. After applying the solution to the actual xml, the performance gains were even more significant (~10 sec vs ~2 min)

    Thanks for all the help!

    You are very welcome.

    😎

  • Eirikur Eiriksson (11/17/2016)


    Phil Parkin (11/17/2016)


    I compared execution plans ... my version seems significantly less expensive, at least on this tiny amount of data.

    The execution plan for my version is significantly simpler (41 vs 96 operators) although the cost is set much higher within the plan, main reason being that it does a full optimization on my query but aborts on a timeout on the others, hence the cost figures for the those are fully fledged guesstimates.

    😎

    Two points abut the queries:

    1) Backward traversing up the structure is expensive, should be avoided if possible.

    2) Retrieving node values requires an internal reconstruction of the XML unless either the data() or the text() functions are used.

    Thanks Eirikur.

    Next time someone poses an XQuery question, I think I might just keep quiet 🙂

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Phil Parkin (11/17/2016)


    Eirikur Eiriksson (11/17/2016)


    Phil Parkin (11/17/2016)


    I compared execution plans ... my version seems significantly less expensive, at least on this tiny amount of data.

    The execution plan for my version is significantly simpler (41 vs 96 operators) although the cost is set much higher within the plan, main reason being that it does a full optimization on my query but aborts on a timeout on the others, hence the cost figures for the those are fully fledged guesstimates.

    😎

    Two points abut the queries:

    1) Backward traversing up the structure is expensive, should be avoided if possible.

    2) Retrieving node values requires an internal reconstruction of the XML unless either the data() or the text() functions are used.

    Thanks Eirikur.

    You are very welcome.

    😎

    Next time someone poses an XQuery question, I think I might just keep quiet 🙂

    Please don't, there is always something to be learned;-)

  • Eirikur's solution can be made even faster if you specify in your xquery that you will only ever accept one root report-node per xml document.

    DECLARE @XML_DATA XML = '

    <Report MerchantID="123456">

    <Requests>

    <Request RequestID="4791333844346002104017" MerchantReferenceNumber="RA00042357770">

    <LineItems>

    <LineItem Number="0">

    <Quantity>1</Quantity>

    <UnitPrice>59.99</UnitPrice>

    </LineItem>

    </LineItems>

    <ApplicationReplies>

    <ApplicationReply Name="ics_auth">

    <RCode>1</RCode>

    </ApplicationReply>

    <ApplicationReply Name="ics_bill">

    <RCode>1</RCode>

    </ApplicationReply>

    </ApplicationReplies>

    </Request>

    <Request RequestID="4791588597036877403069" MerchantReferenceNumber="ECM0042359294">

    <LineItems>

    <LineItem Number="0">

    <Quantity>1</Quantity>

    <UnitPrice>44.99</UnitPrice>

    </LineItem>

    <LineItem Number="1">

    <Quantity>1</Quantity>

    <UnitPrice>9.99</UnitPrice>

    </LineItem>

    </LineItems>

    <ApplicationReplies>

    <ApplicationReply Name="ics_auth">

    <RCode>1</RCode>

    </ApplicationReply>

    <ApplicationReply Name="ics_pay_subscription_create">

    <RCode>1</RCode>

    </ApplicationReply>

    </ApplicationReplies>

    </Request>

    </Requests>

    </Report>'

    SELECT

    REQUEST.DATA.value('(//@MerchantID)[1]' ,'INT' ) AS MerchantID

    ,REQUEST.DATA.value('@MerchantReferenceNumber' ,'VARCHAR(100)' ) AS MerchantReferenceNumber

    --,-1 + dense_rank() over (partition by request.data order by lineitem.data) AS Number /*-1 is to make it 0-based */

    ,LINEITEM.DATA.value('@Number' ,'INT' ) AS Number

    ,LINEITEM.DATA.value('(Quantity/text())[1]' ,'INT' ) AS Quantity

    ,LINEITEM.DATA.value('(UnitPrice/text())[1]' ,'DECIMAL(18,2)') AS UnitPrice

    ,APPLICATIONREPLY.DATA.value('@Name' ,'VARCHAR(100)' ) AS ApplicationReply_Name

    FROM @XML_DATA.nodes('/Report[1]/Requests[1]/Request') REQUEST(DATA)

    CROSS APPLY REQUEST.DATA.nodes('LineItems/LineItem') LINEITEM(DATA)

    CROSS APPLY REQUEST.DATA.nodes('ApplicationReplies/ApplicationReply') APPLICATIONREPLY(DATA);

    Observing that each request had it's LineItems numbered from 0 up to the number of LineItems -1, I was also experimenting with not having to explicity specify each LineItem's Number attribute but using it's location within the LineItems-node instead, using dense_ranke(). But I then realised that @Number is to you probably not just the sequence number of the item within the request but some reference number. I left the code in the comment, because I'd already put the effort in, and thought you maybe liked my idea 🙂



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply