November 17, 2016 at 9:54 am
I have some xml that I need to pull data from. Below is a sample of the xml and what I'm trying to do. It's currently returning four records but I need it to return six records. The two missing records are for MerchantReferenceNumber ECM0042359294 LineItem = 1.
Any help would be greatly appreciated.
declare @xml_data xml = '
<Report MerchantID="123456">
<Requests>
<Request RequestID="4791333844346002104017" MerchantReferenceNumber="RA00042357770">
<LineItems>
<LineItem Number="0">
<Quantity>1</Quantity>
<UnitPrice>59.99</UnitPrice>
</LineItem>
</LineItems>
<ApplicationReplies>
<ApplicationReply Name="ics_auth">
<RCode>1</RCode>
</ApplicationReply>
<ApplicationReply Name="ics_bill">
<RCode>1</RCode>
</ApplicationReply>
</ApplicationReplies>
</Request>
<Request RequestID="4791588597036877403069" MerchantReferenceNumber="ECM0042359294">
<LineItems>
<LineItem Number="0">
<Quantity>1</Quantity>
<UnitPrice>44.99</UnitPrice>
</LineItem>
<LineItem Number="1">
<Quantity>1</Quantity>
<UnitPrice>9.99</UnitPrice>
</LineItem>
</LineItems>
<ApplicationReplies>
<ApplicationReply Name="ics_auth">
<RCode>1</RCode>
</ApplicationReply>
<ApplicationReply Name="ics_pay_subscription_create">
<RCode>1</RCode>
</ApplicationReply>
</ApplicationReplies>
</Request>
</Requests>
</Report>'
select t.c.value('../../../../@MerchantID', 'int') as MerchantID,
t.c.value('../../@MerchantReferenceNumber', 'varchar(100)') as MerchantReferenceNumber,
t.c.value('(../../LineItems/LineItem/@Number)[1]', 'int') as LineNumber,
t.c.value('(../../LineItems/LineItem/UnitPrice)[1]', 'decimal(18,2)') as UnitPrice,
t.c.value('@Name', 'varchar(100)') as ApplicationReplyName
from @xml_data.nodes('/Report/Requests/Request/ApplicationReplies/ApplicationReply') t(c)
order by MerchantReferenceNumber, LineNumber
November 17, 2016 at 10:35 am
Does this get you any closer?
WITH ApplicationReplyNames AS
(
SELECT u.c.value('@Name', 'varchar(100)') as ApplicationReplyName,u.c.value('../../@MerchantReferenceNumber', 'varchar(100)') as MerchantReferenceNumber
from @xml_data.nodes('/Report/Requests/Request/ApplicationReplies/ApplicationReply') u(c)
)
select t.c.value('../../../../@MerchantID', 'int') as MerchantID,
t.c.value('../../@MerchantReferenceNumber', 'varchar(100)') as MerchantReferenceNumber,
t.c.value('(@Number)[1]', 'int') as LineNumber,
t.c.value('(UnitPrice)[1]', 'decimal(18,2)') as UnitPrice,
arn.ApplicationReplyName
from @xml_data.nodes('/Report/Requests/Request/LineItems/LineItem') t(c)
JOIN ApplicationReplyNames arn ON t.c.value('../../@MerchantReferenceNumber', 'varchar(100)') = arn.MerchantReferenceNumber
order by MerchantReferenceNumber, LineNumber
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
November 17, 2016 at 11:23 am
Thanks Phil.
Was hoping to avoid using CTE as the actual xml contains ~100k requests with a couple more nodes that would require same CTE solution to pull data. Are there any other solutions? Would OPENXML be better?
November 17, 2016 at 11:33 am
Here is an alternative solution
😎
BTW, OPENXML is not better, stick to the XQUERY
USE TEEST;
GO
SET NOCOUNT ON;
DECLARE @XML_DATA XML = '
<Report MerchantID="123456">
<Requests>
<Request RequestID="4791333844346002104017" MerchantReferenceNumber="RA00042357770">
<LineItems>
<LineItem Number="0">
<Quantity>1</Quantity>
<UnitPrice>59.99</UnitPrice>
</LineItem>
</LineItems>
<ApplicationReplies>
<ApplicationReply Name="ics_auth">
<RCode>1</RCode>
</ApplicationReply>
<ApplicationReply Name="ics_bill">
<RCode>1</RCode>
</ApplicationReply>
</ApplicationReplies>
</Request>
<Request RequestID="4791588597036877403069" MerchantReferenceNumber="ECM0042359294">
<LineItems>
<LineItem Number="0">
<Quantity>1</Quantity>
<UnitPrice>44.99</UnitPrice>
</LineItem>
<LineItem Number="1">
<Quantity>1</Quantity>
<UnitPrice>9.99</UnitPrice>
</LineItem>
</LineItems>
<ApplicationReplies>
<ApplicationReply Name="ics_auth">
<RCode>1</RCode>
</ApplicationReply>
<ApplicationReply Name="ics_pay_subscription_create">
<RCode>1</RCode>
</ApplicationReply>
</ApplicationReplies>
</Request>
</Requests>
</Report>'
SELECT
REQUEST.DATA.value('(//@MerchantID)[1]' ,'INT' ) AS MerchantID
,REQUEST.DATA.value('@MerchantReferenceNumber' ,'VARCHAR(100)' ) AS MerchantReferenceNumber
,LINEITEM.DATA.value('@Number' ,'INT' ) AS Number
,LINEITEM.DATA.value('(Quantity/text())[1]' ,'INT' ) AS Quantity
,LINEITEM.DATA.value('(UnitPrice/text())[1]' ,'DECIMAL(18,2)') AS UnitPrice
,APPLICATIONREPLY.DATA.value('@Name' ,'VARCHAR(100)' ) AS ApplicationReply_Name
FROM @XML_DATA.nodes('Report/Requests/Request') REQUEST(DATA)
CROSS APPLY REQUEST.DATA.nodes('LineItems/LineItem') LINEITEM(DATA)
CROSS APPLY REQUEST.DATA.nodes('ApplicationReplies/ApplicationReply') APPLICATIONREPLY(DATA);
Output
MerchantID MerchantReferenceNumber Number Quantity UnitPrice ApplicationReply_Name
----------- ------------------------- ----------- ----------- ---------- ----------------------------
123456 RA00042357770 0 1 59.99 ics_auth
123456 RA00042357770 0 1 59.99 ics_bill
123456 ECM0042359294 0 1 44.99 ics_auth
123456 ECM0042359294 1 1 9.99 ics_auth
123456 ECM0042359294 0 1 44.99 ics_pay_subscription_create
123456 ECM0042359294 1 1 9.99 ics_pay_subscription_create
November 17, 2016 at 11:45 am
I compared execution plans ... my version seems significantly less expensive, at least on this tiny amount of data.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
November 17, 2016 at 12:04 pm
Thank you Eirikur. That was exactly what I looking for.
In terms of performance, Eirikur's solution performed better. It had less reads, most likely because it did not use a CTE. After applying the solution to the actual xml, the performance gains were even more significant (~10 sec vs ~2 min)
Thanks for all the help!
November 17, 2016 at 1:03 pm
Phil Parkin (11/17/2016)
I compared execution plans ... my version seems significantly less expensive, at least on this tiny amount of data.
The execution plan for my version is significantly simpler (41 vs 96 operators) although the cost is set much higher within the plan, main reason being that it does a full optimization on my query but aborts on a timeout on the others, hence the cost figures for the those are fully fledged guesstimates.
😎
Two points abut the queries:
1) Backward traversing up the structure is expensive, should be avoided if possible.
2) Retrieving node values requires an internal reconstruction of the XML unless either the data() or the text() functions are used.
November 17, 2016 at 1:03 pm
bennylu (11/17/2016)
Thank you Eirikur. That was exactly what I looking for.In terms of performance, Eirikur's solution performed better. It had less reads, most likely because it did not use a CTE. After applying the solution to the actual xml, the performance gains were even more significant (~10 sec vs ~2 min)
Thanks for all the help!
You are very welcome.
😎
November 17, 2016 at 1:09 pm
Eirikur Eiriksson (11/17/2016)
Phil Parkin (11/17/2016)
I compared execution plans ... my version seems significantly less expensive, at least on this tiny amount of data.The execution plan for my version is significantly simpler (41 vs 96 operators) although the cost is set much higher within the plan, main reason being that it does a full optimization on my query but aborts on a timeout on the others, hence the cost figures for the those are fully fledged guesstimates.
😎
Two points abut the queries:
1) Backward traversing up the structure is expensive, should be avoided if possible.
2) Retrieving node values requires an internal reconstruction of the XML unless either the data() or the text() functions are used.
Thanks Eirikur.
Next time someone poses an XQuery question, I think I might just keep quiet 🙂
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
November 17, 2016 at 2:48 pm
Phil Parkin (11/17/2016)
Eirikur Eiriksson (11/17/2016)
Phil Parkin (11/17/2016)
I compared execution plans ... my version seems significantly less expensive, at least on this tiny amount of data.The execution plan for my version is significantly simpler (41 vs 96 operators) although the cost is set much higher within the plan, main reason being that it does a full optimization on my query but aborts on a timeout on the others, hence the cost figures for the those are fully fledged guesstimates.
😎
Two points abut the queries:
1) Backward traversing up the structure is expensive, should be avoided if possible.
2) Retrieving node values requires an internal reconstruction of the XML unless either the data() or the text() functions are used.
Thanks Eirikur.
You are very welcome.
😎
Next time someone poses an XQuery question, I think I might just keep quiet 🙂
Please don't, there is always something to be learned;-)
November 18, 2016 at 1:33 pm
Eirikur's solution can be made even faster if you specify in your xquery that you will only ever accept one root report-node per xml document.
DECLARE @XML_DATA XML = '
<Report MerchantID="123456">
<Requests>
<Request RequestID="4791333844346002104017" MerchantReferenceNumber="RA00042357770">
<LineItems>
<LineItem Number="0">
<Quantity>1</Quantity>
<UnitPrice>59.99</UnitPrice>
</LineItem>
</LineItems>
<ApplicationReplies>
<ApplicationReply Name="ics_auth">
<RCode>1</RCode>
</ApplicationReply>
<ApplicationReply Name="ics_bill">
<RCode>1</RCode>
</ApplicationReply>
</ApplicationReplies>
</Request>
<Request RequestID="4791588597036877403069" MerchantReferenceNumber="ECM0042359294">
<LineItems>
<LineItem Number="0">
<Quantity>1</Quantity>
<UnitPrice>44.99</UnitPrice>
</LineItem>
<LineItem Number="1">
<Quantity>1</Quantity>
<UnitPrice>9.99</UnitPrice>
</LineItem>
</LineItems>
<ApplicationReplies>
<ApplicationReply Name="ics_auth">
<RCode>1</RCode>
</ApplicationReply>
<ApplicationReply Name="ics_pay_subscription_create">
<RCode>1</RCode>
</ApplicationReply>
</ApplicationReplies>
</Request>
</Requests>
</Report>'
SELECT
REQUEST.DATA.value('(//@MerchantID)[1]' ,'INT' ) AS MerchantID
,REQUEST.DATA.value('@MerchantReferenceNumber' ,'VARCHAR(100)' ) AS MerchantReferenceNumber
--,-1 + dense_rank() over (partition by request.data order by lineitem.data) AS Number /*-1 is to make it 0-based */
,LINEITEM.DATA.value('@Number' ,'INT' ) AS Number
,LINEITEM.DATA.value('(Quantity/text())[1]' ,'INT' ) AS Quantity
,LINEITEM.DATA.value('(UnitPrice/text())[1]' ,'DECIMAL(18,2)') AS UnitPrice
,APPLICATIONREPLY.DATA.value('@Name' ,'VARCHAR(100)' ) AS ApplicationReply_Name
FROM @XML_DATA.nodes('/Report[1]/Requests[1]/Request') REQUEST(DATA)
CROSS APPLY REQUEST.DATA.nodes('LineItems/LineItem') LINEITEM(DATA)
CROSS APPLY REQUEST.DATA.nodes('ApplicationReplies/ApplicationReply') APPLICATIONREPLY(DATA);
Observing that each request had it's LineItems numbered from 0 up to the number of LineItems -1, I was also experimenting with not having to explicity specify each LineItem's Number attribute but using it's location within the LineItems-node instead, using dense_ranke(). But I then realised that @Number is to you probably not just the sequence number of the item within the request but some reference number. I left the code in the comment, because I'd already put the effort in, and thought you maybe liked my idea 🙂
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply