XML Parse Query

  • RBarryYoung (2/28/2010)


    ...

    What I did notice though is that adding the text function ("(.../text())[1]") really seems to help the optimizer to produce a much more efficient XML query (I actually learned this from you, Paul :-)). Just compare the query plan's for Lutz's with & without:

    SELECT

    U.v.value('(txtApplicantName/text())[1]', 'VARCHAR(30)') as a,

    W.x.value('(txtAcctNo/text())[1]', 'VARCHAR(30)') as b,

    W.x.value('(OptInFlag/text())[1]', 'VARCHAR(30)') as c

    FROM @XML.nodes('xfadata/changeDebitCard') T(c)

    CROSS APPLY c.nodes('applicantInfo') U(v)

    CROSS APPLY c.nodes('posAccts') W(x)

    Now that's what I call "significant improvement"!!!!

    Seems like I need to look deeper into it, meaning to test it against a larger data set. Just to make sure the execution plan doesn't hide anything, like when UDF's are involved (IIRC)...



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • lmu92 (2/28/2010)


    RBarryYoung (2/28/2010)


    ...

    What I did notice though is that adding the text function ("(.../text())[1]") really seems to help the optimizer to produce a much more efficient XML query (I actually learned this from you, Paul :-)). Just compare the query plan's for Lutz's with & without:

    SELECT

    U.v.value('(txtApplicantName/text())[1]', 'VARCHAR(30)') as a,

    W.x.value('(txtAcctNo/text())[1]', 'VARCHAR(30)') as b,

    W.x.value('(OptInFlag/text())[1]', 'VARCHAR(30)') as c

    FROM @XML.nodes('xfadata/changeDebitCard') T(c)

    CROSS APPLY c.nodes('applicantInfo') U(v)

    CROSS APPLY c.nodes('posAccts') W(x)

    Now that's what I call "significant improvement"!!!!

    Seems like I need to look deeper into it, meaning to test it against a larger data set. Just to make sure the execution plan doesn't hide anything, like when UDF's are involved (IIRC)...

    The things is, now I am not so sure that the QP estimates can be trusted here. When I test it with a larger dataset, the QP cost estimates are still much lower for the "/text()" version, but it's elapsed time is much higher!? I'm trying to figure this out with Profiler, but it's not really telling me much...?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • RBarryYoung (2/28/2010)


    lmu92 (2/28/2010)


    RBarryYoung (2/28/2010)


    ...

    What I did notice though is that adding the text function ("(.../text())[1]") really seems to help the optimizer to produce a much more efficient XML query (I actually learned this from you, Paul :-)). Just compare the query plan's for Lutz's with & without:

    SELECT

    U.v.value('(txtApplicantName/text())[1]', 'VARCHAR(30)') as a,

    W.x.value('(txtAcctNo/text())[1]', 'VARCHAR(30)') as b,

    W.x.value('(OptInFlag/text())[1]', 'VARCHAR(30)') as c

    FROM @XML.nodes('xfadata/changeDebitCard') T(c)

    CROSS APPLY c.nodes('applicantInfo') U(v)

    CROSS APPLY c.nodes('posAccts') W(x)

    Now that's what I call "significant improvement"!!!!

    Seems like I need to look deeper into it, meaning to test it against a larger data set. Just to make sure the execution plan doesn't hide anything, like when UDF's are involved (IIRC)...

    The things is, now I am not so sure that the QP estimates can be trusted here. When I test it with a larger dataset, the QP cost estimates are still much lower for the "/text()" version, but it's elapsed time is much higher!? I'm trying to figure this out with Profiler, but it's not really telling me much...?

    Huh! Well, that was weird! Apparently the "Include Actual Execution Plan" setting was causing this very odd cumulative output stall in my session to SSMS, resulting in the last of my four test queries (Lutz's rewritten one) to get hit with a huge elapsed time penalty. Once I took that out, they are reverted to relatively close again.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • lmu92 (2/28/2010)


    Now that's what I call "significant improvement"!!!!

    Seems like I need to look deeper into it, meaning to test it against a larger data set. Just to make sure the execution plan doesn't hide anything, like when UDF's are involved (IIRC)...

    I'm glad you like the effect /text() has on the plan - it is a current favourite of mine. The thing about XML is that it is quite difficult to analyse the plans and establish which is the better one, just by inspection. I happen to like the plan produced with /text() since it is, arguably, neater. I am a fan of neat plans, SQL or XML.

    Personally, I like to explore the different methods, just for the fun of it. Hence the FWLOR version.

    Unless you are processing huge XML data sets, I really don't think it matters too much. If you were processing huge sets of XML data (or very many smaller ones), it might become important. My own view is that the 'best' XML solution is so tightly correlated with the specifics of the performance test, that it makes broad statements about performance unsound at best.

    Until you choose a test, it is impossible to say which method is optimal, and why. Further, it is likely that any given test would invite other optimizations which would often dominate the effect of a given XML syntax.

    Paul

  • Paul White (2/28/2010)


    lmu92 (2/28/2010)


    ...

    Unless you are processing huge XML data sets, I really don't think it matters too much. If you were processing huge sets of XML data (or very many smaller ones), it might become important. My own view is that the 'best' XML solution is so tightly correlated with the specifics of the performance test, that it makes broad statements about performance unsound at best.

    Until you choose a test, it is impossible to say which method is optimal, and why. Further, it is likely that any given test would invite other optimizations which would often dominate the effect of a given XML syntax.

    Paul

    I probably look deeper into this subject and generate larger test volumes to compare against. Setup would include large xml file with many elements per level but only two or three levels deep, lees elements per level but 5 to 10 level, larger number of rather small xml files in a table (each table related scenario with and without xml index). Any thoughts regarding that setup? (addtl. secenario; worth testing yes/no a.s.o.)

    I think we should rely on statistics results (and/or profiler) rather than percentage count from execution plans as long as we deal with a limited sample size (like in the given scenario). Let's see if I can put enough load on the system to make the creation of the execution plan less relevant...



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Viewing 5 posts - 16 through 19 (of 19 total)

You must be logged in to reply to this topic. Login to reply