Unable to link data from data pulled from XML

  • Hi all,

    I'm pulling data from XML into tables, but I'm unsure how to link the data after it's imported. This example has names and tasks, and I can pull the data into two tables, but I can't find any way to link the task to the appropriate person. My person and task tables populate without issue, but there's nothing I can find to link the rows together. So in this example Test 1 would go to the first two Tasks and Test 2 would go to the second two work items.

    Thanks for any suggestions.

    DECLARE @XML TABLE (XMLData XML);

    DECLARE @Person Table (Name NVARCHAR(50), Addresss NVARCHAR(50));

    DECLARE @Task Table (Name NVARCHAR(50), Details NVARCHAR(50));

    INSERT INTO @XML SELECT '

    <process>

    <header>

    <Person><Name>Test1</Name><Address>123 main street</Address></Person>

    <Tasks>

    <task><Name>Do some work</Name><details>details</details></task>

    <task><Name>Do some more work</Name><details>details more</details></task>

    </Tasks>

    </header>

    <header>

    <Person><Name>Test2</Name><Address>234 main street</Address></Person>

    <Tasks>

    <task><Name>Do some work 2</Name><details>details 2 </details></task>

    <task><Name>Do some more work3</Name><details>details more 3</details></task>

    </Tasks>

    </header>

    </process>'

    SELECTCONVERT(NVARCHAR(255),c.query('data(Name)')) AS name,

    CONVERT(NVARCHAR(255),c.query('data(Address)')) AS address

    FROM@XML r

    CROSS APPLY XMLData.nodes('process/header/Person') x(c)

    SELECTCONVERT(NVARCHAR(255),c.query('data(Name)')) AS name,

    CONVERT(NVARCHAR(255),c.query('data(details)')) AS address

    FROM@XML r

    CROSS APPLY XMLData.nodes('process/header/Tasks/task') x(c)

  • Instead of trying to shred them separately and then put them together, shred them together and separate them.

    DECLARE @XML TABLE (XMLData XML);

    DECLARE @Person_Tasks TABLE( person_name NVARCHAR(50), person_address NVARCHAR(50), task_name NVARCHAR(50), task_details NVARCHAR(50))

    DECLARE @Person Table (Name NVARCHAR(50), Addresss NVARCHAR(50));

    DECLARE @Task Table (Name NVARCHAR(50), Details NVARCHAR(50));

    INSERT INTO @XML SELECT '

    <process>

    <header>

    <Person><Name>Test1</Name><Address>123 main street</Address></Person>

    <Tasks>

    <task><Name>Do some work</Name><details>details</details></task>

    <task><Name>Do some more work</Name><details>details more</details></task>

    </Tasks>

    </header>

    <header>

    <Person><Name>Test2</Name><Address>234 main street</Address></Person>

    <Tasks>

    <task><Name>Do some work 2</Name><details>details 2 </details></task>

    <task><Name>Do some more work3</Name><details>details more 3</details></task>

    </Tasks>

    </header>

    </process>'

    INSERT @Person_Tasks(person_name, person_address, task_name, task_details)

    SELECT p.c.value('Name[1]', 'NVARCHAR(50)') AS person_name

    ,p.c.value('Address[1]', 'NVARCHAR(50)') AS person_address

    ,t.c.value('Name[1]', 'NVARCHAR(50)') AS task_name

    ,t.c.value('details[1]', 'NVARCHAR(50)') AS task_details

    FROM @XML

    CROSS APPLY XMLData.nodes('process/header/Person') p(c)

    CROSS APPLY c.nodes('../Tasks/task') t(c)

    INSERT @Person( Name, Addresss )

    SELECT DISTINCT person_name, person_address

    FROM @Person_Tasks

    INSERT @Task( Name, Details )

    SELECT DISTINCT task_name, task_details

    FROM @Person_Tasks

    SELECT *

    FROM @Person_Tasks

    SELECT *

    FROM @Person

    SELECT *

    FROM @Task

    It is possible to put them together once they've been shredded separately, but it's much more complicated. Also, you might try using the value function instead of the query function if you want to pull values out of xml data.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Just to add to Drew's excellent solution, not only would you want to use the value() method over the query method here, you should include the text() node in your XPATH expression. This will improve your performance by telling the query optimizer that you only need the text. Run the query below in SSMS with "Include Actual Execution Plan" turned on to see what I mean.

    DECLARE @XML TABLE (XMLData XML);

    DECLARE @Person_Tasks TABLE( person_name NVARCHAR(50), person_address NVARCHAR(50), task_name NVARCHAR(50), task_details NVARCHAR(50))

    DECLARE @Person Table (Name NVARCHAR(50), Addresss NVARCHAR(50));

    DECLARE @Task Table (Name NVARCHAR(50), Details NVARCHAR(50));

    INSERT INTO @XML SELECT '

    <process>

    <header>

    <Person><Name>Test1</Name><Address>123 main street</Address></Person>

    <Tasks>

    <task><Name>Do some work</Name><details>details</details></task>

    <task><Name>Do some more work</Name><details>details more</details></task>

    </Tasks>

    </header>

    <header>

    <Person><Name>Test2</Name><Address>234 main street</Address></Person>

    <Tasks>

    <task><Name>Do some work 2</Name><details>details 2 </details></task>

    <task><Name>Do some more work3</Name><details>details more 3</details></task>

    </Tasks>

    </header>

    </process>'

    -- Not specifying the text() node

    SELECT p.c.value('Name[1]', 'NVARCHAR(50)') AS person_name

    ,p.c.value('Address[1]', 'NVARCHAR(50)') AS person_address

    ,t.c.value('Name[1]', 'NVARCHAR(50)') AS task_name

    ,t.c.value('details[1]', 'NVARCHAR(50)') AS task_details

    FROM @XML

    CROSS APPLY XMLData.nodes('process/header/Person') p(c)

    CROSS APPLY c.nodes('../Tasks/task') t(c)

    -- specifying the text() node

    SELECT p.c.value('(Name/text())[1]', 'NVARCHAR(50)') AS person_name

    ,p.c.value('(Address/text())[1]', 'NVARCHAR(50)') AS person_address

    ,t.c.value('(Name/text())[1]', 'NVARCHAR(50)') AS task_name

    ,t.c.value('(details/text())[1]', 'NVARCHAR(50)') AS task_details

    FROM @XML

    CROSS APPLY XMLData.nodes('process/header/Person') p(c)

    CROSS APPLY c.nodes('../Tasks/task') t(c)

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (8/13/2015)


    Just to add to Drew's excellent solution, not only would you want to use the value() method over the query method here, you should include the text() node in your XPATH expression. This will improve your performance by telling the query optimizer that you only need the text. Run the query below in SSMS with "Include Actual Execution Plan" turned on to see what I mean.

    When I ran this, the original query took 48% of the batch and the query with the text() node took 52%. That doesn't sound like a savings to me.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen (8/13/2015)


    Alan.B (8/13/2015)


    Just to add to Drew's excellent solution, not only would you want to use the value() method over the query method here, you should include the text() node in your XPATH expression. This will improve your performance by telling the query optimizer that you only need the text. Run the query below in SSMS with "Include Actual Execution Plan" turned on to see what I mean.

    When I ran this, the original query took 48% of the batch and the query with the text() node took 52%. That doesn't sound like a savings to me.

    Drew

    On my system it was 55/45 (~20% improvement). 52/48 is still a 7% improvement; not a bad deal doing something that takes less than a minute to achieve a 7% performance improvement.

    In other news if you compare the query() method to the value() method the optimizer tells you that you get the same performance for each :-P. I know from experience that, for something like this, the value() method is much faster (see attached query plan).

    So I put together this little performance test:

    -- let's get sample data

    SET NOCOUNT ON;

    IF OBJECT_ID('tempdb..#XML') IS NOT NULL DROP TABLE #XML;

    CREATE TABLE #XML (id int primary key, XMLData XML);

    INSERT INTO #XML

    SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),

    '<process>

    <header>

    <Person><Name>Test1</Name><Address>'+CAST(newid() AS varchar(36))+'</Address></Person>

    <Tasks>

    <task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details</details></task>

    <task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details more</details></task>

    </Tasks>

    </header>

    <header>

    <Person><Name>Test2</Name><Address>'+CAST(newid() AS varchar(36))+'</Address></Person>

    <Tasks>

    <task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details 2 </details></task>

    <task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details more 3</details></task>

    </Tasks>

    </header>

    </process>'

    FROM sys.all_columns a, sys.all_columns b;

    GO

    -- clear the cache and start fresh...

    DBCC FREEPROCCACHE with no_infomsgs

    DBCC DROPCLEANBUFFERS with no_infomsgs

    GO

    PRINT '=== Not specifying the text() node ==='

    DECLARE

    @starttime datetime = getdate(),

    @x1 nvarchar(50),

    @x2 nvarchar(50),

    @x3 nvarchar(50),

    @x4 nvarchar(50);

    SELECT

    @x1 = p.c.value('Name[1]', 'NVARCHAR(50)'),

    @x2 = p.c.value('Address[1]', 'NVARCHAR(50)'),

    @x3 = t.c.value('Name[1]', 'NVARCHAR(50)'),

    @x4 = t.c.value('details[1]', 'NVARCHAR(50)')

    FROM #XML

    CROSS APPLY XMLData.nodes('process/header/Person') p(c)

    CROSS APPLY c.nodes('../Tasks/task') t(c);

    PRINT DATEDIFF(MS,@starttime, getdate());

    GO 3

    PRINT '=== specifying the text() node ==='

    DECLARE

    @starttime datetime = getdate(),

    @x1 nvarchar(50),

    @x2 nvarchar(50),

    @x3 nvarchar(50),

    @x4 nvarchar(50);

    SELECT

    @x1 = p.c.value('(Name/text())[1]', 'NVARCHAR(50)'),

    @x2 = p.c.value('(Address/text())[1]', 'NVARCHAR(50)'),

    @x3 = t.c.value('(Name/text())[1]', 'NVARCHAR(50)'),

    @x4 = t.c.value('(details/text())[1]', 'NVARCHAR(50)')

    FROM #XML

    CROSS APPLY XMLData.nodes('process/header/Person') p(c)

    CROSS APPLY c.nodes('../Tasks/task') t(c);

    PRINT DATEDIFF(MS,@starttime, getdate());

    GO 3

    PRINT '=== Using the query() method ==='

    DECLARE

    @starttime datetime = getdate(),

    @x1 nvarchar(50),

    @x2 nvarchar(50),

    @x3 nvarchar(50),

    @x4 nvarchar(50);

    SELECT

    @x1 = CAST(p.c.query('data(Name)') AS nvarchar(50)),

    @x2 = CAST(p.c.query('data(Address)')AS nvarchar(50)),

    @x3 = CAST(t.c.query('data(Name)')AS nvarchar(50)),

    @x4 = CAST(t.c.query('data(details)')AS nvarchar(50))

    FROM #XML

    CROSS APPLY XMLData.nodes('process/header/Person') p(c)

    CROSS APPLY c.nodes('../Tasks/task') t(c);

    PRINT DATEDIFF(MS,@starttime, getdate());

    GO 3

    Though the optimizer said that the value() method with text() would be fastest and the value() and query() method would be tied for second the results of the test told a different story...

    Beginning execution loop

    === Not specifying the text() node ===

    810

    === Not specifying the text() node ===

    663

    === Not specifying the text() node ===

    646

    Batch execution completed 3 times.

    Beginning execution loop

    === specifying the text() node ===

    1603

    === specifying the text() node ===

    1616

    === specifying the text() node ===

    1590

    Batch execution completed 3 times.

    Beginning execution loop

    === Using the query() method ===

    3166

    === Using the query() method ===

    3263

    === Using the query() method ===

    3330

    Batch execution completed 3 times.

    Interestingly, the text() node specification slows the query down (this is not always the case but was the case in this test). As expected, the query method() was way slower. Unfortunately for the query method it was not getting a parallel query plan so, to level the playing field, I added Adam Machanic's make_parallel() function to the mix and re-ran that query() method with a parallel plan.

    Results:

    Beginning execution loop

    === Using the query() method with make_parallel() ===

    1150

    === Using the query() method with make_parallel() ===

    1110

    === Using the query() method with make_parallel() ===

    1110

    Batch execution completed 3 times.

    Better but still slower than value() method without specifying the text() node. Moral of the story: don't always trust the query optimizer estimations.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (8/13/2015)


    drew.allen (8/13/2015)


    Alan.B (8/13/2015)


    Just to add to Drew's excellent solution, not only would you want to use the value() method over the query method here, you should include the text() node in your XPATH expression. This will improve your performance by telling the query optimizer that you only need the text. Run the query below in SSMS with "Include Actual Execution Plan" turned on to see what I mean.

    When I ran this, the original query took 48% of the batch and the query with the text() node took 52%. That doesn't sound like a savings to me.

    Drew

    On my system it was 55/45 (~20% improvement). 52/48 is still a 7% improvement; not a bad deal doing something that takes less than a minute to achieve a 7% performance improvement.

    In other news if you compare the query() method to the value() method the optimizer tells you that you get the same performance for each :-P. I know from experience that, for something like this, the value() method is much faster (see attached query plan).

    So I put together this little performance test:

    Both solutions are sub-optimal as they imply work (cost) which more than offsets any gains from the text() function. The culprit is the traversing nodes for the Tasks/task, it's simply a killer!

    😎

    Using Alan's fine test harness to proof this, observe that without the traversing (../) the query is almost 3 times faster than any of the others.

    -- let's get sample data

    SET NOCOUNT ON;

    IF OBJECT_ID('tempdb..#XML') IS NOT NULL DROP TABLE #XML;

    CREATE TABLE #XML (id int primary key, XMLData XML);

    INSERT INTO #XML

    SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),

    '<process>

    <header>

    <Person><Name>Test1</Name><Address>'+CAST(newid() AS varchar(36))+'</Address></Person>

    <Tasks>

    <task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details</details></task>

    <task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details more</details></task>

    </Tasks>

    </header>

    <header>

    <Person><Name>Test2</Name><Address>'+CAST(newid() AS varchar(36))+'</Address></Person>

    <Tasks>

    <task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details 2 </details></task>

    <task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details more 3</details></task>

    </Tasks>

    </header>

    </process>'

    FROM sys.all_columns a, sys.all_columns b;

    GO

    -- clear the cache and start fresh...

    DBCC FREEPROCCACHE with no_infomsgs

    DBCC DROPCLEANBUFFERS with no_infomsgs

    GO

    PRINT '=== (EE) No traversing and text() ==='

    DECLARE

    @starttime datetime = getdate(),

    @x1 nvarchar(50),

    @x2 nvarchar(50),

    @x3 nvarchar(50),

    @x4 nvarchar(50);

    SELECT

    @x1 = HEADER.DATA.value('(Person/Name/text())[1]' ,'NVARCHAR(50)') --AS Person_Name

    ,@x2 = HEADER.DATA.value('(Person/Address/text())[1]' ,'NVARCHAR(50)') --AS Person_Address

    ,@x3 = TASK.DATA.value('(Name/text())[1]' ,'NVARCHAR(50)') --AS Task_Name

    ,@x4 = TASK.DATA.value('(details/text())[1]' ,'NVARCHAR(50)') --AS Task_details

    FROM #XML XD

    CROSS APPLY XD.XMLData.nodes('process/header') AS HEADER(DATA)

    CROSS APPLY HEADER.DATA.nodes('Tasks/task') AS TASK(DATA);

    PRINT DATEDIFF(MS,@starttime, getdate());

    GO 3

    PRINT '=== Not specifying the text() node ==='

    DECLARE

    @starttime datetime = getdate(),

    @x1 nvarchar(50),

    @x2 nvarchar(50),

    @x3 nvarchar(50),

    @x4 nvarchar(50);

    SELECT

    @x1 = p.c.value('Name[1]', 'NVARCHAR(50)'),

    @x2 = p.c.value('Address[1]', 'NVARCHAR(50)'),

    @x3 = t.c.value('Name[1]', 'NVARCHAR(50)'),

    @x4 = t.c.value('details[1]', 'NVARCHAR(50)')

    FROM #XML

    CROSS APPLY XMLData.nodes('process/header/Person') p(c)

    CROSS APPLY c.nodes('../Tasks/task') t(c);

    PRINT DATEDIFF(MS,@starttime, getdate());

    GO 3

    PRINT '=== specifying the text() node ==='

    DECLARE

    @starttime datetime = getdate(),

    @x1 nvarchar(50),

    @x2 nvarchar(50),

    @x3 nvarchar(50),

    @x4 nvarchar(50);

    SELECT

    @x1 = p.c.value('(Name/text())[1]', 'NVARCHAR(50)'),

    @x2 = p.c.value('(Address/text())[1]', 'NVARCHAR(50)'),

    @x3 = t.c.value('(Name/text())[1]', 'NVARCHAR(50)'),

    @x4 = t.c.value('(details/text())[1]', 'NVARCHAR(50)')

    FROM #XML

    CROSS APPLY XMLData.nodes('process/header/Person') p(c)

    CROSS APPLY c.nodes('../Tasks/task') t(c);

    PRINT DATEDIFF(MS,@starttime, getdate());

    GO 3

    PRINT '=== Using the query() method ==='

    DECLARE

    @starttime datetime = getdate(),

    @x1 nvarchar(50),

    @x2 nvarchar(50),

    @x3 nvarchar(50),

    @x4 nvarchar(50);

    SELECT

    @x1 = CAST(p.c.query('data(Name)') AS nvarchar(50)),

    @x2 = CAST(p.c.query('data(Address)')AS nvarchar(50)),

    @x3 = CAST(t.c.query('data(Name)')AS nvarchar(50)),

    @x4 = CAST(t.c.query('data(details)')AS nvarchar(50))

    FROM #XML

    CROSS APPLY XMLData.nodes('process/header/Person') p(c)

    CROSS APPLY c.nodes('../Tasks/task') t(c);

    PRINT DATEDIFF(MS,@starttime, getdate());

    GO 3

    PRINT '=== No traversing and text() ==='

    DECLARE

    @starttime datetime = getdate(),

    @x1 nvarchar(50),

    @x2 nvarchar(50),

    @x3 nvarchar(50),

    @x4 nvarchar(50);

    SELECT

    @x1 = HEADER.DATA.value('(Person/Name/text())[1]' ,'NVARCHAR(50)') --AS Person_Name

    ,@x2 = HEADER.DATA.value('(Person/Address/text())[1]' ,'NVARCHAR(50)') --AS Person_Address

    ,@x3 = TASK.DATA.value('(Name/text())[1]' ,'NVARCHAR(50)') --AS Task_Name

    ,@x4 = TASK.DATA.value('(details/text())[1]' ,'NVARCHAR(50)') --AS Task_details

    FROM #XML XD

    CROSS APPLY XD.XMLData.nodes('process/header') AS HEADER(DATA)

    CROSS APPLY HEADER.DATA.nodes('Tasks/task') AS TASK(DATA);

    PRINT DATEDIFF(MS,@starttime, getdate());

    GO 3

    Results

    Beginning execution loop

    === (EE) No traversing and text() ===

    396

    === (EE) No traversing and text() ===

    313

    === (EE) No traversing and text() ===

    310

    Batch execution completed 3 times.

    Beginning execution loop

    === Not specifying the text() node ===

    796

    === Not specifying the text() node ===

    830

    === Not specifying the text() node ===

    856

    Batch execution completed 3 times.

    Beginning execution loop

    === specifying the text() node ===

    1993

    === specifying the text() node ===

    1970

    === specifying the text() node ===

    1926

    Batch execution completed 3 times.

    Beginning execution loop

    === Using the query() method ===

    4170

    === Using the query() method ===

    4183

    === Using the query() method ===

    4153

    Batch execution completed 3 times.

    Beginning execution loop

    === No traversing and text() ===

    316

    === No traversing and text() ===

    313

    === No traversing and text() ===

    313

    Batch execution completed 3 times.

  • Alan.B (8/13/2015)


    drew.allen (8/13/2015)


    Alan.B (8/13/2015)


    Just to add to Drew's excellent solution, not only would you want to use the value() method over the query method here, you should include the text() node in your XPATH expression. This will improve your performance by telling the query optimizer that you only need the text. Run the query below in SSMS with "Include Actual Execution Plan" turned on to see what I mean.

    When I ran this, the original query took 48% of the batch and the query with the text() node took 52%. That doesn't sound like a savings to me.

    Drew

    On my system it was 55/45 (~20% improvement). 52/48 is still a 7% improvement; not a bad deal doing something that takes less than a minute to achieve a 7% performance improvement.

    You misunderstood. The original query ran FASTER (48/52) not slower (52/48) than the query with text() on my system.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen (8/14/2015)


    Alan.B (8/13/2015)


    drew.allen (8/13/2015)


    Alan.B (8/13/2015)


    Just to add to Drew's excellent solution, not only would you want to use the value() method over the query method here, you should include the text() node in your XPATH expression. This will improve your performance by telling the query optimizer that you only need the text. Run the query below in SSMS with "Include Actual Execution Plan" turned on to see what I mean.

    When I ran this, the original query took 48% of the batch and the query with the text() node took 52%. That doesn't sound like a savings to me.

    Drew

    On my system it was 55/45 (~20% improvement). 52/48 is still a 7% improvement; not a bad deal doing something that takes less than a minute to achieve a 7% performance improvement.

    You misunderstood. The original query ran FASTER (48/52) not slower (52/48) than the query with text() on my system.

    Drew

    The text() is faster than without it, here are results from two versions of non-traversing XQueries from my previous post, one with and on without the text().

    😎

    === (EE) No traversing and text() === 423

    === (EE) No traversing and text() === 310

    === (EE) No traversing and text() === 313

    --------------------------------------------

    === (EE) No traversing === 570

    === (EE) No traversing === 570

    === (EE) No traversing === 613

  • Eirikur Eiriksson (8/14/2015)


    The text() is faster than without it, here are results from two versions of non-traversing XQueries from my previous post, one with and on without the text().

    First, I was comparing the two traversing methods, so posting results from the non-traversing method isn't really relevant.

    Second, I'm saying that I'm getting different results with my setup. I don't care how many times you post results from YOUR setup, it isn't going to change what I am getting from MY setup.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen (8/14/2015)


    Eirikur Eiriksson (8/14/2015)


    The text() is faster than without it, here are results from two versions of non-traversing XQueries from my previous post, one with and on without the text().

    First, I was comparing the two traversing methods, so posting results from the non-traversing method isn't really relevant.

    Second, I'm saying that I'm getting different results with my setup. I don't care how many times you post results from YOUR setup, it isn't going to change what I am getting from MY setup.

    Drew

    I find these comments slightly amusing, are you having a bad day?

    😎

    Further, I was only stating that with a properly written XQuery, using the text() function is much quicker, I was not commenting or comparing to yours or anyone else's solution.

  • Easy now, folks. These types of things are how we all learn. 😉 Well, except for me. I reject all XML whenever I can. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Eirikur Eiriksson (8/14/2015)


    Further, I was only stating that with a properly written XQuery, using the text() function is much quicker, I was not commenting or comparing to yours or anyone else's solution.

    And all I'm stating is that I have two pieces of code that are identical except for the use of text(), and the code without text() runs faster, so your assertion is flawed. The role of text() in xquery performance is much more complicated than implied by a blanket statement that the text() function is much quicker.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen (8/14/2015)


    Eirikur Eiriksson (8/14/2015)


    Further, I was only stating that with a properly written XQuery, using the text() function is much quicker, I was not commenting or comparing to yours or anyone else's solution.

    And all I'm stating is that I have two pieces of code that are identical except for the use of text(), and the code without text() runs faster, so your assertion is flawed. The role of text() in xquery performance is much more complicated than implied by a blanket statement that the text() function is much quicker.

    Drew

    Might I ask you to elaborate further on the complexity and the role of the text() function within XQuery that you mentioned? This sounds really interesting and I love any opportunity to learn something new!

    😎

  • Eirikur Eiriksson (8/14/2015)


    drew.allen (8/14/2015)


    Eirikur Eiriksson (8/14/2015)


    Further, I was only stating that with a properly written XQuery, using the text() function is much quicker, I was not commenting or comparing to yours or anyone else's solution.

    And all I'm stating is that I have two pieces of code that are identical except for the use of text(), and the code without text() runs faster, so your assertion is flawed. The role of text() in xquery performance is much more complicated than implied by a blanket statement that the text() function is much quicker.

    Drew

    Might I ask you to elaborate further on the complexity and the role of the text() function within XQuery that you mentioned? This sounds really interesting and I love any opportunity to learn something new!

    😎

    Unfortunately, I don't know the details of that complexity. The reason I posted my results in the first place is that this explanation didn't account for my data, and I was hoping that someone would be able to explain why.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Eirikur Eiriksson (8/14/2015)


    Alan.B (8/13/2015)


    drew.allen (8/13/2015)


    Alan.B (8/13/2015)


    Just to add to Drew's excellent solution, not only would you want to use the value() method over the query method here, you should include the text() node in your XPATH expression. This will improve your performance by telling the query optimizer that you only need the text. Run the query below in SSMS with "Include Actual Execution Plan" turned on to see what I mean.

    When I ran this, the original query took 48% of the batch and the query with the text() node took 52%. That doesn't sound like a savings to me.

    Drew

    On my system it was 55/45 (~20% improvement). 52/48 is still a 7% improvement; not a bad deal doing something that takes less than a minute to achieve a 7% performance improvement.

    In other news if you compare the query() method to the value() method the optimizer tells you that you get the same performance for each :-P. I know from experience that, for something like this, the value() method is much faster (see attached query plan).

    So I put together this little performance test:

    Both solutions are sub-optimal as they imply work (cost) which more than offsets any gains from the text() function. The culprit is the traversing nodes for the Tasks/task, it's simply a killer!

    😎

    Using Alan's fine test harness to proof this, observe that without the traversing (../) the query is almost 3 times faster than any of the others.

    -- let's get sample data

    SET NOCOUNT ON;

    IF OBJECT_ID('tempdb..#XML') IS NOT NULL DROP TABLE #XML;

    CREATE TABLE #XML (id int primary key, XMLData XML);

    INSERT INTO #XML

    SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),

    '<process>

    <header>

    <Person><Name>Test1</Name><Address>'+CAST(newid() AS varchar(36))+'</Address></Person>

    <Tasks>

    <task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details</details></task>

    <task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details more</details></task>

    </Tasks>

    </header>

    <header>

    <Person><Name>Test2</Name><Address>'+CAST(newid() AS varchar(36))+'</Address></Person>

    <Tasks>

    <task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details 2 </details></task>

    <task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details more 3</details></task>

    </Tasks>

    </header>

    </process>'

    FROM sys.all_columns a, sys.all_columns b;

    GO

    -- clear the cache and start fresh...

    DBCC FREEPROCCACHE with no_infomsgs

    DBCC DROPCLEANBUFFERS with no_infomsgs

    GO

    PRINT '=== (EE) No traversing and text() ==='

    DECLARE

    @starttime datetime = getdate(),

    @x1 nvarchar(50),

    @x2 nvarchar(50),

    @x3 nvarchar(50),

    @x4 nvarchar(50);

    SELECT

    @x1 = HEADER.DATA.value('(Person/Name/text())[1]' ,'NVARCHAR(50)') --AS Person_Name

    ,@x2 = HEADER.DATA.value('(Person/Address/text())[1]' ,'NVARCHAR(50)') --AS Person_Address

    ,@x3 = TASK.DATA.value('(Name/text())[1]' ,'NVARCHAR(50)') --AS Task_Name

    ,@x4 = TASK.DATA.value('(details/text())[1]' ,'NVARCHAR(50)') --AS Task_details

    FROM #XML XD

    CROSS APPLY XD.XMLData.nodes('process/header') AS HEADER(DATA)

    CROSS APPLY HEADER.DATA.nodes('Tasks/task') AS TASK(DATA);

    PRINT DATEDIFF(MS,@starttime, getdate());

    GO 3

    PRINT '=== Not specifying the text() node ==='

    DECLARE

    @starttime datetime = getdate(),

    @x1 nvarchar(50),

    @x2 nvarchar(50),

    @x3 nvarchar(50),

    @x4 nvarchar(50);

    SELECT

    @x1 = p.c.value('Name[1]', 'NVARCHAR(50)'),

    @x2 = p.c.value('Address[1]', 'NVARCHAR(50)'),

    @x3 = t.c.value('Name[1]', 'NVARCHAR(50)'),

    @x4 = t.c.value('details[1]', 'NVARCHAR(50)')

    FROM #XML

    CROSS APPLY XMLData.nodes('process/header/Person') p(c)

    CROSS APPLY c.nodes('../Tasks/task') t(c);

    PRINT DATEDIFF(MS,@starttime, getdate());

    GO 3

    PRINT '=== specifying the text() node ==='

    DECLARE

    @starttime datetime = getdate(),

    @x1 nvarchar(50),

    @x2 nvarchar(50),

    @x3 nvarchar(50),

    @x4 nvarchar(50);

    SELECT

    @x1 = p.c.value('(Name/text())[1]', 'NVARCHAR(50)'),

    @x2 = p.c.value('(Address/text())[1]', 'NVARCHAR(50)'),

    @x3 = t.c.value('(Name/text())[1]', 'NVARCHAR(50)'),

    @x4 = t.c.value('(details/text())[1]', 'NVARCHAR(50)')

    FROM #XML

    CROSS APPLY XMLData.nodes('process/header/Person') p(c)

    CROSS APPLY c.nodes('../Tasks/task') t(c);

    PRINT DATEDIFF(MS,@starttime, getdate());

    GO 3

    PRINT '=== Using the query() method ==='

    DECLARE

    @starttime datetime = getdate(),

    @x1 nvarchar(50),

    @x2 nvarchar(50),

    @x3 nvarchar(50),

    @x4 nvarchar(50);

    SELECT

    @x1 = CAST(p.c.query('data(Name)') AS nvarchar(50)),

    @x2 = CAST(p.c.query('data(Address)')AS nvarchar(50)),

    @x3 = CAST(t.c.query('data(Name)')AS nvarchar(50)),

    @x4 = CAST(t.c.query('data(details)')AS nvarchar(50))

    FROM #XML

    CROSS APPLY XMLData.nodes('process/header/Person') p(c)

    CROSS APPLY c.nodes('../Tasks/task') t(c);

    PRINT DATEDIFF(MS,@starttime, getdate());

    GO 3

    PRINT '=== No traversing and text() ==='

    DECLARE

    @starttime datetime = getdate(),

    @x1 nvarchar(50),

    @x2 nvarchar(50),

    @x3 nvarchar(50),

    @x4 nvarchar(50);

    SELECT

    @x1 = HEADER.DATA.value('(Person/Name/text())[1]' ,'NVARCHAR(50)') --AS Person_Name

    ,@x2 = HEADER.DATA.value('(Person/Address/text())[1]' ,'NVARCHAR(50)') --AS Person_Address

    ,@x3 = TASK.DATA.value('(Name/text())[1]' ,'NVARCHAR(50)') --AS Task_Name

    ,@x4 = TASK.DATA.value('(details/text())[1]' ,'NVARCHAR(50)') --AS Task_details

    FROM #XML XD

    CROSS APPLY XD.XMLData.nodes('process/header') AS HEADER(DATA)

    CROSS APPLY HEADER.DATA.nodes('Tasks/task') AS TASK(DATA);

    PRINT DATEDIFF(MS,@starttime, getdate());

    GO 3

    Results

    Beginning execution loop

    === (EE) No traversing and text() ===

    396

    === (EE) No traversing and text() ===

    313

    === (EE) No traversing and text() ===

    310

    Batch execution completed 3 times.

    Beginning execution loop

    === Not specifying the text() node ===

    796

    === Not specifying the text() node ===

    830

    === Not specifying the text() node ===

    856

    Batch execution completed 3 times.

    Beginning execution loop

    === specifying the text() node ===

    1993

    === specifying the text() node ===

    1970

    === specifying the text() node ===

    1926

    Batch execution completed 3 times.

    Beginning execution loop

    === Using the query() method ===

    4170

    === Using the query() method ===

    4183

    === Using the query() method ===

    4153

    Batch execution completed 3 times.

    Beginning execution loop

    === No traversing and text() ===

    316

    === No traversing and text() ===

    313

    === No traversing and text() ===

    313

    Batch execution completed 3 times.

    Well done sir. I knew there was a better way to parse those nodes but got lazy. Something was not right. I learned a few things in this thread. 😀

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply