August 13, 2015 at 11:12 am
Hi all,
I'm pulling data from XML into tables, but I'm unsure how to link the data after it's imported. This example has names and tasks, and I can pull the data into two tables, but I can't find any way to link the task to the appropriate person. My person and task tables populate without issue, but there's nothing I can find to link the rows together. So in this example Test 1 would go to the first two Tasks and Test 2 would go to the second two work items.
Thanks for any suggestions.
DECLARE @XML TABLE (XMLData XML);
DECLARE @Person Table (Name NVARCHAR(50), Addresss NVARCHAR(50));
DECLARE @Task Table (Name NVARCHAR(50), Details NVARCHAR(50));
INSERT INTO @XML SELECT '
<process>
<header>
<Person><Name>Test1</Name><Address>123 main street</Address></Person>
<Tasks>
<task><Name>Do some work</Name><details>details</details></task>
<task><Name>Do some more work</Name><details>details more</details></task>
</Tasks>
</header>
<header>
<Person><Name>Test2</Name><Address>234 main street</Address></Person>
<Tasks>
<task><Name>Do some work 2</Name><details>details 2 </details></task>
<task><Name>Do some more work3</Name><details>details more 3</details></task>
</Tasks>
</header>
</process>'
SELECTCONVERT(NVARCHAR(255),c.query('data(Name)')) AS name,
CONVERT(NVARCHAR(255),c.query('data(Address)')) AS address
FROM@XML r
CROSS APPLY XMLData.nodes('process/header/Person') x(c)
SELECTCONVERT(NVARCHAR(255),c.query('data(Name)')) AS name,
CONVERT(NVARCHAR(255),c.query('data(details)')) AS address
FROM@XML r
CROSS APPLY XMLData.nodes('process/header/Tasks/task') x(c)
August 13, 2015 at 12:53 pm
Instead of trying to shred them separately and then put them together, shred them together and separate them.
DECLARE @XML TABLE (XMLData XML);
DECLARE @Person_Tasks TABLE( person_name NVARCHAR(50), person_address NVARCHAR(50), task_name NVARCHAR(50), task_details NVARCHAR(50))
DECLARE @Person Table (Name NVARCHAR(50), Addresss NVARCHAR(50));
DECLARE @Task Table (Name NVARCHAR(50), Details NVARCHAR(50));
INSERT INTO @XML SELECT '
<process>
<header>
<Person><Name>Test1</Name><Address>123 main street</Address></Person>
<Tasks>
<task><Name>Do some work</Name><details>details</details></task>
<task><Name>Do some more work</Name><details>details more</details></task>
</Tasks>
</header>
<header>
<Person><Name>Test2</Name><Address>234 main street</Address></Person>
<Tasks>
<task><Name>Do some work 2</Name><details>details 2 </details></task>
<task><Name>Do some more work3</Name><details>details more 3</details></task>
</Tasks>
</header>
</process>'
INSERT @Person_Tasks(person_name, person_address, task_name, task_details)
SELECT p.c.value('Name[1]', 'NVARCHAR(50)') AS person_name
,p.c.value('Address[1]', 'NVARCHAR(50)') AS person_address
,t.c.value('Name[1]', 'NVARCHAR(50)') AS task_name
,t.c.value('details[1]', 'NVARCHAR(50)') AS task_details
FROM @XML
CROSS APPLY XMLData.nodes('process/header/Person') p(c)
CROSS APPLY c.nodes('../Tasks/task') t(c)
INSERT @Person( Name, Addresss )
SELECT DISTINCT person_name, person_address
FROM @Person_Tasks
INSERT @Task( Name, Details )
SELECT DISTINCT task_name, task_details
FROM @Person_Tasks
SELECT *
FROM @Person_Tasks
SELECT *
FROM @Person
SELECT *
FROM @Task
It is possible to put them together once they've been shredded separately, but it's much more complicated. Also, you might try using the value function instead of the query function if you want to pull values out of xml data.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 13, 2015 at 1:41 pm
Just to add to Drew's excellent solution, not only would you want to use the value() method over the query method here, you should include the text() node in your XPATH expression. This will improve your performance by telling the query optimizer that you only need the text. Run the query below in SSMS with "Include Actual Execution Plan" turned on to see what I mean.
DECLARE @XML TABLE (XMLData XML);
DECLARE @Person_Tasks TABLE( person_name NVARCHAR(50), person_address NVARCHAR(50), task_name NVARCHAR(50), task_details NVARCHAR(50))
DECLARE @Person Table (Name NVARCHAR(50), Addresss NVARCHAR(50));
DECLARE @Task Table (Name NVARCHAR(50), Details NVARCHAR(50));
INSERT INTO @XML SELECT '
<process>
<header>
<Person><Name>Test1</Name><Address>123 main street</Address></Person>
<Tasks>
<task><Name>Do some work</Name><details>details</details></task>
<task><Name>Do some more work</Name><details>details more</details></task>
</Tasks>
</header>
<header>
<Person><Name>Test2</Name><Address>234 main street</Address></Person>
<Tasks>
<task><Name>Do some work 2</Name><details>details 2 </details></task>
<task><Name>Do some more work3</Name><details>details more 3</details></task>
</Tasks>
</header>
</process>'
-- Not specifying the text() node
SELECT p.c.value('Name[1]', 'NVARCHAR(50)') AS person_name
,p.c.value('Address[1]', 'NVARCHAR(50)') AS person_address
,t.c.value('Name[1]', 'NVARCHAR(50)') AS task_name
,t.c.value('details[1]', 'NVARCHAR(50)') AS task_details
FROM @XML
CROSS APPLY XMLData.nodes('process/header/Person') p(c)
CROSS APPLY c.nodes('../Tasks/task') t(c)
-- specifying the text() node
SELECT p.c.value('(Name/text())[1]', 'NVARCHAR(50)') AS person_name
,p.c.value('(Address/text())[1]', 'NVARCHAR(50)') AS person_address
,t.c.value('(Name/text())[1]', 'NVARCHAR(50)') AS task_name
,t.c.value('(details/text())[1]', 'NVARCHAR(50)') AS task_details
FROM @XML
CROSS APPLY XMLData.nodes('process/header/Person') p(c)
CROSS APPLY c.nodes('../Tasks/task') t(c)
-- Itzik Ben-Gan 2001
August 13, 2015 at 3:17 pm
Alan.B (8/13/2015)
Just to add to Drew's excellent solution, not only would you want to use the value() method over the query method here, you should include the text() node in your XPATH expression. This will improve your performance by telling the query optimizer that you only need the text. Run the query below in SSMS with "Include Actual Execution Plan" turned on to see what I mean.
When I ran this, the original query took 48% of the batch and the query with the text() node took 52%. That doesn't sound like a savings to me.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 13, 2015 at 9:05 pm
drew.allen (8/13/2015)
Alan.B (8/13/2015)
Just to add to Drew's excellent solution, not only would you want to use the value() method over the query method here, you should include the text() node in your XPATH expression. This will improve your performance by telling the query optimizer that you only need the text. Run the query below in SSMS with "Include Actual Execution Plan" turned on to see what I mean.When I ran this, the original query took 48% of the batch and the query with the text() node took 52%. That doesn't sound like a savings to me.
Drew
On my system it was 55/45 (~20% improvement). 52/48 is still a 7% improvement; not a bad deal doing something that takes less than a minute to achieve a 7% performance improvement.
In other news if you compare the query() method to the value() method the optimizer tells you that you get the same performance for each :-P. I know from experience that, for something like this, the value() method is much faster (see attached query plan).
So I put together this little performance test:
-- let's get sample data
SET NOCOUNT ON;
IF OBJECT_ID('tempdb..#XML') IS NOT NULL DROP TABLE #XML;
CREATE TABLE #XML (id int primary key, XMLData XML);
INSERT INTO #XML
SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
'<process>
<header>
<Person><Name>Test1</Name><Address>'+CAST(newid() AS varchar(36))+'</Address></Person>
<Tasks>
<task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details</details></task>
<task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details more</details></task>
</Tasks>
</header>
<header>
<Person><Name>Test2</Name><Address>'+CAST(newid() AS varchar(36))+'</Address></Person>
<Tasks>
<task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details 2 </details></task>
<task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details more 3</details></task>
</Tasks>
</header>
</process>'
FROM sys.all_columns a, sys.all_columns b;
GO
-- clear the cache and start fresh...
DBCC FREEPROCCACHE with no_infomsgs
DBCC DROPCLEANBUFFERS with no_infomsgs
GO
PRINT '=== Not specifying the text() node ==='
DECLARE
@starttime datetime = getdate(),
@x1 nvarchar(50),
@x2 nvarchar(50),
@x3 nvarchar(50),
@x4 nvarchar(50);
SELECT
@x1 = p.c.value('Name[1]', 'NVARCHAR(50)'),
@x2 = p.c.value('Address[1]', 'NVARCHAR(50)'),
@x3 = t.c.value('Name[1]', 'NVARCHAR(50)'),
@x4 = t.c.value('details[1]', 'NVARCHAR(50)')
FROM #XML
CROSS APPLY XMLData.nodes('process/header/Person') p(c)
CROSS APPLY c.nodes('../Tasks/task') t(c);
PRINT DATEDIFF(MS,@starttime, getdate());
GO 3
PRINT '=== specifying the text() node ==='
DECLARE
@starttime datetime = getdate(),
@x1 nvarchar(50),
@x2 nvarchar(50),
@x3 nvarchar(50),
@x4 nvarchar(50);
SELECT
@x1 = p.c.value('(Name/text())[1]', 'NVARCHAR(50)'),
@x2 = p.c.value('(Address/text())[1]', 'NVARCHAR(50)'),
@x3 = t.c.value('(Name/text())[1]', 'NVARCHAR(50)'),
@x4 = t.c.value('(details/text())[1]', 'NVARCHAR(50)')
FROM #XML
CROSS APPLY XMLData.nodes('process/header/Person') p(c)
CROSS APPLY c.nodes('../Tasks/task') t(c);
PRINT DATEDIFF(MS,@starttime, getdate());
GO 3
PRINT '=== Using the query() method ==='
DECLARE
@starttime datetime = getdate(),
@x1 nvarchar(50),
@x2 nvarchar(50),
@x3 nvarchar(50),
@x4 nvarchar(50);
SELECT
@x1 = CAST(p.c.query('data(Name)') AS nvarchar(50)),
@x2 = CAST(p.c.query('data(Address)')AS nvarchar(50)),
@x3 = CAST(t.c.query('data(Name)')AS nvarchar(50)),
@x4 = CAST(t.c.query('data(details)')AS nvarchar(50))
FROM #XML
CROSS APPLY XMLData.nodes('process/header/Person') p(c)
CROSS APPLY c.nodes('../Tasks/task') t(c);
PRINT DATEDIFF(MS,@starttime, getdate());
GO 3
Though the optimizer said that the value() method with text() would be fastest and the value() and query() method would be tied for second the results of the test told a different story...
Beginning execution loop
=== Not specifying the text() node ===
810
=== Not specifying the text() node ===
663
=== Not specifying the text() node ===
646
Batch execution completed 3 times.
Beginning execution loop
=== specifying the text() node ===
1603
=== specifying the text() node ===
1616
=== specifying the text() node ===
1590
Batch execution completed 3 times.
Beginning execution loop
=== Using the query() method ===
3166
=== Using the query() method ===
3263
=== Using the query() method ===
3330
Batch execution completed 3 times.
Interestingly, the text() node specification slows the query down (this is not always the case but was the case in this test). As expected, the query method() was way slower. Unfortunately for the query method it was not getting a parallel query plan so, to level the playing field, I added Adam Machanic's make_parallel() function to the mix and re-ran that query() method with a parallel plan.
Results:
Beginning execution loop
=== Using the query() method with make_parallel() ===
1150
=== Using the query() method with make_parallel() ===
1110
=== Using the query() method with make_parallel() ===
1110
Batch execution completed 3 times.
Better but still slower than value() method without specifying the text() node. Moral of the story: don't always trust the query optimizer estimations.
-- Itzik Ben-Gan 2001
August 14, 2015 at 2:15 am
Alan.B (8/13/2015)
drew.allen (8/13/2015)
Alan.B (8/13/2015)
Just to add to Drew's excellent solution, not only would you want to use the value() method over the query method here, you should include the text() node in your XPATH expression. This will improve your performance by telling the query optimizer that you only need the text. Run the query below in SSMS with "Include Actual Execution Plan" turned on to see what I mean.When I ran this, the original query took 48% of the batch and the query with the text() node took 52%. That doesn't sound like a savings to me.
Drew
On my system it was 55/45 (~20% improvement). 52/48 is still a 7% improvement; not a bad deal doing something that takes less than a minute to achieve a 7% performance improvement.
In other news if you compare the query() method to the value() method the optimizer tells you that you get the same performance for each :-P. I know from experience that, for something like this, the value() method is much faster (see attached query plan).
So I put together this little performance test:
Both solutions are sub-optimal as they imply work (cost) which more than offsets any gains from the text() function. The culprit is the traversing nodes for the Tasks/task, it's simply a killer!
😎
Using Alan's fine test harness to proof this, observe that without the traversing (../) the query is almost 3 times faster than any of the others.
-- let's get sample data
SET NOCOUNT ON;
IF OBJECT_ID('tempdb..#XML') IS NOT NULL DROP TABLE #XML;
CREATE TABLE #XML (id int primary key, XMLData XML);
INSERT INTO #XML
SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
'<process>
<header>
<Person><Name>Test1</Name><Address>'+CAST(newid() AS varchar(36))+'</Address></Person>
<Tasks>
<task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details</details></task>
<task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details more</details></task>
</Tasks>
</header>
<header>
<Person><Name>Test2</Name><Address>'+CAST(newid() AS varchar(36))+'</Address></Person>
<Tasks>
<task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details 2 </details></task>
<task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details more 3</details></task>
</Tasks>
</header>
</process>'
FROM sys.all_columns a, sys.all_columns b;
GO
-- clear the cache and start fresh...
DBCC FREEPROCCACHE with no_infomsgs
DBCC DROPCLEANBUFFERS with no_infomsgs
GO
PRINT '=== (EE) No traversing and text() ==='
DECLARE
@starttime datetime = getdate(),
@x1 nvarchar(50),
@x2 nvarchar(50),
@x3 nvarchar(50),
@x4 nvarchar(50);
SELECT
@x1 = HEADER.DATA.value('(Person/Name/text())[1]' ,'NVARCHAR(50)') --AS Person_Name
,@x2 = HEADER.DATA.value('(Person/Address/text())[1]' ,'NVARCHAR(50)') --AS Person_Address
,@x3 = TASK.DATA.value('(Name/text())[1]' ,'NVARCHAR(50)') --AS Task_Name
,@x4 = TASK.DATA.value('(details/text())[1]' ,'NVARCHAR(50)') --AS Task_details
FROM #XML XD
CROSS APPLY XD.XMLData.nodes('process/header') AS HEADER(DATA)
CROSS APPLY HEADER.DATA.nodes('Tasks/task') AS TASK(DATA);
PRINT DATEDIFF(MS,@starttime, getdate());
GO 3
PRINT '=== Not specifying the text() node ==='
DECLARE
@starttime datetime = getdate(),
@x1 nvarchar(50),
@x2 nvarchar(50),
@x3 nvarchar(50),
@x4 nvarchar(50);
SELECT
@x1 = p.c.value('Name[1]', 'NVARCHAR(50)'),
@x2 = p.c.value('Address[1]', 'NVARCHAR(50)'),
@x3 = t.c.value('Name[1]', 'NVARCHAR(50)'),
@x4 = t.c.value('details[1]', 'NVARCHAR(50)')
FROM #XML
CROSS APPLY XMLData.nodes('process/header/Person') p(c)
CROSS APPLY c.nodes('../Tasks/task') t(c);
PRINT DATEDIFF(MS,@starttime, getdate());
GO 3
PRINT '=== specifying the text() node ==='
DECLARE
@starttime datetime = getdate(),
@x1 nvarchar(50),
@x2 nvarchar(50),
@x3 nvarchar(50),
@x4 nvarchar(50);
SELECT
@x1 = p.c.value('(Name/text())[1]', 'NVARCHAR(50)'),
@x2 = p.c.value('(Address/text())[1]', 'NVARCHAR(50)'),
@x3 = t.c.value('(Name/text())[1]', 'NVARCHAR(50)'),
@x4 = t.c.value('(details/text())[1]', 'NVARCHAR(50)')
FROM #XML
CROSS APPLY XMLData.nodes('process/header/Person') p(c)
CROSS APPLY c.nodes('../Tasks/task') t(c);
PRINT DATEDIFF(MS,@starttime, getdate());
GO 3
PRINT '=== Using the query() method ==='
DECLARE
@starttime datetime = getdate(),
@x1 nvarchar(50),
@x2 nvarchar(50),
@x3 nvarchar(50),
@x4 nvarchar(50);
SELECT
@x1 = CAST(p.c.query('data(Name)') AS nvarchar(50)),
@x2 = CAST(p.c.query('data(Address)')AS nvarchar(50)),
@x3 = CAST(t.c.query('data(Name)')AS nvarchar(50)),
@x4 = CAST(t.c.query('data(details)')AS nvarchar(50))
FROM #XML
CROSS APPLY XMLData.nodes('process/header/Person') p(c)
CROSS APPLY c.nodes('../Tasks/task') t(c);
PRINT DATEDIFF(MS,@starttime, getdate());
GO 3
PRINT '=== No traversing and text() ==='
DECLARE
@starttime datetime = getdate(),
@x1 nvarchar(50),
@x2 nvarchar(50),
@x3 nvarchar(50),
@x4 nvarchar(50);
SELECT
@x1 = HEADER.DATA.value('(Person/Name/text())[1]' ,'NVARCHAR(50)') --AS Person_Name
,@x2 = HEADER.DATA.value('(Person/Address/text())[1]' ,'NVARCHAR(50)') --AS Person_Address
,@x3 = TASK.DATA.value('(Name/text())[1]' ,'NVARCHAR(50)') --AS Task_Name
,@x4 = TASK.DATA.value('(details/text())[1]' ,'NVARCHAR(50)') --AS Task_details
FROM #XML XD
CROSS APPLY XD.XMLData.nodes('process/header') AS HEADER(DATA)
CROSS APPLY HEADER.DATA.nodes('Tasks/task') AS TASK(DATA);
PRINT DATEDIFF(MS,@starttime, getdate());
GO 3
Results
Beginning execution loop
=== (EE) No traversing and text() ===
396
=== (EE) No traversing and text() ===
313
=== (EE) No traversing and text() ===
310
Batch execution completed 3 times.
Beginning execution loop
=== Not specifying the text() node ===
796
=== Not specifying the text() node ===
830
=== Not specifying the text() node ===
856
Batch execution completed 3 times.
Beginning execution loop
=== specifying the text() node ===
1993
=== specifying the text() node ===
1970
=== specifying the text() node ===
1926
Batch execution completed 3 times.
Beginning execution loop
=== Using the query() method ===
4170
=== Using the query() method ===
4183
=== Using the query() method ===
4153
Batch execution completed 3 times.
Beginning execution loop
=== No traversing and text() ===
316
=== No traversing and text() ===
313
=== No traversing and text() ===
313
Batch execution completed 3 times.
August 14, 2015 at 7:22 am
Alan.B (8/13/2015)
drew.allen (8/13/2015)
Alan.B (8/13/2015)
Just to add to Drew's excellent solution, not only would you want to use the value() method over the query method here, you should include the text() node in your XPATH expression. This will improve your performance by telling the query optimizer that you only need the text. Run the query below in SSMS with "Include Actual Execution Plan" turned on to see what I mean.When I ran this, the original query took 48% of the batch and the query with the text() node took 52%. That doesn't sound like a savings to me.
Drew
On my system it was 55/45 (~20% improvement). 52/48 is still a 7% improvement; not a bad deal doing something that takes less than a minute to achieve a 7% performance improvement.
You misunderstood. The original query ran FASTER (48/52) not slower (52/48) than the query with text() on my system.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 14, 2015 at 8:06 am
drew.allen (8/14/2015)
Alan.B (8/13/2015)
drew.allen (8/13/2015)
Alan.B (8/13/2015)
Just to add to Drew's excellent solution, not only would you want to use the value() method over the query method here, you should include the text() node in your XPATH expression. This will improve your performance by telling the query optimizer that you only need the text. Run the query below in SSMS with "Include Actual Execution Plan" turned on to see what I mean.When I ran this, the original query took 48% of the batch and the query with the text() node took 52%. That doesn't sound like a savings to me.
Drew
On my system it was 55/45 (~20% improvement). 52/48 is still a 7% improvement; not a bad deal doing something that takes less than a minute to achieve a 7% performance improvement.
You misunderstood. The original query ran FASTER (48/52) not slower (52/48) than the query with text() on my system.
Drew
The text() is faster than without it, here are results from two versions of non-traversing XQueries from my previous post, one with and on without the text().
😎
=== (EE) No traversing and text() === 423
=== (EE) No traversing and text() === 310
=== (EE) No traversing and text() === 313
--------------------------------------------
=== (EE) No traversing === 570
=== (EE) No traversing === 570
=== (EE) No traversing === 613
August 14, 2015 at 11:21 am
Eirikur Eiriksson (8/14/2015)
The text() is faster than without it, here are results from two versions of non-traversing XQueries from my previous post, one with and on without the text().
First, I was comparing the two traversing methods, so posting results from the non-traversing method isn't really relevant.
Second, I'm saying that I'm getting different results with my setup. I don't care how many times you post results from YOUR setup, it isn't going to change what I am getting from MY setup.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 14, 2015 at 11:55 am
drew.allen (8/14/2015)
Eirikur Eiriksson (8/14/2015)
The text() is faster than without it, here are results from two versions of non-traversing XQueries from my previous post, one with and on without the text().First, I was comparing the two traversing methods, so posting results from the non-traversing method isn't really relevant.
Second, I'm saying that I'm getting different results with my setup. I don't care how many times you post results from YOUR setup, it isn't going to change what I am getting from MY setup.
Drew
I find these comments slightly amusing, are you having a bad day?
😎
Further, I was only stating that with a properly written XQuery, using the text() function is much quicker, I was not commenting or comparing to yours or anyone else's solution.
August 14, 2015 at 1:58 pm
Easy now, folks. These types of things are how we all learn. 😉 Well, except for me. I reject all XML whenever I can. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
August 14, 2015 at 2:41 pm
Eirikur Eiriksson (8/14/2015)
Further, I was only stating that with a properly written XQuery, using the text() function is much quicker, I was not commenting or comparing to yours or anyone else's solution.
And all I'm stating is that I have two pieces of code that are identical except for the use of text(), and the code without text() runs faster, so your assertion is flawed. The role of text() in xquery performance is much more complicated than implied by a blanket statement that the text() function is much quicker.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 14, 2015 at 3:58 pm
drew.allen (8/14/2015)
Eirikur Eiriksson (8/14/2015)
Further, I was only stating that with a properly written XQuery, using the text() function is much quicker, I was not commenting or comparing to yours or anyone else's solution.And all I'm stating is that I have two pieces of code that are identical except for the use of text(), and the code without text() runs faster, so your assertion is flawed. The role of text() in xquery performance is much more complicated than implied by a blanket statement that the text() function is much quicker.
Drew
Might I ask you to elaborate further on the complexity and the role of the text() function within XQuery that you mentioned? This sounds really interesting and I love any opportunity to learn something new!
😎
August 17, 2015 at 7:13 am
Eirikur Eiriksson (8/14/2015)
drew.allen (8/14/2015)
Eirikur Eiriksson (8/14/2015)
Further, I was only stating that with a properly written XQuery, using the text() function is much quicker, I was not commenting or comparing to yours or anyone else's solution.And all I'm stating is that I have two pieces of code that are identical except for the use of text(), and the code without text() runs faster, so your assertion is flawed. The role of text() in xquery performance is much more complicated than implied by a blanket statement that the text() function is much quicker.
Drew
Might I ask you to elaborate further on the complexity and the role of the text() function within XQuery that you mentioned? This sounds really interesting and I love any opportunity to learn something new!
😎
Unfortunately, I don't know the details of that complexity. The reason I posted my results in the first place is that this explanation didn't account for my data, and I was hoping that someone would be able to explain why.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
August 17, 2015 at 10:19 pm
Eirikur Eiriksson (8/14/2015)
Alan.B (8/13/2015)
drew.allen (8/13/2015)
Alan.B (8/13/2015)
Just to add to Drew's excellent solution, not only would you want to use the value() method over the query method here, you should include the text() node in your XPATH expression. This will improve your performance by telling the query optimizer that you only need the text. Run the query below in SSMS with "Include Actual Execution Plan" turned on to see what I mean.When I ran this, the original query took 48% of the batch and the query with the text() node took 52%. That doesn't sound like a savings to me.
Drew
On my system it was 55/45 (~20% improvement). 52/48 is still a 7% improvement; not a bad deal doing something that takes less than a minute to achieve a 7% performance improvement.
In other news if you compare the query() method to the value() method the optimizer tells you that you get the same performance for each :-P. I know from experience that, for something like this, the value() method is much faster (see attached query plan).
So I put together this little performance test:
Both solutions are sub-optimal as they imply work (cost) which more than offsets any gains from the text() function. The culprit is the traversing nodes for the Tasks/task, it's simply a killer!
😎
Using Alan's fine test harness to proof this, observe that without the traversing (../) the query is almost 3 times faster than any of the others.
-- let's get sample data
SET NOCOUNT ON;
IF OBJECT_ID('tempdb..#XML') IS NOT NULL DROP TABLE #XML;
CREATE TABLE #XML (id int primary key, XMLData XML);
INSERT INTO #XML
SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
'<process>
<header>
<Person><Name>Test1</Name><Address>'+CAST(newid() AS varchar(36))+'</Address></Person>
<Tasks>
<task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details</details></task>
<task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details more</details></task>
</Tasks>
</header>
<header>
<Person><Name>Test2</Name><Address>'+CAST(newid() AS varchar(36))+'</Address></Person>
<Tasks>
<task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details 2 </details></task>
<task><Name>'+CAST(newid() AS varchar(36))+'</Name><details>details more 3</details></task>
</Tasks>
</header>
</process>'
FROM sys.all_columns a, sys.all_columns b;
GO
-- clear the cache and start fresh...
DBCC FREEPROCCACHE with no_infomsgs
DBCC DROPCLEANBUFFERS with no_infomsgs
GO
PRINT '=== (EE) No traversing and text() ==='
DECLARE
@starttime datetime = getdate(),
@x1 nvarchar(50),
@x2 nvarchar(50),
@x3 nvarchar(50),
@x4 nvarchar(50);
SELECT
@x1 = HEADER.DATA.value('(Person/Name/text())[1]' ,'NVARCHAR(50)') --AS Person_Name
,@x2 = HEADER.DATA.value('(Person/Address/text())[1]' ,'NVARCHAR(50)') --AS Person_Address
,@x3 = TASK.DATA.value('(Name/text())[1]' ,'NVARCHAR(50)') --AS Task_Name
,@x4 = TASK.DATA.value('(details/text())[1]' ,'NVARCHAR(50)') --AS Task_details
FROM #XML XD
CROSS APPLY XD.XMLData.nodes('process/header') AS HEADER(DATA)
CROSS APPLY HEADER.DATA.nodes('Tasks/task') AS TASK(DATA);
PRINT DATEDIFF(MS,@starttime, getdate());
GO 3
PRINT '=== Not specifying the text() node ==='
DECLARE
@starttime datetime = getdate(),
@x1 nvarchar(50),
@x2 nvarchar(50),
@x3 nvarchar(50),
@x4 nvarchar(50);
SELECT
@x1 = p.c.value('Name[1]', 'NVARCHAR(50)'),
@x2 = p.c.value('Address[1]', 'NVARCHAR(50)'),
@x3 = t.c.value('Name[1]', 'NVARCHAR(50)'),
@x4 = t.c.value('details[1]', 'NVARCHAR(50)')
FROM #XML
CROSS APPLY XMLData.nodes('process/header/Person') p(c)
CROSS APPLY c.nodes('../Tasks/task') t(c);
PRINT DATEDIFF(MS,@starttime, getdate());
GO 3
PRINT '=== specifying the text() node ==='
DECLARE
@starttime datetime = getdate(),
@x1 nvarchar(50),
@x2 nvarchar(50),
@x3 nvarchar(50),
@x4 nvarchar(50);
SELECT
@x1 = p.c.value('(Name/text())[1]', 'NVARCHAR(50)'),
@x2 = p.c.value('(Address/text())[1]', 'NVARCHAR(50)'),
@x3 = t.c.value('(Name/text())[1]', 'NVARCHAR(50)'),
@x4 = t.c.value('(details/text())[1]', 'NVARCHAR(50)')
FROM #XML
CROSS APPLY XMLData.nodes('process/header/Person') p(c)
CROSS APPLY c.nodes('../Tasks/task') t(c);
PRINT DATEDIFF(MS,@starttime, getdate());
GO 3
PRINT '=== Using the query() method ==='
DECLARE
@starttime datetime = getdate(),
@x1 nvarchar(50),
@x2 nvarchar(50),
@x3 nvarchar(50),
@x4 nvarchar(50);
SELECT
@x1 = CAST(p.c.query('data(Name)') AS nvarchar(50)),
@x2 = CAST(p.c.query('data(Address)')AS nvarchar(50)),
@x3 = CAST(t.c.query('data(Name)')AS nvarchar(50)),
@x4 = CAST(t.c.query('data(details)')AS nvarchar(50))
FROM #XML
CROSS APPLY XMLData.nodes('process/header/Person') p(c)
CROSS APPLY c.nodes('../Tasks/task') t(c);
PRINT DATEDIFF(MS,@starttime, getdate());
GO 3
PRINT '=== No traversing and text() ==='
DECLARE
@starttime datetime = getdate(),
@x1 nvarchar(50),
@x2 nvarchar(50),
@x3 nvarchar(50),
@x4 nvarchar(50);
SELECT
@x1 = HEADER.DATA.value('(Person/Name/text())[1]' ,'NVARCHAR(50)') --AS Person_Name
,@x2 = HEADER.DATA.value('(Person/Address/text())[1]' ,'NVARCHAR(50)') --AS Person_Address
,@x3 = TASK.DATA.value('(Name/text())[1]' ,'NVARCHAR(50)') --AS Task_Name
,@x4 = TASK.DATA.value('(details/text())[1]' ,'NVARCHAR(50)') --AS Task_details
FROM #XML XD
CROSS APPLY XD.XMLData.nodes('process/header') AS HEADER(DATA)
CROSS APPLY HEADER.DATA.nodes('Tasks/task') AS TASK(DATA);
PRINT DATEDIFF(MS,@starttime, getdate());
GO 3
Results
Beginning execution loop
=== (EE) No traversing and text() ===
396
=== (EE) No traversing and text() ===
313
=== (EE) No traversing and text() ===
310
Batch execution completed 3 times.
Beginning execution loop
=== Not specifying the text() node ===
796
=== Not specifying the text() node ===
830
=== Not specifying the text() node ===
856
Batch execution completed 3 times.
Beginning execution loop
=== specifying the text() node ===
1993
=== specifying the text() node ===
1970
=== specifying the text() node ===
1926
Batch execution completed 3 times.
Beginning execution loop
=== Using the query() method ===
4170
=== Using the query() method ===
4183
=== Using the query() method ===
4153
Batch execution completed 3 times.
Beginning execution loop
=== No traversing and text() ===
316
=== No traversing and text() ===
313
=== No traversing and text() ===
313
Batch execution completed 3 times.
Well done sir. I knew there was a better way to parse those nodes but got lazy. Something was not right. I learned a few things in this thread. 😀
-- Itzik Ben-Gan 2001
Viewing 15 posts - 1 through 15 (of 18 total)
You must be logged in to reply to this topic. Login to reply