Getting parent node attributes is very slow

Question

Getting parent node attributes is very slow

buddy__a

Hall of Fame

Points: 3222
More actions
May 7, 2012 at 12:59 pm

#260527

I have an XML file that I need to import and insert data in to a table. The XML looks like below:
<EXPORT>
<UNIT UNIT_ID="40001" TYPE="01">
<RDG DT="2012-04-18T19:00:01" V="0.03375" />
....
</UNIT>
<UNIT UNIT_ID="40002" TYPE="01">
<RDG DT="2012-04-18T12:05:15" V="1.22" />
....
</UNIT>
....
</EXPORT>
There are hundreds of UNITs and thousands of readings per unit. I need to get data from both the unit and reading elements to insert in the table. I can use XQuery to get UNIT element data OR RDG element data very fast (in a few seconds) however when I try to get both in one shot like below the query is unusably slow. It was so slow that I left it to go work on something else and forgot about it and about an hour and a half later it was still running. Is there a faster way?
declare @x xml
select @x = CAST(x1 AS XML)
FROM OPENROWSET(
BULK 'EXPORT_DATA.XML',
SINGLE_BLOB) AS T1(x1)
select r.value('parent::UNIT/@UNIT_ID', 'nvarchar(100)'),
r.value('@DT', 'DATETIME'),
r.value('@V', 'nvarchar(10)') as decimal(10, 8)
from @x.nodes('/EXPORT/UNIT/RDG') X(r)

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

HowardW One Orange Chip Points: 29109 More actions · Answer 1

Traversing parents is really slow in XQuery. If you're dealing with one massive document, rather than an XML column across many rows, use OPENXML instead, which will, comparatively, smoke it:

DECLARE @XML XML='<EXPORT>

<UNIT UNIT_ID="40001" TYPE="01">

<RDG DT="2012-04-18T19:00:01" V="0.03375" />

....

</UNIT>

<UNIT UNIT_ID="40002" TYPE="01">

<RDG DT="2012-04-18T12:05:15" V="1.22" />

....

</UNIT>

....

</EXPORT>'

declare @idoc int

EXEC sp_xml_preparedocument @idoc OUTPUT, @XML

select *

FROM OPENXML (@idoc, '/EXPORT/UNIT/RDG',1)

WITH (Unit_ID nvarchar(100) '../@UNIT_ID',

DT datetime,

V nvarchar(10))

Mark Cowne One Orange Chip Points: 26967 More actions · Answer 2

This should be quicker

select b.value('@UNIT_ID', 'nvarchar(100)'),

r.value('@DT', 'DATETIME'),

r.value('@V', 'decimal(10, 8) ')

from @x.nodes('/EXPORT/UNIT') A(b)

cross apply A.b.nodes('RDG') X(r)

Have a look at the "Parent Axis" section here

http://msdn.microsoft.com/en-us/library/ms345118(v=sql.90).aspx

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

HowardW One Orange Chip Points: 29109 More actions · Answer 3

Mark-101232 (5/8/2012)
This should be quicker
select b.value('@UNIT_ID', 'nvarchar(100)'),
r.value('@DT', 'DATETIME'),
r.value('@V', 'decimal(10, 8) ')
from @x.nodes('/EXPORT/UNIT') A(b)
cross apply A.b.nodes('RDG') X(r)

Definitely an improvement, but still more than 20 times slower than OPENXML on my system (I mocked up ~7000 rows to test), so I'd go with OPENXML unless you need to shred multiple documents from an XML column in a table.

buddy__a Hall of Fame Points: 3222 More actions · Answer 4

Already used cross apply and it's VERY slow as well.

Mark-101232 (5/8/2012)
This should be quicker
select b.value('@UNIT_ID', 'nvarchar(100)'),
r.value('@DT', 'DATETIME'),
r.value('@V', 'decimal(10, 8) ')
from @x.nodes('/EXPORT/UNIT') A(b)
cross apply A.b.nodes('RDG') X(r)
Have a look at the "Parent Axis" section here
http://msdn.microsoft.com/en-us/library/ms345118(v=sql.90).aspx

buddy__a Hall of Fame Points: 3222 More actions · Answer 5

I don't know why but I always think that OPENXML is the slow so I don't try it. Thanks for reminding me, it was about 10 times faster than looping through the header (UNIT) elements and then processing the RDG elements in groups.

HowardW (5/8/2012)
Traversing parents is really slow in XQuery. If you're dealing with one massive document, rather than an XML column across many rows, use OPENXML instead, which will, comparatively, smoke it:
DECLARE @XML XML='<EXPORT>
<UNIT UNIT_ID="40001" TYPE="01">
<RDG DT="2012-04-18T19:00:01" V="0.03375" />
....
</UNIT>
<UNIT UNIT_ID="40002" TYPE="01">
<RDG DT="2012-04-18T12:05:15" V="1.22" />
....
</UNIT>
....
</EXPORT>'
declare @idoc int
EXEC sp_xml_preparedocument @idoc OUTPUT, @XML
select *
FROM OPENXML (@idoc, '/EXPORT/UNIT/RDG',1)
WITH (Unit_ID nvarchar(100) '../@UNIT_ID',
DT datetime,
V nvarchar(10))

Mark Cowne One Orange Chip Points: 26967 More actions · Answer 6

With a minor change, the cross apply code runs a lot quicker.

Here's my test set up

SET STATISTICS TIME OFF

DECLARE @x XML

SELECT @x = '<EXPORT>

<UNIT UNIT_ID="40001" TYPE="01">

<RDG DT="2012-04-18T19:00:01" V="0.03375" />

.

**add in more data here**

.

<RDG DT="2012-04-18T12:05:15" V="1.22" />

</UNIT>

</EXPORT>'

DECLARE @Unit_ID NVARCHAR(100)

DECLARE @dt DATETIME

DECLARE @v-2 nvarchar(10)

SET STATISTICS TIME OFF

PRINT '============================================================'

PRINT 'CROSS APPLY'

SET STATISTICS TIME ON

select @Unit_ID=b.value('@UNIT_ID[1]', 'nvarchar(100)'),

@dt= r.value('@DT[1]', 'DATETIME'),

@v-2= r.value('@V[1]', 'nvarchar(10)')

from @x.nodes('/EXPORT/UNIT') A(b)

cross apply A.b.nodes('RDG') X(r)

SET STATISTICS TIME OFF

PRINT '============================================================'

PRINT 'OPENXML'

SET STATISTICS TIME ON

declare @idoc int

EXEC sp_xml_preparedocument @idoc OUTPUT, @X

select @Unit_ID = Unit_ID,

@dt = DT,

@v-2 = V

FROM OPENXML (@idoc, '/EXPORT/UNIT/RDG',1)

WITH (Unit_ID nvarchar(100) '../@UNIT_ID',

DT datetime,

V nvarchar(10))

EXEC sp_xml_removedocument @idoc

SET STATISTICS TIME OFF

PRINT '============================================================'

Using about 8000 RDG elements on SQL 2008 EXPRESS

============================================================

CROSS APPLY

SQL Server Execution Times:

CPU time = 344 ms, elapsed time = 338 ms.

============================================================

OPENXML

SQL Server Execution Times:

CPU time = 78 ms, elapsed time = 108 ms.

SQL Server Execution Times:

CPU time = 312 ms, elapsed time = 327 ms.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 0 ms.

============================================================

Same setup on SQL 2008 Standard Edition 64-bit

============================================================

CROSS APPLY

SQL Server Execution Times:

CPU time = 328 ms, elapsed time = 334 ms.

============================================================

OPENXML

SQL Server Execution Times:

CPU time = 94 ms, elapsed time = 87 ms.

SQL Server Execution Times:

CPU time = 234 ms, elapsed time = 229 ms.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 0 ms.

============================================================

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537