May 7, 2012 at 12:59 pm
I have an XML file that I need to import and insert data in to a table. The XML looks like below:
<EXPORT>
<UNIT UNIT_ID="40001" TYPE="01">
<RDG DT="2012-04-18T19:00:01" V="0.03375" />
....
</UNIT>
<UNIT UNIT_ID="40002" TYPE="01">
<RDG DT="2012-04-18T12:05:15" V="1.22" />
....
</UNIT>
....
</EXPORT>
There are hundreds of UNITs and thousands of readings per unit. I need to get data from both the unit and reading elements to insert in the table. I can use XQuery to get UNIT element data OR RDG element data very fast (in a few seconds) however when I try to get both in one shot like below the query is unusably slow. It was so slow that I left it to go work on something else and forgot about it and about an hour and a half later it was still running. Is there a faster way?
declare @x xml
select @x = CAST(x1 AS XML)
FROM OPENROWSET(
BULK 'EXPORT_DATA.XML',
SINGLE_BLOB) AS T1(x1)
select r.value('parent::UNIT/@UNIT_ID', 'nvarchar(100)'),
r.value('@DT', 'DATETIME'),
r.value('@V', 'nvarchar(10)') as decimal(10, 8)
from @x.nodes('/EXPORT/UNIT/RDG') X(r)
May 8, 2012 at 9:36 am
Traversing parents is really slow in XQuery. If you're dealing with one massive document, rather than an XML column across many rows, use OPENXML instead, which will, comparatively, smoke it:
DECLARE @XML XML='<EXPORT>
<UNIT UNIT_ID="40001" TYPE="01">
<RDG DT="2012-04-18T19:00:01" V="0.03375" />
....
</UNIT>
<UNIT UNIT_ID="40002" TYPE="01">
<RDG DT="2012-04-18T12:05:15" V="1.22" />
....
</UNIT>
....
</EXPORT>'
declare @idoc int
EXEC sp_xml_preparedocument @idoc OUTPUT, @XML
select *
FROM OPENXML (@idoc, '/EXPORT/UNIT/RDG',1)
WITH (Unit_ID nvarchar(100) '../@UNIT_ID',
DT datetime,
V nvarchar(10))
May 8, 2012 at 9:56 am
This should be quicker
select b.value('@UNIT_ID', 'nvarchar(100)'),
r.value('@DT', 'DATETIME'),
r.value('@V', 'decimal(10, 8) ')
from @x.nodes('/EXPORT/UNIT') A(b)
cross apply A.b.nodes('RDG') X(r)
Have a look at the "Parent Axis" section here
http://msdn.microsoft.com/en-us/library/ms345118(v=sql.90).aspx
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537May 8, 2012 at 10:09 am
Mark-101232 (5/8/2012)
This should be quicker
select b.value('@UNIT_ID', 'nvarchar(100)'),
r.value('@DT', 'DATETIME'),
r.value('@V', 'decimal(10, 8) ')
from @x.nodes('/EXPORT/UNIT') A(b)
cross apply A.b.nodes('RDG') X(r)
Definitely an improvement, but still more than 20 times slower than OPENXML on my system (I mocked up ~7000 rows to test), so I'd go with OPENXML unless you need to shred multiple documents from an XML column in a table.
May 8, 2012 at 1:52 pm
Already used cross apply and it's VERY slow as well.
Mark-101232 (5/8/2012)
This should be quicker
select b.value('@UNIT_ID', 'nvarchar(100)'),
r.value('@DT', 'DATETIME'),
r.value('@V', 'decimal(10, 8) ')
from @x.nodes('/EXPORT/UNIT') A(b)
cross apply A.b.nodes('RDG') X(r)
Have a look at the "Parent Axis" section here
http://msdn.microsoft.com/en-us/library/ms345118(v=sql.90).aspx
May 8, 2012 at 2:39 pm
I don't know why but I always think that OPENXML is the slow so I don't try it. Thanks for reminding me, it was about 10 times faster than looping through the header (UNIT) elements and then processing the RDG elements in groups.
HowardW (5/8/2012)
Traversing parents is really slow in XQuery. If you're dealing with one massive document, rather than an XML column across many rows, use OPENXML instead, which will, comparatively, smoke it:
DECLARE @XML XML='<EXPORT>
<UNIT UNIT_ID="40001" TYPE="01">
<RDG DT="2012-04-18T19:00:01" V="0.03375" />
....
</UNIT>
<UNIT UNIT_ID="40002" TYPE="01">
<RDG DT="2012-04-18T12:05:15" V="1.22" />
....
</UNIT>
....
</EXPORT>'
declare @idoc int
EXEC sp_xml_preparedocument @idoc OUTPUT, @XML
select *
FROM OPENXML (@idoc, '/EXPORT/UNIT/RDG',1)
WITH (Unit_ID nvarchar(100) '../@UNIT_ID',
DT datetime,
V nvarchar(10))
May 9, 2012 at 9:44 am
With a minor change, the cross apply code runs a lot quicker.
Here's my test set up
SET STATISTICS TIME OFF
DECLARE @x XML
SELECT @x = '<EXPORT>
<UNIT UNIT_ID="40001" TYPE="01">
<RDG DT="2012-04-18T19:00:01" V="0.03375" />
.
.
**add in more data here**
.
<RDG DT="2012-04-18T12:05:15" V="1.22" />
</UNIT>
</EXPORT>'
DECLARE @Unit_ID NVARCHAR(100)
DECLARE @dt DATETIME
DECLARE @v-2 nvarchar(10)
SET STATISTICS TIME OFF
PRINT '============================================================'
PRINT 'CROSS APPLY'
SET STATISTICS TIME ON
select @Unit_ID=b.value('@UNIT_ID[1]', 'nvarchar(100)'),
@dt= r.value('@DT[1]', 'DATETIME'),
@v-2= r.value('@V[1]', 'nvarchar(10)')
from @x.nodes('/EXPORT/UNIT') A(b)
cross apply A.b.nodes('RDG') X(r)
SET STATISTICS TIME OFF
PRINT '============================================================'
PRINT 'OPENXML'
SET STATISTICS TIME ON
declare @idoc int
EXEC sp_xml_preparedocument @idoc OUTPUT, @X
select @Unit_ID = Unit_ID,
@dt = DT,
@v-2 = V
FROM OPENXML (@idoc, '/EXPORT/UNIT/RDG',1)
WITH (Unit_ID nvarchar(100) '../@UNIT_ID',
DT datetime,
V nvarchar(10))
EXEC sp_xml_removedocument @idoc
SET STATISTICS TIME OFF
PRINT '============================================================'
Using about 8000 RDG elements on SQL 2008 EXPRESS
============================================================
CROSS APPLY
SQL Server Execution Times:
CPU time = 344 ms, elapsed time = 338 ms.
============================================================
OPENXML
SQL Server Execution Times:
CPU time = 78 ms, elapsed time = 108 ms.
SQL Server Execution Times:
CPU time = 312 ms, elapsed time = 327 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
============================================================
Same setup on SQL 2008 Standard Edition 64-bit
============================================================
CROSS APPLY
SQL Server Execution Times:
CPU time = 328 ms, elapsed time = 334 ms.
============================================================
OPENXML
SQL Server Execution Times:
CPU time = 94 ms, elapsed time = 87 ms.
SQL Server Execution Times:
CPU time = 234 ms, elapsed time = 229 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
============================================================
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply