Getting parent node attributes is very slow

  • I have an XML file that I need to import and insert data in to a table. The XML looks like below:

    <EXPORT>

    <UNIT UNIT_ID="40001" TYPE="01">

    <RDG DT="2012-04-18T19:00:01" V="0.03375" />

    ....

    </UNIT>

    <UNIT UNIT_ID="40002" TYPE="01">

    <RDG DT="2012-04-18T12:05:15" V="1.22" />

    ....

    </UNIT>

    ....

    </EXPORT>

    There are hundreds of UNITs and thousands of readings per unit. I need to get data from both the unit and reading elements to insert in the table. I can use XQuery to get UNIT element data OR RDG element data very fast (in a few seconds) however when I try to get both in one shot like below the query is unusably slow. It was so slow that I left it to go work on something else and forgot about it and about an hour and a half later it was still running. Is there a faster way?

    declare @x xml

    select @x = CAST(x1 AS XML)

    FROM OPENROWSET(

    BULK 'EXPORT_DATA.XML',

    SINGLE_BLOB) AS T1(x1)

    select r.value('parent::UNIT/@UNIT_ID', 'nvarchar(100)'),

    r.value('@DT', 'DATETIME'),

    r.value('@V', 'nvarchar(10)') as decimal(10, 8)

    from @x.nodes('/EXPORT/UNIT/RDG') X(r)

  • Traversing parents is really slow in XQuery. If you're dealing with one massive document, rather than an XML column across many rows, use OPENXML instead, which will, comparatively, smoke it:

    DECLARE @XML XML='<EXPORT>

    <UNIT UNIT_ID="40001" TYPE="01">

    <RDG DT="2012-04-18T19:00:01" V="0.03375" />

    ....

    </UNIT>

    <UNIT UNIT_ID="40002" TYPE="01">

    <RDG DT="2012-04-18T12:05:15" V="1.22" />

    ....

    </UNIT>

    ....

    </EXPORT>'

    declare @idoc int

    EXEC sp_xml_preparedocument @idoc OUTPUT, @XML

    select *

    FROM OPENXML (@idoc, '/EXPORT/UNIT/RDG',1)

    WITH (Unit_ID nvarchar(100) '../@UNIT_ID',

    DT datetime,

    V nvarchar(10))

  • This should be quicker

    select b.value('@UNIT_ID', 'nvarchar(100)'),

    r.value('@DT', 'DATETIME'),

    r.value('@V', 'decimal(10, 8) ')

    from @x.nodes('/EXPORT/UNIT') A(b)

    cross apply A.b.nodes('RDG') X(r)

    Have a look at the "Parent Axis" section here

    http://msdn.microsoft.com/en-us/library/ms345118(v=sql.90).aspx

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Mark-101232 (5/8/2012)


    This should be quicker

    select b.value('@UNIT_ID', 'nvarchar(100)'),

    r.value('@DT', 'DATETIME'),

    r.value('@V', 'decimal(10, 8) ')

    from @x.nodes('/EXPORT/UNIT') A(b)

    cross apply A.b.nodes('RDG') X(r)

    Definitely an improvement, but still more than 20 times slower than OPENXML on my system (I mocked up ~7000 rows to test), so I'd go with OPENXML unless you need to shred multiple documents from an XML column in a table.

  • Already used cross apply and it's VERY slow as well.

    Mark-101232 (5/8/2012)


    This should be quicker

    select b.value('@UNIT_ID', 'nvarchar(100)'),

    r.value('@DT', 'DATETIME'),

    r.value('@V', 'decimal(10, 8) ')

    from @x.nodes('/EXPORT/UNIT') A(b)

    cross apply A.b.nodes('RDG') X(r)

    Have a look at the "Parent Axis" section here

    http://msdn.microsoft.com/en-us/library/ms345118(v=sql.90).aspx

  • I don't know why but I always think that OPENXML is the slow so I don't try it. Thanks for reminding me, it was about 10 times faster than looping through the header (UNIT) elements and then processing the RDG elements in groups.

    HowardW (5/8/2012)


    Traversing parents is really slow in XQuery. If you're dealing with one massive document, rather than an XML column across many rows, use OPENXML instead, which will, comparatively, smoke it:

    DECLARE @XML XML='<EXPORT>

    <UNIT UNIT_ID="40001" TYPE="01">

    <RDG DT="2012-04-18T19:00:01" V="0.03375" />

    ....

    </UNIT>

    <UNIT UNIT_ID="40002" TYPE="01">

    <RDG DT="2012-04-18T12:05:15" V="1.22" />

    ....

    </UNIT>

    ....

    </EXPORT>'

    declare @idoc int

    EXEC sp_xml_preparedocument @idoc OUTPUT, @XML

    select *

    FROM OPENXML (@idoc, '/EXPORT/UNIT/RDG',1)

    WITH (Unit_ID nvarchar(100) '../@UNIT_ID',

    DT datetime,

    V nvarchar(10))

  • With a minor change, the cross apply code runs a lot quicker.

    Here's my test set up

    SET STATISTICS TIME OFF

    DECLARE @x XML

    SELECT @x = '<EXPORT>

    <UNIT UNIT_ID="40001" TYPE="01">

    <RDG DT="2012-04-18T19:00:01" V="0.03375" />

    .

    .

    **add in more data here**

    .

    <RDG DT="2012-04-18T12:05:15" V="1.22" />

    </UNIT>

    </EXPORT>'

    DECLARE @Unit_ID NVARCHAR(100)

    DECLARE @dt DATETIME

    DECLARE @v-2 nvarchar(10)

    SET STATISTICS TIME OFF

    PRINT '============================================================'

    PRINT 'CROSS APPLY'

    SET STATISTICS TIME ON

    select @Unit_ID=b.value('@UNIT_ID[1]', 'nvarchar(100)'),

    @dt= r.value('@DT[1]', 'DATETIME'),

    @v-2= r.value('@V[1]', 'nvarchar(10)')

    from @x.nodes('/EXPORT/UNIT') A(b)

    cross apply A.b.nodes('RDG') X(r)

    SET STATISTICS TIME OFF

    PRINT '============================================================'

    PRINT 'OPENXML'

    SET STATISTICS TIME ON

    declare @idoc int

    EXEC sp_xml_preparedocument @idoc OUTPUT, @X

    select @Unit_ID = Unit_ID,

    @dt = DT,

    @v-2 = V

    FROM OPENXML (@idoc, '/EXPORT/UNIT/RDG',1)

    WITH (Unit_ID nvarchar(100) '../@UNIT_ID',

    DT datetime,

    V nvarchar(10))

    EXEC sp_xml_removedocument @idoc

    SET STATISTICS TIME OFF

    PRINT '============================================================'

    Using about 8000 RDG elements on SQL 2008 EXPRESS

    ============================================================

    CROSS APPLY

    SQL Server Execution Times:

    CPU time = 344 ms, elapsed time = 338 ms.

    ============================================================

    OPENXML

    SQL Server Execution Times:

    CPU time = 78 ms, elapsed time = 108 ms.

    SQL Server Execution Times:

    CPU time = 312 ms, elapsed time = 327 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    ============================================================

    Same setup on SQL 2008 Standard Edition 64-bit

    ============================================================

    CROSS APPLY

    SQL Server Execution Times:

    CPU time = 328 ms, elapsed time = 334 ms.

    ============================================================

    OPENXML

    SQL Server Execution Times:

    CPU time = 94 ms, elapsed time = 87 ms.

    SQL Server Execution Times:

    CPU time = 234 ms, elapsed time = 229 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    ============================================================

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply