Querying a huge amount of XML data hits performance

  • Hi,

We use the XML datatype to store a huge amount of validation error logs for our data. When viewing the data in the application we also show the error logs. Here is how the XML structure for the logs looks:

There can be several thousand nested nodes. When viewing the data in the app we query all the logs using the following SQL:

    ALTER FUNCTION [dbo].[FN00GetXMLLogsDataSet] ( @XMLDataState XML )

    RETURNS @XMLDataSet TABLE

    (

    -- Add the column definitions for the TABLE variable here

    ExpressionRef INT,

    ErrorResult VARCHAR(50),

    RowRef BIGINT,

    [TimeStamp] DATETIME

    )

    AS BEGIN

    -- Fill the table variable with the rows for your result set

    INSERT INTO @XMLDataSet

    SELECT R.nref.value('@ExpressionRef', 'INT') AS ExpressionRef,

    R.nref.value('@ErrorResult', 'VARCHAR(50)') AS ErrorResult,

    X.rref.value('@Ref', 'BIGINT') RowRef,

    X.rref.value('@TimeStamp', 'DATETIME') [TimeStamp]

    FROM @XMLDataState.nodes('//Log') AS R ( nref )

    CROSS APPLY R.nref.nodes('DataRows/Row') X ( rref )

    RETURN

    END
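For reference, the original XML sample did not survive in the post, but the XPath in the function implies a shape roughly like the one below. The attribute and node names (`Log`, `DataRows`, `Row`, `@ExpressionRef`, `@ErrorResult`, `@Ref`, `@TimeStamp`) come from the query; the outer wrapper element and sample values are assumptions:

```sql
-- Hypothetical input reconstructed from the XPath in the function above.
-- The <Logs> wrapper and the sample values are guesses for illustration.
DECLARE @XMLDataState XML = N'
<Logs>
  <Log ExpressionRef="1" ErrorResult="RangeCheckFailed">
    <DataRows>
      <Row Ref="1001" TimeStamp="2009-07-14T10:00:00" />
      <Row Ref="1002" TimeStamp="2009-07-14T10:00:01" />
    </DataRows>
  </Log>
</Logs>';

SELECT * FROM dbo.FN00GetXMLLogsDataSet(@XMLDataState);
```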

I find this code hits a performance bottleneck when the XML contains a huge number of nodes.

Can someone advise me on a better way to optimize my function so that I can achieve a performance improvement?

  • How is the xml variable stored originally?

    Do you get it from an xml column, a flat file or anything else?

    Is there any known filter you could apply within your openquery statement?

    Without knowing a little more about the "environment" it's hard to make recommendations...

    Maybe I would store the xml variable in a table with an xml column having an xml index applied instead of using the function on a large xml variable. But again, it's too vague to say for sure...
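A minimal sketch of that idea (the table and index names are made up for illustration). Note that a primary XML index requires a clustered primary key on the table:

```sql
-- Sketch: persist the log XML in a table so a PRIMARY XML INDEX can
-- support the queries, instead of shredding a large XML variable.
-- Table and index names are hypothetical.
CREATE TABLE dbo.ValidationLogs
(
    LogId     INT IDENTITY(1,1) NOT NULL,
    DataState XML NOT NULL,
    CONSTRAINT PK_ValidationLogs PRIMARY KEY CLUSTERED (LogId) -- required for an XML index
);

CREATE PRIMARY XML INDEX IX_ValidationLogs_DataState
    ON dbo.ValidationLogs (DataState);
```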



    Lutz
    A pessimist is an optimist with experience.

How to get fast answers to your question
How to post performance related questions
Links for Tally Table, Cross Tabs and Dynamic Cross Tabs, Delimited Split Function

  • Try changing

    SELECT R.nref.value('@ExpressionRef', 'INT') AS ExpressionRef,

    R.nref.value('@ErrorResult', 'VARCHAR(50)') AS ErrorResult,

    X.rref.value('@Ref', 'BIGINT') RowRef,

    X.rref.value('@TimeStamp', 'DATETIME') [TimeStamp]

    to

    SELECT R.nref.value('@ExpressionRef[1]', 'INT') AS ExpressionRef,

    R.nref.value('@ErrorResult[1]', 'VARCHAR(50)') AS ErrorResult,

    X.rref.value('@Ref[1]', 'BIGINT') RowRef,

    X.rref.value('@TimeStamp[1]', 'DATETIME') [TimeStamp]

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Mark (7/14/2009)


    Try changing

    SELECT R.nref.value('@ExpressionRef', 'INT') AS ExpressionRef,

    R.nref.value('@ErrorResult', 'VARCHAR(50)') AS ErrorResult,

    X.rref.value('@Ref', 'BIGINT') RowRef,

    X.rref.value('@TimeStamp', 'DATETIME') [TimeStamp]

    to

    SELECT R.nref.value('@ExpressionRef[1]', 'INT') AS ExpressionRef,

    R.nref.value('@ErrorResult[1]', 'VARCHAR(50)') AS ErrorResult,

    X.rref.value('@Ref[1]', 'BIGINT') RowRef,

    X.rref.value('@TimeStamp[1]', 'DATETIME') [TimeStamp]

This gave a 100% performance improvement. However, I did not quite get why you used [1] with every attribute.

  • bhavster27 (7/14/2009)


    Mark (7/14/2009)


    Try changing

    SELECT R.nref.value('@ExpressionRef', 'INT') AS ExpressionRef,

    R.nref.value('@ErrorResult', 'VARCHAR(50)') AS ErrorResult,

    X.rref.value('@Ref', 'BIGINT') RowRef,

    X.rref.value('@TimeStamp', 'DATETIME') [TimeStamp]

    to

    SELECT R.nref.value('@ExpressionRef[1]', 'INT') AS ExpressionRef,

    R.nref.value('@ErrorResult[1]', 'VARCHAR(50)') AS ErrorResult,

    X.rref.value('@Ref[1]', 'BIGINT') RowRef,

    X.rref.value('@TimeStamp[1]', 'DATETIME') [TimeStamp]

This gave a 100% performance improvement. However, I did not quite get why you used [1] with every attribute.

    I don't really have an answer for that other than it gives a slightly better query plan. Attributes have to be unique within an element, so adding '[1]' shouldn't make any difference.
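For what it's worth, one common explanation is that the explicit `[1]` lets the optimizer infer statically that the path returns at most one node, so it can emit a simpler plan, even though attributes are already singletons by definition. A quick way to compare the two forms yourself (the variable declaration here is a stand-in for your real data; timings will vary):

```sql
-- Compare the two attribute-access styles on the same XML variable.
DECLARE @XMLDataState XML = N'<Logs><Log ExpressionRef="1" ErrorResult="x"/></Logs>';

SET STATISTICS TIME ON;

SELECT R.nref.value('@ExpressionRef', 'INT')      -- no singleton predicate
FROM @XMLDataState.nodes('//Log') AS R(nref);

SELECT R.nref.value('@ExpressionRef[1]', 'INT')   -- explicit singleton predicate
FROM @XMLDataState.nodes('//Log') AS R(nref);

SET STATISTICS TIME OFF;
```

Comparing the actual execution plans of the two statements is more telling than the timings on a tiny sample like this.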

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
