Querying a huge amount of XML data hits performance

  • Hi,

We use the XML datatype to store a huge amount of validation error logs for our data. When viewing the data in the application we also show the error logs. Here is how the XML structure for the logs looks:

There can be several thousand nested nodes. When viewing the data in the app we query all the logs using the following SQL:

    ALTER FUNCTION [dbo].[FN00GetXMLLogsDataSet] ( @XMLDataState XML )

    RETURNS @XMLDataSet TABLE

    (

    -- Add the column definitions for the TABLE variable here

    ExpressionRef INT,

    ErrorResult VARCHAR(50),

    RowRef BIGINT,

    [TimeStamp] DATETIME

    )

    AS BEGIN

    -- Fill the table variable with the rows for your result set

    INSERT INTO @XMLDataSet

    SELECT R.nref.value('@ExpressionRef', 'INT') AS ExpressionRef,

    R.nref.value('@ErrorResult', 'VARCHAR(50)') AS ErrorResult,

    X.rref.value('@Ref', 'BIGINT') RowRef,

    X.rref.value('@TimeStamp', 'DATETIME') [TimeStamp]

    FROM @XMLDataState.nodes('//Log') AS R ( nref )

    CROSS APPLY R.nref.nodes('DataRows/Row') X ( rref )

    RETURN

    END
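For reference, the original XML sample did not survive in the post, but the XPath in the function implies a shape roughly like the one below. The attribute and node names (`Log`, `DataRows`, `Row`, `@ExpressionRef`, `@ErrorResult`, `@Ref`, `@TimeStamp`) come from the query; the outer wrapper element and sample values are assumptions:

```sql
-- Hypothetical input reconstructed from the XPath in the function above.
-- The <Logs> wrapper and the sample values are guesses for illustration.
DECLARE @XMLDataState XML = N'
<Logs>
  <Log ExpressionRef="1" ErrorResult="RangeCheckFailed">
    <DataRows>
      <Row Ref="1001" TimeStamp="2009-07-14T10:00:00" />
      <Row Ref="1002" TimeStamp="2009-07-14T10:00:01" />
    </DataRows>
  </Log>
</Logs>';

SELECT * FROM dbo.FN00GetXMLLogsDataSet(@XMLDataState);
```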

I find this code hits a performance bottleneck when the XML contains a huge number of nodes.

Can someone advise me on a better way to optimize my function so that I can achieve a performance improvement?

  • How is the xml variable stored originally?

    Do you get it from an xml column, a flat file or anything else?

    Is there any known filter you could apply within your openquery statement?

    Without knowing a little more about the "environment" it's hard to make recommendations...

    Maybe I would store the xml variable in a table with an xml column having an xml index applied instead of using the function on a large xml variable. But again, it's too vague to say for sure...
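A minimal sketch of that idea (the table and index names are made up for illustration). Note that a primary XML index requires a clustered primary key on the table:

```sql
-- Sketch: persist the log XML in a table so a PRIMARY XML INDEX can
-- support the queries, instead of shredding a large XML variable.
-- Table and index names are hypothetical.
CREATE TABLE dbo.ValidationLogs
(
    LogId     INT IDENTITY(1,1) NOT NULL,
    DataState XML NOT NULL,
    CONSTRAINT PK_ValidationLogs PRIMARY KEY CLUSTERED (LogId) -- required for an XML index
);

CREATE PRIMARY XML INDEX IX_ValidationLogs_DataState
    ON dbo.ValidationLogs (DataState);
```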



    Lutz
    A pessimist is an optimist with experience.

How to get fast answers to your question
How to post performance related questions
Links for Tally Table, Cross Tabs and Dynamic Cross Tabs, Delimited Split Function

  • Try changing

    SELECT R.nref.value('@ExpressionRef', 'INT') AS ExpressionRef,

    R.nref.value('@ErrorResult', 'VARCHAR(50)') AS ErrorResult,

    X.rref.value('@Ref', 'BIGINT') RowRef,

    X.rref.value('@TimeStamp', 'DATETIME') [TimeStamp]

    to

    SELECT R.nref.value('@ExpressionRef[1]', 'INT') AS ExpressionRef,

    R.nref.value('@ErrorResult[1]', 'VARCHAR(50)') AS ErrorResult,

    X.rref.value('@Ref[1]', 'BIGINT') RowRef,

    X.rref.value('@TimeStamp[1]', 'DATETIME') [TimeStamp]

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Mark (7/14/2009)


    Try changing

    SELECT R.nref.value('@ExpressionRef', 'INT') AS ExpressionRef,

    R.nref.value('@ErrorResult', 'VARCHAR(50)') AS ErrorResult,

    X.rref.value('@Ref', 'BIGINT') RowRef,

    X.rref.value('@TimeStamp', 'DATETIME') [TimeStamp]

    to

    SELECT R.nref.value('@ExpressionRef[1]', 'INT') AS ExpressionRef,

    R.nref.value('@ErrorResult[1]', 'VARCHAR(50)') AS ErrorResult,

    X.rref.value('@Ref[1]', 'BIGINT') RowRef,

    X.rref.value('@TimeStamp[1]', 'DATETIME') [TimeStamp]

This gave a 100% performance improvement. However, I did not quite get why you used [1] with every attribute.

  • bhavster27 (7/14/2009)


    Mark (7/14/2009)


    Try changing

    SELECT R.nref.value('@ExpressionRef', 'INT') AS ExpressionRef,

    R.nref.value('@ErrorResult', 'VARCHAR(50)') AS ErrorResult,

    X.rref.value('@Ref', 'BIGINT') RowRef,

    X.rref.value('@TimeStamp', 'DATETIME') [TimeStamp]

    to

    SELECT R.nref.value('@ExpressionRef[1]', 'INT') AS ExpressionRef,

    R.nref.value('@ErrorResult[1]', 'VARCHAR(50)') AS ErrorResult,

    X.rref.value('@Ref[1]', 'BIGINT') RowRef,

    X.rref.value('@TimeStamp[1]', 'DATETIME') [TimeStamp]

This gave a 100% performance improvement. However, I did not quite get why you used [1] with every attribute.

    I don't really have an answer for that other than it gives a slightly better query plan. Attributes have to be unique within an element, so adding '[1]' shouldn't make any difference.
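For what it's worth, one common explanation is that the explicit `[1]` lets the optimizer infer statically that the path returns at most one node, so it can emit a simpler plan, even though attributes are already singletons by definition. A quick way to compare the two forms yourself (the variable declaration here is a stand-in for your real data; timings will vary):

```sql
-- Compare the two attribute-access styles on the same XML variable.
DECLARE @XMLDataState XML = N'<Logs><Log ExpressionRef="1" ErrorResult="x"/></Logs>';

SET STATISTICS TIME ON;

SELECT R.nref.value('@ExpressionRef', 'INT')      -- no singleton predicate
FROM @XMLDataState.nodes('//Log') AS R(nref);

SELECT R.nref.value('@ExpressionRef[1]', 'INT')   -- explicit singleton predicate
FROM @XMLDataState.nodes('//Log') AS R(nref);

SET STATISTICS TIME OFF;
```

Comparing the actual execution plans of the two statements is more telling than the timings on a tiny sample like this.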

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
