Large XML values slow to process

Question

Large XML values slow to process

TT-131210

SSCommitted

Points: 1603
More actions
August 17, 2009 at 11:08 am

#208817

I have xml values that I am retrieving that vary in size. If the XML is small 50 "OrderDocumentRequest" elements the following code below works very quickly(seconds) but if I get an XML that say has 1000 "OrderDocumentRequest" elements with 30 to 40 "Field" elements the first 50 take a couple of mintues to insert "requestGuid" element. If I wait for 1000 to process it takes about an hour.
I would like to know why it is so slow. Is this possibly a memory issue.
DECLARE @RequestGuid VARCHAR(100)
DECLARE @xmlRecordMax INT
DECLARE @xmlRecord INT
DECLARE @xml xml
SET @xml = '

2923029 ADK 3012578260
TadID

DKE21
FIRST
F1

ABC
123

XYZ
1029

53456345 GED 3645645
TadID

DKE21
FIRST
F1

ABC
123

XYZ
1029

'
SELECT @xmlRecordMax = @xml.query(' { count(/OrderDocumentBatch/OrderDocumentRequest) } ').value('e[1]','int')
SET @xmlRecord = 1
WHILE @xmlRecord <= @xmlRecordMax BEGIN
SET @RequestGuid = Convert(varchar(100), NEWID());
SET @xml.modify( 'insert ({ sql:variable("@RequestGuid") }) into (/OrderDocumentBatch/OrderDocumentRequest[position()=sql:variable("@xmlRecord")])[1]')
SET @xmlRecord = @xmlRecord + 1
PRINT @xmlRecord
END
SELECT @xml

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply

LutzM SSC Guru Points: 107049 More actions · Answer 1

The first question that comes into my mind when reading the requirement:

What is the business requirement of adding a NEWID to an xml variable using SQL server and leave it in xml structure within a variable?

Assuming, the requirement is valid, here are my 2 cents on the subject:

If you're updating a xml variable there is no index or anything like that to support the update query. In combination with a while loop the performance decrease with increasing data volume is an expected behavior.

Option 1: try to load the data into a temp table with a primary key and a xml column with index on it and update those data. (still uses a loop but may perform better due to the usage of an xml index).

Option 2: Extract the data into a relational table with an additional column holding your NEWID. If needed, export data back into xml structure. This would eliminate the loop and add the newid while extracting the data. Even with the additional task of extract and re-select of the data it should outperform the two previous options by far starting with just a few elements in the xml variable.

I'd use option 2.

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

TT-131210 SSCommitted Points: 1603 More actions · Answer 2

I like your option 2. I was trying that in the beginning with the query below but I could not figure out how to distinguish between each of the unique "orderdocumentrequest" elements.

SELECT Doc.value('name[1]', 'varchar(200)') AS [Name],

Doc.value('value[1]', 'varchar(max)') AS [Value]

FROM @xml.nodes('/orderdocumentbatch/orderdocumentrequest') AS B(Breq)

CROSS APPLY Breq.nodes('documents/document/fields/field[name]') AS D(Doc)

So I wanted to do something like this after I was able to add the "requestGuid" element but this not going to work.

SELECT Doc.value('name[1]', 'varchar(200)') AS [Name],

Doc.value('value[1]', 'varchar(max)') AS [Value],

RG.value('.', 'varchar(255)') AS RequestGUID

FROM @xml.nodes('/orderdocumentbatch/orderdocumentrequest') AS B(Breq)

CROSS APPLY Breq.nodes('documents/document/fields/field[name]') AS D(Doc)

CROSS APPLY Breq.nodes('requestGuid') AS r(RG)

I am assuming there is a was to distinguish between the "orderdocumentrequest" elements but I cannot seem to figure it out.

LutzM SSC Guru Points: 107049 More actions · Answer 3

Try the following code. It's based on your sample with two minor changes:

First, I made it case sensitive to match your sample data and secondly, I added the caseID, assuming this will help to differentiate.

SELECT

Breq.value('caseId[1]','varchar(200)') AS [caseID],

Doc.value('Name[1]', 'varchar(200)') AS [Name],

Doc.value('Value[1]', 'varchar(max)') AS [Value]

FROM @xml.nodes('/OrderDocumentBatch/OrderDocumentRequest') AS B(Breq)

CROSS APPLY Breq.nodes('documents/document/Fields/Field[Name]') AS D(Doc)

/* result set

caseIDNameValue

2923029ABC123

2923029XYZ1029

53456345ABC123

53456345XYZ1029

*/

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

TT-131210 SSCommitted Points: 1603 More actions · Answer 4

I wanted to use that exact query but my problem is that there is nothing I can use to differentiate between the "orderdocumentrequest" elements. That is why I wanted to add the "requestGuid" element inside the "orderdocumentrequest" element so that I would have an element value that was unique that I could control.

Tim

LutzM SSC Guru Points: 107049 More actions · Answer 5

Now it gets confusing... :ermm:

Your sample data do contain the tag [caseID], which can be used to differentiate between each [orderdocumentrequest].

If that's not the case, you still could use all elements from the xml doc and add one line per [orderdocumentrequest] "group" containing all relevant elements.

Basically, you could end up with a list of completely identical data and still assign a unique number to each row...

Maybe it would help if you could add your required output based on your sample data.

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

TT-131210 SSCommitted Points: 1603 More actions · Answer 6

I want the result set you generated in your previous example. I have nothing in the xml that will work for unique id. I need to generate some unique id for each orderdocumentrequest element.

/* result set

some unique id Name Value

2923029 ABC 123

2923029 XYZ 1029

53456345 ABC 123

53456345 XYZ 1029

*/

LutzM SSC Guru Points: 107049 More actions · Answer 7

something like SELECT newid() AS ID,

Breq.value('caseId[1]','varchar(200)') AS [caseID],

Doc.value('Name[1]', 'varchar(200)') AS [Name],

Doc.value('Value[1]', 'varchar(max)') AS [Value]

FROM @xml.nodes('/OrderDocumentBatch/OrderDocumentRequest') AS B(Breq)

CROSS APPLY Breq.nodes('documents/document/Fields/Field[Name]') AS D(Doc)

/* result set

IDcaseIDNameValue

873D5601-E800-40E6-B60E-0E39B663B2902923029ABC123

7A85852B-465D-48FE-8C2E-8AF7D9B3680F2923029XYZ1029

EB40A5D7-812C-4B58-9F8E-58E5E742EF6353456345ABC123

49E889AD-8CA2-4DC4-9E8C-9B2F8DE0104853456345XYZ1029

*/

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

TT-131210 SSCommitted Points: 1603 More actions · Answer 8

Sorry for the confusion but that is not it. It needs to look like the results below.

/* result set

ID Name Value

873D5601-E800-40E6-B60E-0E39B663B290 ABC 123

873D5601-E800-40E6-B60E-0E39B663B290 XYZ 1029

EB40A5D7-812C-4B58-9F8E-58E5E742EF63 ABC 123

EB40A5D7-812C-4B58-9F8E-58E5E742EF63 XYZ 1029

*/

LutzM SSC Guru Points: 107049 More actions · Answer 9

Now I'm totally lost....

Why don't you use one or any combination of the tags within the OrderDocumentRequest element, like caseId(2923029), partitionCode (ADK), BatchNumber (3012578260) or tag(TadID)?

It's really hard to guess what you're looking for... What's wrong with 2923029 or 53456345 as identifiers? If it has to be an ID format, you could easily map it after shredding the xml into an relational structure.

If it doesn't exist in your data, why is it in your sample data? If it's not unique, why is it in your sample data?

Please note that we just work on the sample data you've provided. And based on those data, there already are some unique identifiers...

Edit: Typo fixed

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

TT-131210 SSCommitted Points: 1603 More actions · Answer 10

The fields you mentioned may not be unique. I took out the elements that were confusing and the sample below along with result set that I would need. The ID column is something that I would need to manually generate.

FirstName

Mike

LastName

Smith

FirstName

Lisa

LastName

Jones

Results Set

IDNameValue

7A85852B-465D-48FE-8C2E-8AF7D9B3680F FirstName Mike

7A85852B-465D-48FE-8C2E-8AF7D9B3680F LastName Smith

49E889AD-8CA2-4DC4-9E8C-9B2F8DE01048 FirstName Lisa

49E889AD-8CA2-4DC4-9E8C-9B2F8DE01048 LastName Jones

LutzM SSC Guru Points: 107049 More actions · Answer 11

Would something like the following meet your (new) requirement in terms of showing the values (leaving out the ID for the moment)?

It's not exactly what you've requested but it follows more a relational data model, I think.

The assumption I made is to have predefined order as well as number of fields within the document node.

Is this assumption correct? (It is at least compliant with both of your sample files...)

If not, please provide sample data describing your scenario.

SELECT

Doc.value('field[1]/value[1]', 'varchar(30)') AS FirstName,

Doc.value('field[2]/value[1]', 'varchar(30)') AS LastName

FROM @xml.nodes('/orderdocumentbatch/orderdocumentrequest') AS B(Breq)

CROSS APPLY Breq.nodes('documents/document/fields') AS D(Doc)

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

TT-131210 SSCommitted Points: 1603 More actions · Answer 12

There will be multiple fields and not always same number of field and not in the same order for each orderdocumentrequest element. But I don't always want every field. I only want certain fields.

FirstName

Mike

LastName

Smith

MiddleName

Peter

FirstName

Lisa

LastName

Jones

Address

123 Main St

Phone

988-454-4888

Result Set

FirstName LastName, Phone

Mike Smith

Lisa Jones 988-454-4888

LutzM SSC Guru Points: 107049 More actions · Answer 13

Let's go for the next turn....

How about this version?

It returns all the values from the xml document rather than the selective ones you requested but I decided to leave it like this due to the frequent change of the requirement / sample data structure...

SELECT

Breq.value('1+count(for $a in . return $a/../*[. << $a])','INT') AS id,

Doc.value('name[1]', 'varchar(200)') AS [Name],

Doc.value('value[1]', 'varchar(max)') AS [Value]

FROM @xml.nodes('/orderdocumentbatch/orderdocumentrequest') AS B(Breq)

CROSS APPLY Breq.nodes('documents/document/fields/field[name]') AS D(Doc)

/* result set

idNameValue

1FirstNameMike

1LastNameSmith

1MiddleNamePeter

2FirstNameLisa

2LastNameJones

2Address123 Main St

2Phone988-454-4888*/

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

TT-131210 SSCommitted Points: 1603 More actions · Answer 14

This solution will work. I need to study up on how you did the ID field and figure out the syntax.

Thank you for your time.

Tim