Large XML values slow to process

  • I have xml values that I am retrieving that vary in size. If the XML is small 50 "OrderDocumentRequest" elements the following code below works very quickly(seconds) but if I get an XML that say has 1000 "OrderDocumentRequest" elements with 30 to 40 "Field" elements the first 50 take a couple of mintues to insert "requestGuid" element. If I wait for 1000 to process it takes about an hour.

    I would like to know why it is so slow. Is this possibly a memory issue.

    DECLARE @RequestGuid VARCHAR(100)

    DECLARE @xmlRecordMax INT

    DECLARE @xmlRecord INT

    DECLARE @xml xml

    SET @xml = '

    2923029 ADK 3012578260

    TadID

    DKE21

    FIRST

    F1

    ABC

    123

    XYZ

    1029

    53456345 GED 3645645

    TadID

    DKE21

    FIRST

    F1

    ABC

    123

    XYZ

    1029

    '

    SELECT @xmlRecordMax = @xml.query(' { count(/OrderDocumentBatch/OrderDocumentRequest) } ').value('e[1]','int')

    SET @xmlRecord = 1

    WHILE @xmlRecord <= @xmlRecordMax
    BEGIN

    SET @RequestGuid = Convert(varchar(100), NEWID());

    SET @xml.modify( 'insert ({ sql:variable("@RequestGuid") }) into (/OrderDocumentBatch/OrderDocumentRequest[position()=sql:variable("@xmlRecord")])[1]')

    SET @xmlRecord = @xmlRecord + 1

    PRINT @xmlRecord

    END

    SELECT @xml

  • The first question that comes into my mind when reading the requirement:

    What is the business requirement of adding a NEWID to an xml variable using SQL server and leave it in xml structure within a variable?

    Assuming, the requirement is valid, here are my 2 cents on the subject:

    If you're updating a xml variable there is no index or anything like that to support the update query. In combination with a while loop the performance decrease with increasing data volume is an expected behavior.

    Option 1: try to load the data into a temp table with a primary key and a xml column with index on it and update those data. (still uses a loop but may perform better due to the usage of an xml index).

    Option 2: Extract the data into a relational table with an additional column holding your NEWID. If needed, export data back into xml structure. This would eliminate the loop and add the newid while extracting the data. Even with the additional task of extract and re-select of the data it should outperform the two previous options by far starting with just a few elements in the xml variable.

    I'd use option 2.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • I like your option 2. I was trying that in the beginning with the query below but I could not figure out how to distinguish between each of the unique "orderdocumentrequest" elements.

    SELECT Doc.value('name[1]', 'varchar(200)') AS [Name],

    Doc.value('value[1]', 'varchar(max)') AS [Value]

    FROM @xml.nodes('/orderdocumentbatch/orderdocumentrequest') AS B(Breq)

    CROSS APPLY Breq.nodes('documents/document/fields/field[name]') AS D(Doc)

    So I wanted to do something like this after I was able to add the "requestGuid" element but this not going to work.

    SELECT Doc.value('name[1]', 'varchar(200)') AS [Name],

    Doc.value('value[1]', 'varchar(max)') AS [Value],

    RG.value('.', 'varchar(255)') AS RequestGUID

    FROM @xml.nodes('/orderdocumentbatch/orderdocumentrequest') AS B(Breq)

    CROSS APPLY Breq.nodes('documents/document/fields/field[name]') AS D(Doc)

    CROSS APPLY Breq.nodes('requestGuid') AS r(RG)

    I am assuming there is a was to distinguish between the "orderdocumentrequest" elements but I cannot seem to figure it out.

  • Try the following code. It's based on your sample with two minor changes:

    First, I made it case sensitive to match your sample data and secondly, I added the caseID, assuming this will help to differentiate.

    SELECT

    Breq.value('caseId[1]','varchar(200)') AS [caseID],

    Doc.value('Name[1]', 'varchar(200)') AS [Name],

    Doc.value('Value[1]', 'varchar(max)') AS [Value]

    FROM @xml.nodes('/OrderDocumentBatch/OrderDocumentRequest') AS B(Breq)

    CROSS APPLY Breq.nodes('documents/document/Fields/Field[Name]') AS D(Doc)

    /* result set

    caseIDNameValue

    2923029ABC123

    2923029XYZ1029

    53456345ABC123

    53456345XYZ1029

    */



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • I wanted to use that exact query but my problem is that there is nothing I can use to differentiate between the "orderdocumentrequest" elements. That is why I wanted to add the "requestGuid" element inside the "orderdocumentrequest" element so that I would have an element value that was unique that I could control.

    Tim

  • Now it gets confusing... :ermm:

    Your sample data do contain the tag [caseID], which can be used to differentiate between each [orderdocumentrequest].

    If that's not the case, you still could use all elements from the xml doc and add one line per [orderdocumentrequest] "group" containing all relevant elements.

    Basically, you could end up with a list of completely identical data and still assign a unique number to each row...

    Maybe it would help if you could add your required output based on your sample data.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • I want the result set you generated in your previous example. I have nothing in the xml that will work for unique id. I need to generate some unique id for each orderdocumentrequest element.

    /* result set

    some unique id Name Value

    2923029 ABC 123

    2923029 XYZ 1029

    53456345 ABC 123

    53456345 XYZ 1029

    */

  • something like SELECT newid() AS ID,

    Breq.value('caseId[1]','varchar(200)') AS [caseID],

    Doc.value('Name[1]', 'varchar(200)') AS [Name],

    Doc.value('Value[1]', 'varchar(max)') AS [Value]

    FROM @xml.nodes('/OrderDocumentBatch/OrderDocumentRequest') AS B(Breq)

    CROSS APPLY Breq.nodes('documents/document/Fields/Field[Name]') AS D(Doc)

    /* result set

    IDcaseIDNameValue

    873D5601-E800-40E6-B60E-0E39B663B2902923029ABC123

    7A85852B-465D-48FE-8C2E-8AF7D9B3680F2923029XYZ1029

    EB40A5D7-812C-4B58-9F8E-58E5E742EF6353456345ABC123

    49E889AD-8CA2-4DC4-9E8C-9B2F8DE0104853456345XYZ1029

    */



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Sorry for the confusion but that is not it. It needs to look like the results below.

    /* result set

    ID Name Value

    873D5601-E800-40E6-B60E-0E39B663B290 ABC 123

    873D5601-E800-40E6-B60E-0E39B663B290 XYZ 1029

    EB40A5D7-812C-4B58-9F8E-58E5E742EF63 ABC 123

    EB40A5D7-812C-4B58-9F8E-58E5E742EF63 XYZ 1029

    */

  • Now I'm totally lost....

    Why don't you use one or any combination of the tags within the OrderDocumentRequest element, like caseId(2923029), partitionCode (ADK), BatchNumber (3012578260) or tag(TadID)?

    It's really hard to guess what you're looking for... What's wrong with 2923029 or 53456345 as identifiers? If it has to be an ID format, you could easily map it after shredding the xml into an relational structure.

    If it doesn't exist in your data, why is it in your sample data? If it's not unique, why is it in your sample data?

    Please note that we just work on the sample data you've provided. And based on those data, there already are some unique identifiers...

    Edit: Typo fixed



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • The fields you mentioned may not be unique. I took out the elements that were confusing and the sample below along with result set that I would need. The ID column is something that I would need to manually generate.

    FirstName

    Mike

    LastName

    Smith

    FirstName

    Lisa

    LastName

    Jones

    Results Set

    IDNameValue

    7A85852B-465D-48FE-8C2E-8AF7D9B3680F FirstName Mike

    7A85852B-465D-48FE-8C2E-8AF7D9B3680F LastName Smith

    49E889AD-8CA2-4DC4-9E8C-9B2F8DE01048 FirstName Lisa

    49E889AD-8CA2-4DC4-9E8C-9B2F8DE01048 LastName Jones

  • Would something like the following meet your (new) requirement in terms of showing the values (leaving out the ID for the moment)?

    It's not exactly what you've requested but it follows more a relational data model, I think.

    The assumption I made is to have predefined order as well as number of fields within the document node.

    Is this assumption correct? (It is at least compliant with both of your sample files...)

    If not, please provide sample data describing your scenario.

    SELECT

    Doc.value('field[1]/value[1]', 'varchar(30)') AS FirstName,

    Doc.value('field[2]/value[1]', 'varchar(30)') AS LastName

    FROM @xml.nodes('/orderdocumentbatch/orderdocumentrequest') AS B(Breq)

    CROSS APPLY Breq.nodes('documents/document/fields') AS D(Doc)



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • There will be multiple fields and not always same number of field and not in the same order for each orderdocumentrequest element. But I don't always want every field. I only want certain fields.

    FirstName

    Mike

    LastName

    Smith

    MiddleName

    Peter

    FirstName

    Lisa

    LastName

    Jones

    Address

    123 Main St

    Phone

    988-454-4888

    Result Set

    FirstName LastName, Phone

    Mike Smith

    Lisa Jones 988-454-4888

  • Let's go for the next turn....

    How about this version?

    It returns all the values from the xml document rather than the selective ones you requested but I decided to leave it like this due to the frequent change of the requirement / sample data structure...

    SELECT

    Breq.value('1+count(for $a in . return $a/../*[. << $a])','INT') AS id,

    Doc.value('name[1]', 'varchar(200)') AS [Name],

    Doc.value('value[1]', 'varchar(max)') AS [Value]

    FROM @xml.nodes('/orderdocumentbatch/orderdocumentrequest') AS B(Breq)

    CROSS APPLY Breq.nodes('documents/document/fields/field[name]') AS D(Doc)

    /* result set

    idNameValue

    1FirstNameMike

    1LastNameSmith

    1MiddleNamePeter

    2FirstNameLisa

    2LastNameJones

    2Address123 Main St

    2Phone988-454-4888*/



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • This solution will work. I need to study up on how you did the ID field and figure out the syntax.

    Thank you for your time.

    Tim

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply