OpenXML and memory usage...

  • Hi, I am passing arguments into a stored procedure within an XML document. I am then using sp_xml_preparedocument to parse the XML into a memory structure. I am concerned about how quickly I should remove the document. SQL Server Books Online states "A parsed document is stored in the internal cache of SQL Server 2000. The MSXML parser uses one-eighth the total memory available for SQL Server. To avoid running out of memory, run sp_xml_removedocument to free up the memory".

    Should I create a table variable, populate it with the needed XML data as soon as possible, remove the document, and then run my queries against the table variable? Or should I not be too concerned about the memory usage, query the prepared document directly through OpenXML rather than a table variable, and free the document at the end of my stored procedure? (A sketch of the first strategy follows below.)
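
    To make the first strategy concrete, here is roughly what I have in mind, with @xmlDoc standing in for the procedure's XML parameter, and the @args table and /Args/Arg names made up for illustration:

        DECLARE @hDoc int
        DECLARE @args TABLE (ArgName varchar(50), ArgValue varchar(100))

        -- parse the XML into SQL Server's internal cache
        EXEC sp_xml_preparedocument @hDoc OUTPUT, @xmlDoc

        -- copy the needed data into the table variable...
        INSERT INTO @args (ArgName, ArgValue)
        SELECT ArgName, ArgValue
        FROM OPENXML(@hDoc, '/Args/Arg', 2)
             WITH (ArgName varchar(50), ArgValue varchar(100))

        -- ...then free the parsed document straight away
        EXEC sp_xml_removedocument @hDoc

        -- the rest of the procedure queries @args, not the XML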

    Does anyone have an opinion on the memory usage and performance of the two strategies above? Also, what would be the best way of measuring SQL Server's memory usage?
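
    (The only ways I know of at the moment are DBCC MEMORYSTATUS and the Memory Manager counters, e.g.:

        -- rough picture of overall memory use on SQL Server 2000
        DBCC MEMORYSTATUS

        SELECT object_name, counter_name, cntr_value
        FROM master.dbo.sysperfinfo
        WHERE object_name LIKE '%Memory Manager%'

    but I would be glad to hear of something better.)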

    Thank you,

    Paul.

  • Hi Paul,

    This post is very much a "your mileage may vary" sort of response. I'm running a system doing near-real-time data collection from a set of network appliances. The appliances currently submit their data to the system via a web service, and the payload is then passed to the database server for parsing into tables. To keep the load on the server as light as possible, we went to the "prepare document - insert to temp table - remove document - continue processing" model. There is an added benefit that you can easily join to the temp table for validation checks and other queries.
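
    In skeleton form (with our real table and element names swapped out for made-up ones), each batch looks something like:

        DECLARE @hDoc int

        -- parse the payload
        EXEC sp_xml_preparedocument @hDoc OUTPUT, @payload

        -- land the rows in a temp table
        SELECT ApplianceId, MetricName, MetricValue
        INTO #reading
        FROM OPENXML(@hDoc, '/Readings/Reading', 2)
             WITH (ApplianceId int, MetricName varchar(50), MetricValue float)

        -- free the document as soon as the rows are out
        EXEC sp_xml_removedocument @hDoc

        -- validation joins can then run against the temp table,
        -- e.g. flagging readings from unknown appliances
        SELECT r.ApplianceId
        FROM #reading r
        LEFT JOIN Appliance a ON a.ApplianceId = r.ApplianceId
        WHERE a.ApplianceId IS NULL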

    We originally started by keeping the XML docs open for longer periods of time, but the server routinely thrashed with memory-to-disk swapping. When we changed the algorithm, the thrashing didn't go away completely, but it now lasts for a much shorter period of time. Basically, we did not eliminate the problem, but we made it small enough to be a tolerable annoyance.

    Hope this helps

    Wayne

  • Hi Wayne,

    Thank you for your response. I am now wondering how much data you were passing into the stored procedure. You opted for temporary tables rather than table variables, which seems to suggest that you wanted to persist the data to disk rather than hold it in memory (or maybe that was just so you could run joins against the data externally).

    This question really centres on how OpenXML works. If OpenXML takes 1/8th of the server's memory regardless of document size, then the larger the document, the more advantage there is in just keeping it in XML format: the memory recovered by releasing the document won't be as great, and there is the additional cost of transferring the data to another structure. If, on the other hand, the 1/8th of server memory is on top of the size of the data, then (in terms of memory usage) you are always better off transferring the data to a less memory-hungry structure.

    The problem then comes down to a speed trade-off, and in all such problems the server should be considered to be under load. Is it better to have a shorter execution time and hang onto the XML document, or to move the document to another structure, increasing the execution time but releasing memory? (Since the server is under load, the increased execution time may itself be due to a lack of memory.)

    I suppose I can answer my own question here. Keeping in mind that SQL Server is memory-hungry just to operate, as much memory as possible should be given to the server, and that extra memory will improve the execution time of other queries. By my reckoning it is best to remove the additional memory overhead as quickly as possible, which, unless the query in the routine is trivial, means moving the XML document to a less memory-hungry structure.
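
    To put rough, illustrative numbers on it: on a server with 1 GB available to SQL Server, a cache capped at one-eighth could grow to about 128 MB. A parsed document is typically several times the size of the raw XML text, so holding even a modest document open for the life of a long procedure could tie up many times the memory that the same data would occupy as rows in a table variable.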

    Does anyone have any comments on my reasoning here? Am I way off the mark?

    Thank you,

    Paul.

  • Personally, I always throw my xml data into a @variabletable, first thing.

    Sometimes I populate multiple @variableTables with the same xml doc.

    My xml doc data is usually for bulk inserts, and it has ALREADY BEEN PROCESSED in a .net object. That is, the data has already been scrubbed, and I sometimes actually call my variables "perfectXML" to remind myself this is post-business-objects xml.

    I've maybe done 100,000 records.  Not millions.

    So I don't have any hard-core "numbers" for you, just telling you what I do over and over: xml comes in, goes into an @variabletable, and the xmlDoc gets destroyed.
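
    In skeleton form (all the table and element names here are invented), that pattern is:

        DECLARE @hDoc int
        DECLARE @customers TABLE (CustomerId int, CustomerName varchar(100))
        DECLARE @orders TABLE (OrderId int, CustomerId int, Total money)

        EXEC sp_xml_preparedocument @hDoc OUTPUT, @perfectXML

        -- the same document handle can feed several table variables
        INSERT INTO @customers
        SELECT CustomerId, CustomerName
        FROM OPENXML(@hDoc, '/Batch/Customers/Customer', 2)
             WITH (CustomerId int, CustomerName varchar(100))

        INSERT INTO @orders
        SELECT OrderId, CustomerId, Total
        FROM OPENXML(@hDoc, '/Batch/Orders/Order', 2)
             WITH (OrderId int, CustomerId int, Total money)

        -- destroy the doc as soon as the table variables are loaded
        EXEC sp_xml_removedocument @hDoc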


    PS  I seldom use #temptables anymore.  The @variabletables are my bread and butter.

    I usually populate them, and then at the end of my procedure do something like

    BEGIN TRAN
        -- load the scrubbed rows in one short transaction
        INSERT INTO ReadDBTable
        SELECT * FROM @variableTable
    COMMIT TRAN

    This keeps the transaction as small as possible (code-wise), and my bulk insert either works COMPLETELY or not at all, which is how I like it. I find my violations of unique constraints (or whatever) very quickly when the bulk insert doesn't work.
