OPENROWSET(BULK) performance with XML

  • I was sent a script by our vendor to update some data from an XML file:

    declare @xml_data xml
    declare @invoices table (tran_id int)

    select @xml_data = cast(bulkcolumn as xml)
    from openrowset(bulk N'some_file.xml', single_blob) as inv

    insert @invoices
    select mytable.mycol.value('.', 'int')
    from @xml_data.nodes('/DocumentElement/Table/Tran_ID') as mytable(mycol)

    some_file.xml is 330 megabytes.

    I kicked this off 12 hours ago and it is still executing. Previously the same code was used for a much (MUCH) smaller file and still took about an hour.

    Can anyone comment on why the performance of this is so bad?

    I can perform the same operation by using regex on the source XML file in TextPad to turn it into a simple columnar list and then doing a regular bulk insert; that takes a couple of minutes total (a rough sketch of that workaround is below). Our vendor can't offer that as "a solution", though, since whatever they provide must run without the manual step of reformatting the file through regex.
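
    For illustration, the workaround amounts to something like this (the file path and temp table name here are made up):

    -- after the regex pass leaves one Tran_ID per line, a plain
    -- bulk insert loads the file in a couple of minutes
    create table #invoices (tran_id int)

    bulk insert #invoices
    from 'C:\data\tran_ids.txt'  -- made-up path to the reformatted file
    with (rowterminator = '\n')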

  • I would store that xml file in a table with an xml column and an identity column.

    Add an XML index to that table. The insert should perform much better (see the sketch below).
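
    Something along these lines (untested, and the object names are just examples):

    -- the table needs a clustered primary key before an xml index can be built
    create table dbo.xml_staging
    (
        id       int identity(1,1) not null primary key clustered,
        xml_data xml not null
    )

    insert dbo.xml_staging (xml_data)
    select cast(bulkcolumn as xml)
    from openrowset(bulk N'some_file.xml', single_blob) as inv

    create primary xml index ix_xml_staging on dbo.xml_staging (xml_data)

    -- shred from the indexed column instead of an xml variable
    select t.c.value('.', 'int') as tran_id
    from dbo.xml_staging
    cross apply xml_data.nodes('/DocumentElement/Table/Tran_ID') as t(c)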

    Another option would be using OPENXML without casting the data to xml (sketched below).
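
    A rough sketch of that route (again untested; switch single_clob to single_nclob if the file is Unicode):

    declare @doc varchar(max), @hdoc int

    -- read the file as text instead of casting it to xml
    select @doc = bulkcolumn
    from openrowset(bulk N'some_file.xml', single_clob) as inv

    exec sp_xml_preparedocument @hdoc output, @doc

    select tran_id
    from openxml(@hdoc, '/DocumentElement/Table', 2)
         with (tran_id int 'Tran_ID')

    exec sp_xml_removedocument @hdoc  -- always free the document handle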



    Lutz
    A pessimist is an optimist with experience.

