Processing XML Files

  • An outside party is passing us XML files that contain information to be imported into our database. This XML file is a standard format controlled by some other ruling party. We have no control over it's format. These files are placed in a designated area on our network. I have built a Stored Procedure that runs as follows:

    1) Compile list of .xml files to process one by one (using xp_cmdshell 'dir \\...')

    2) Insert XML file contents into table (one record per row in XML file).

    3) Iterate through each line to compile all code into @xmlDoc varchar(max) variable

    4) sp_xml_preparedocument for @xmlDoc

    5) Using OPENXML, extract necessary information for insert

    6) sp_xml_removedocument

    7) Delete process .xml file.

    This process works just fine with accurate results each time. However, this is extremely slow for large XML files. Well, they don't even have to be very big for this to slow down dramatically.

    I am wondering if anyone has some suggestions for ways of speeding up this process. Specific help would be great, but even some pointers would be very useful.

    Thanks in advance!!

  • First I would write a simple .NET service that monitors the directory where the XML is located and loads the XML into the database and then archives the file.

    Second I would use the XML datatype so I could use the XML functions built in to SQL Server. Using the older preparexmldoc and OPENXML are slower.

    You could also use SSIS to handle entire process.

  • In addition to the suggestion above to use XML and XPath queries, you might look into the XML Bulk loader. It sounds like you've got a perfect place to use it.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thank you both to the reponses! I am on a learning curve but the bottom line sounds like I need to move away from this old and slow way of performing the imports. Too bad since it was pretty simple to put together. My files are not all that big, but the real slow performance comes from iterating through the lines of imported data to compile the one varchar(max) variable. Once that is put together, it returns the OPENXML results quickly.

    With XML Bulk Load, do I need to create a full XSD Schema? This XML file has hundreds of nodes and I only need a dozen or so of them. Is there a shortcut to putting together a schema with SQL table.column mapping? Maybe I just have more to read up on.

    Again, I appreciate your help!

  • To create a schema go to start --> programs --> microsoft.net framework sdk v2.0 --> sdk command prompt and type

    xsd c:\myfile.xml /outputdir:c:

    This will automatically create an XSD file.

    more info:

    http://msdn2.microsoft.com/en-us/library/x6c1kb0s(vs.71).aspx

  • Rereading your description, it sounds like you might be doing RBAR (row-by-agonizing-row) processing. Are you sure the process that you mention that is row-by-row can't be done as a batch? If that's the issue, and not the XML, we might be focusing on the wrong thing.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • The initial XML load is RBAR as follows:

    CREATE TABLE #tmpXML(PK INT NOT NULL IDENTITY(1,1), ThisLine VARCHAR(500))

    SET @sql = 'TYPE c:\data.xml'

    INSERT INTO #tmpXML EXEC master.dbo.xp_cmdshell @sql

    SELECT @count = count(*) from #tmpXML

    SET @int = 0

    WHILE @int <> @count

    BEGIN

    SET @int = @int + 1

    SET @line = NULL

    SELECT @line = ThisLine FROM #tmpXML WHERE PK = @int

    IF @line IS NOT NULL

    SELECT @FileContents = @FileContents + @line

    FROM #tmpXML

    WHERE PK = @int

    END

    This is where the slowness occurs. I was hoping to use XML Bulk Load to set this up, but I am running up against some hurdles. I will start a new thread based around XML Bulk Load.

    Thanks again for all the responses and help!

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply