January 11, 2008 at 11:00 am
An outside party is passing us XML files that contain information to be imported into our database. This XML file is a standard format controlled by some other ruling party. We have no control over it's format. These files are placed in a designated area on our network. I have built a Stored Procedure that runs as follows:
1) Compile list of .xml files to process one by one (using xp_cmdshell 'dir \\...')
2) Insert XML file contents into table (one record per row in XML file).
3) Iterate through each line to compile all code into @xmlDoc varchar(max) variable
4) sp_xml_preparedocument for @xmlDoc
5) Using OPENXML, extract necessary information for insert
6) sp_xml_removedocument
7) Delete process .xml file.
This process works just fine with accurate results each time. However, this is extremely slow for large XML files. Well, they don't even have to be very big for this to slow down dramatically.
I am wondering if anyone has some suggestions for ways of speeding up this process. Specific help would be great, but even some pointers would be very useful.
Thanks in advance!!
January 11, 2008 at 11:20 am
First I would write a simple .NET service that monitors the directory where the XML is located and loads the XML into the database and then archives the file.
Second I would use the XML datatype so I could use the XML functions built in to SQL Server. Using the older preparexmldoc and OPENXML are slower.
You could also use SSIS to handle entire process.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
January 11, 2008 at 11:35 am
In addition to the suggestion above to use XML and XPath queries, you might look into the XML Bulk loader. It sounds like you've got a perfect place to use it.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
January 11, 2008 at 1:18 pm
Thank you both to the reponses! I am on a learning curve but the bottom line sounds like I need to move away from this old and slow way of performing the imports. Too bad since it was pretty simple to put together. My files are not all that big, but the real slow performance comes from iterating through the lines of imported data to compile the one varchar(max) variable. Once that is put together, it returns the OPENXML results quickly.
With XML Bulk Load, do I need to create a full XSD Schema? This XML file has hundreds of nodes and I only need a dozen or so of them. Is there a shortcut to putting together a schema with SQL table.column mapping? Maybe I just have more to read up on.
Again, I appreciate your help!
January 11, 2008 at 2:02 pm
To create a schema go to start --> programs --> microsoft.net framework sdk v2.0 --> sdk command prompt and type
xsd c:\myfile.xml /outputdir:c:
This will automatically create an XSD file.
more info:
http://msdn2.microsoft.com/en-us/library/x6c1kb0s(vs.71).aspx
January 12, 2008 at 4:31 am
Rereading your description, it sounds like you might be doing RBAR (row-by-agonizing-row) processing. Are you sure the process that you mention that is row-by-row can't be done as a batch? If that's the issue, and not the XML, we might be focusing on the wrong thing.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
January 14, 2008 at 10:45 am
The initial XML load is RBAR as follows:
CREATE TABLE #tmpXML(PK INT NOT NULL IDENTITY(1,1), ThisLine VARCHAR(500))
SET @sql = 'TYPE c:\data.xml'
INSERT INTO #tmpXML EXEC master.dbo.xp_cmdshell @sql
SELECT @count = count(*) from #tmpXML
SET @int = 0
WHILE @int <> @count
BEGIN
SET @int = @int + 1
SET @line = NULL
SELECT @line = ThisLine FROM #tmpXML WHERE PK = @int
IF @line IS NOT NULL
SELECT @FileContents = @FileContents + @line
FROM #tmpXML
WHERE PK = @int
END
This is where the slowness occurs. I was hoping to use XML Bulk Load to set this up, but I am running up against some hurdles. I will start a new thread based around XML Bulk Load.
Thanks again for all the responses and help!
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply