String manipulation, especially with large strings, has always been a performance issue on Microsoft Windows operating systems. I was recently trouble-shooting a performance issue, and while doing so I learned some very useful methods where using XML can significantly improve performance during large string manipulations. In all of my examples below, I’m using the Sales.SalesOrderHeader table (which has a little over 31,000 records) from the AdventureWorks database, which you can download and install from CodePlex.com.
1. Generation of an XML string.
I was creating a rather lengthy XML string, and waiting a frustratingly long amount of time to build the string. I was building the XML string by appending the XML tags with the data to the string row by row, like so:
-- build XML string manually use AdventureWorks GO declare @MyXMLString varchar(max) declare @StartTime datetime, @EndTime datetime set @MyXMLString = '' set @StartTime = CURRENT_TIMESTAMP select @MyXMLString = @MyXMLString + '<Row><AccountNumber >'+ AccountNumber + '</AccountNumber ></Row>' from Sales.SalesOrderHeader set @MyXMLString = '<Rows>' + @MyXMLString + '</Rows>' set @EndTime = CURRENT_TIMESTAMP print @MyXMLString select @StartTime, @EndTime, DateDiff(ms, @StartTime, @EndTime) -- prove string works declare @MyXML XML set @MyXML = @MyXMLString select i.item.value('AccountNumber[1]','nvarchar(15)') from @MyXML.nodes('/Rows/Row')AS i(item) GO
The creation of this string takes 1,966,306 ms… or just over 32.75 minutes.
By using XML in SQL Server, you can dramatically improve on this time:
-- build XML string using SQL FOR XML use AdventureWorks GO declare @StartTime datetime, @EndTime datetime declare @MyXMLString XML set @StartTime = CURRENT_TIMESTAMP select @MyXMLString = (select AccountNumber from Sales.SalesOrderHeader FOR XML RAW('Row'),ROOT('Rows'),ELEMENTS ) set @EndTime = CURRENT_TIMESTAMP select @MyXMLString, @StartTime, @EndTime, DateDiff(ms, @StartTime, @EndTime) -- prove string works select i.item.value('AccountNumber[1]','nvarchar(15)') from @MyXMLString.nodes('/Rows/Row')AS i(item) GO
The ROOT clause specifies to add the specified text as the top-level element (root) of the xml string. The RAW clause specifies to use the specified text as the element for each record. The ELEMENTS tag specifies to put each column as a separate xml sub-element. The output of this select statement is the exact same XML string as the previous example, and this takes only 260ms. Hmmm… over 32 minutes, or under 1/3 of a second? I know which one I’d prefer using!
2. What about creating a comma-delimited list?
This way adds a comma plus the column row by row to a string, and finally removes the beginning comma.
-- build CSV string using string manipulation use AdventureWorks GO declare @CSV varchar(max), @StartTime datetime, @EndTime datetime set @StartTime = CURRENT_TIMESTAMP set @CSV = '' select @CSV = @CSV + ',' + AccountNumber from Sales.SalesOrderHeader set @CSV = substring(@CSV, 2, len(@CSV)) set @EndTime = CURRENT_TIMESTAMP print @CSV select @StartTime, @EndTime, DateDiff(ms, @StartTime, @EndTime) GO This takes 216,393 ms… or just over 3.5 minutes. Using XML: -- build CSV string using SQL FOR XML use AdventureWorks GO declare @CSV varchar(max), @StartTime datetime, @EndTime datetime set @StartTime = CURRENT_TIMESTAMP select @CSV = (select ',' + AccountNumber from Sales.SalesOrderHeader FOR XML PATH('') ) set @CSV = substring(@CSV, 2, len(@CSV)) set @EndTime = CURRENT_TIMESTAMP print @CSV select @StartTime, @EndTime, DateDiff(ms, @StartTime, @EndTime) GO
By not using the ROOT clause, the top-level element (root) tag is not added. By specifying an empty string for the PATH clause, no wrapper element for the row is generated. By not using the ELEMENTS clause, sub-elements are not generated. What gets returned is just what is in the select clause – in this case the column prefixed with a comma. All that’s left is to remove the leading comma. This way takes 40 ms. Another remarkable difference!
3. And how about parsing a comma-delimited list? This usually involves a function that will strip a comma-delimited list into separate parts, putting the parts into a table that can then be used. These are adequate for small strings, but how do they compare against a large string? An example of such a function is (obtained off of the internet):
USE AdventureWorks GO if object_id('dbo.fnMVParam')is not null DROP FUNCTION [dbo].[fnMVParam] GO CREATE FUNCTION [dbo].[fnMVParam](@repparam nvarchar(max), @delim char(1)= ',') returns @values table (paramnvarchar(max)) as begin declare @chrind int declare @piece nvarchar(max) select @chrind = 1 while @chrind > 0 begin select @chrind = charindex(@delim, @repparam) if @chrind > 0 select @piece =ltrim(left(@repparam, @chrind - 1)) else select @piece =ltrim(@repparam) if @piece is not null insert @values(param)values(@piece) select @repparam = right(@repparam, len(@repparam)- @chrind) iflen(@repparam)= 0 break end return end GO When using this function: use AdventureWorks GO -- create comma-delimited string with above XML method declare @CSV varchar(max), @StartTime datetime, @EndTime datetime select @CSV = (select',' + AccountNumber from Sales.SalesOrderHeader FOR XML PATH('') ) set @CSV = substring(@CSV, 2, len(@CSV)-1) set @StartTime = CURRENT_TIMESTAMP select * from dbo.fnMVParam(@CSV,',') set @EndTime = CURRENT_TIMESTAMP select @StartTime, @EndTime, DateDiff(ms, @StartTime, @EndTime) if object_id('dbo.fnMVParam')is not null DROP FUNCTION [dbo].[fnMVParam] GO
This method takes 180,580 ms… or just over 3 minutes. Using XML:
use AdventureWorks GO -- create comma-delimted string with above XML method declare @CSV varchar(max), @StartTime datetime, @EndTime datetime select @CSV = (select',' + AccountNumber from Sales.SalesOrderHeader FOR XML PATH('') ) set @CSV = substring(@CSV, 2, len(@CSV)-1) -- convert the CSV string into a valid XML string set @StartTime = CURRENT_TIMESTAMP declare @MyXMLData XML -- replace special XML characters that cause issues in SQL set @CSV = replace(replace(@CSV,'&', '&'),'<', '<') set @MyXMLData = '<Rows><Row><AccountNumber>'+ replace(@CSV,',','</AccountNumber></Row><Row><AccountNumber>')+ '</AccountNumber></Row></Rows>' select x.item.value('AccountNumber[1]','nvarchar(15)') from @MyXMLData.nodes('/Rows/Row')AS x(item) set @EndTime = CURRENT_TIMESTAMP select @StartTime, @EndTime, DateDiff(ms, @StartTime, @EndTime) GO
What’s happening here is we have to replace any XML “special” characters with the code used to represent them. In my testing, only these two (ampersand and less-than) need to be replaced; the other special characters (single-quote, double-quote, and greater-than) do not cause any issues with SQL. We then need to replace all of the comma delimiters with the ending column/row element tags, and then start the next row/column element tags. Finally, the root, row, and element starting tags need to be added to the beginning and end of the string for the first and last values in the string. This method takes 1540 ms… or just over 1.5 seconds. Here we have yet another major improvement.
As you can see, performing string manipulations on a large string can be very time consuming. Furthermore, I sure that there are ways that are more efficient than what I’ve used above. However, many of the string manipulations can now be performed using XML in a fraction of the time.
In conclusion, learning how to work with XML on strings in SQL can be very beneficial to the performance of your systems. For more information about using XML, check out the MSDN SQL Server 2005 BOL article for the FOR XML clause of the SELECT statement.