July 6, 2009 at 8:07 am
Mario Garcia (7/6/2009)
I have used a function to parse data. It can be altered some to fit your needs.
You may want to search this site a bit. You may find several split routines that will scale better than your current solution. The biggest issue I can see with your code is the WHILE loop you are using to accomplish the split.
Here is a post with one example of an alternative method as well.
July 6, 2009 at 8:12 am
Lynn Pettis (7/6/2009)
Mario Garcia (7/6/2009)
I have used a function to parse data. It can be altered some to fit your needs. You may want to search this site a bit. You may find several split routines that will scale better than your current solution. The biggest issue I can see with your code is the WHILE loop you are using to accomplish the split.
Here is a post with one example of an alternative method as well.
I'd recommend reading that entire thread... there were several ways brought up, and all of them would be significantly better than any using a while loop.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
July 6, 2009 at 9:12 am
WayneS (7/6/2009)
Lynn Pettis (7/6/2009)
Mario Garcia (7/6/2009)
I have used a function to parse data. It can be altered some to fit your needs. You may want to search this site a bit. You may find several split routines that will scale better than your current solution. The biggest issue I can see with your code is the WHILE loop you are using to accomplish the split.
Here is a post with one example of an alternative method as well.
I'd recommend reading that entire thread... there were several ways brought up, and all of them would be significantly better than any using a while loop.
I concur with Wayne on this as well. I just didn't say it as my previous post was the third attempt at getting a post to work. Not sure, but something is screwy here at work as I don't have these issues at home.
July 6, 2009 at 10:22 am
Here is a 48-page-long topic about various attempts to write a good and decent split function
http://www.sqlservercentral.com/Forums/Topic695508-338-1.aspx
N 56°04'39.16"
E 12°55'05.25"
July 6, 2009 at 1:35 pm
July 6, 2009 at 3:12 pm
sanjeev40084 (7/6/2009)
http://sqlservercode.blogspot.com/2005/09/split-comma-delimited-string-fast.html
Heh... that blog contains the same method we've already talked about in this thread... except it's usually just called a "Numbers" or "Tally" table instead of "NumbersPivot".
Since many demonstrate its usage but few explain how it actually works, please see the following article to find out...
http://www.sqlservercentral.com/articles/T-SQL/62867/
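For anyone who wants the gist before reading the article: the Tally (Numbers) table turns the split into a single set-based query, no loop required. Here's a minimal sketch of the idea, assuming a dbo.Tally table of sequential integers (1 through at least 8000) already exists and the delimiter is a comma:

```sql
-- Minimal Tally-table split sketch (assumes dbo.Tally holds integers 1..8000+)
DECLARE @CSV VARCHAR(8000)
SET @CSV = 'alpha,bravo,charlie'

-- Every position N where the delimiter-prefixed string holds a comma marks
-- the character just before an element; SUBSTRING/CHARINDEX carve it out.
SELECT SUBSTRING(',' + @CSV, N + 1, CHARINDEX(',', @CSV + ',', N) - N) AS SplitData
  FROM dbo.Tally
 WHERE N <= LEN(',' + @CSV)
   AND SUBSTRING(',' + @CSV, N, 1) = ','
```

No WHILE loop and no recursion; the Tally table does the counting for you, so the whole split runs as one set-based scan.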
--Jeff Moden
Change is inevitable... Change for the better is not.
July 6, 2009 at 6:20 pm
I found this a while back. I'm sure you can google the coder's name and see what terms it's released under.
CREATE FUNCTION [dbo].[fn_SplitAndRemovePunctuation](@s VARCHAR(8000))
RETURNS @out TABLE (wrd VARCHAR(8000))
AS
BEGIN
/**********************************************************
Written By : yousef ekhtiari
Email : yousef_ekhtiari@Hotmail.com
Create Date : 24 May 2007
Modified Date :
Description : Extracts the words of a string, removing
unwanted punctuation
USAGE:
SELECT * FROM dbo.fn_SplitAndRemovePunctuation('This is a test.')
SELECT * FROM dbo.fn_SplitAndRemovePunctuation('This,, is a test')
SELECT * FROM dbo.fn_SplitAndRemovePunctuation('Do you need more examples?')
**********************************************************/
DECLARE @pos INT,
        @wrd VARCHAR(50)
DECLARE @UnwantedPunctuations TABLE (punc CHAR(1))
/* Keep the unwanted punctuation marks in a table;
   you may customise them as you wish */
INSERT @UnwantedPunctuations
SELECT '.'
UNION ALL
SELECT ','
UNION ALL
SELECT '?'
UNION ALL
SELECT ':'
UNION ALL
SELECT '-'
-- Replace unwanted punctuation in the input string with spaces
UPDATE @UnwantedPunctuations
SET @s = REPLACE(@s, punc, ' ')
SET @s = LTRIM(RTRIM(@s)) + ' '
WHILE LEN(@s) > 0
BEGIN
    SET @pos = CHARINDEX(' ', @s, 2)
    SET @wrd = LTRIM(LEFT(@s, @pos))
    -- Strip any remaining unwanted punctuation from the word
    UPDATE @UnwantedPunctuations
    SET @wrd = REPLACE(@wrd, punc, '')
    INSERT @out VALUES
    (
        RTRIM(@wrd)
    )
    -- Remove the extracted word from the front of the string
    SET @s = LTRIM(STUFF(@s, 1, @pos, ''))
END
RETURN
END
July 6, 2009 at 8:50 pm
Sorry... wrong thread.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 6, 2009 at 9:02 pm
Heh... dang it... I had the right thread after all. That code above is slow. Where all the other methods take less than 16 ms to split out 534 elements from a single VARCHAR(8000), the one with the WHILE loop and the bad character delimiter takes 250 ms of CPU and over 500 ms in duration.
Here's the run results...
---- XML ----
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 10 ms.
(534 row(s) affected)
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 182 ms.
---- XML2 ----
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 6 ms.
(534 row(s) affected)
SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 130 ms.
---- TALLY ----
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
(534 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 12 ms.
---- other ----
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
(534 row(s) affected)
SQL Server Execution Times:
CPU time = 234 ms, elapsed time = 547 ms.
I suppose you can try to justify it by saying it'll never get used for something like that... right up until the time someone needs it for something like that. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
July 6, 2009 at 9:20 pm
Jeff,
I'm pretty sure I know what your Tally table test is doing... but what is the difference between your "XML" and "XML2" tests? Would you be able to post your code for those tests?
Thanks (and always curious about what Jeff does in the name of performance;-)),
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
July 6, 2009 at 9:25 pm
Sorry, post appeared to hang and accidentally double-posted.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
July 7, 2009 at 11:46 am
Yeah, I noticed that it wasn't the most efficient SQL I've seen. 🙂 Thanks for the info.
July 8, 2009 at 9:05 pm
WayneS (7/6/2009)
Jeff, I'm pretty sure I know what your Tally table test is doing... but what is the difference between your "XML" and "XML2" tests? Would you be able to post your code for those tests?
Thanks (and always curious about what Jeff does in the name of performance;-)),
The XML2 method doesn't always perform better than the XML method. I threw XML3 into the code below as well, but it doesn't always perform better, either. The Tally table, on the other hand, always performs better on strings up to 8K. If you want a real kick, try the sidebar I mention in the very last comment... Surprise! 😉
--===== Declare local variables
DECLARE @CSV VARCHAR(8000),
@xCSV VARCHAR(MAX), -- To allow expansion of XML in tests
@MyXMLData XML -- Holds xml string version
--===== Create space-delimited string with XML PATH('') method
SELECT @CSV = STUFF(
(SELECT ' ' + AccountNumber FROM AdventureWorks.Sales.SalesOrderHeader FOR XML PATH(''))
,1,1,''),
@CSV = REPLACE(@CSV,'-',''), --Remove dashes just to simplify the tests
@xCSV = @CSV --Copy of @CSV for expansion of XML in tests
--===== Start the timers and do the tests...
SET STATISTICS TIME ON
--===== Traditional XML split (verbosity makes it a bit slower than it should be)
PRINT '---- XML ----'
-- Convert the CSV string into a valid XML string (it's mostly case sensitive, too)
SET @MyXMLData = '<Rows><Row><AccountNumber>'
+ REPLACE(@xCSV,' ','</AccountNumber></Row><Row><AccountNumber>')
+ '</AccountNumber></Row></Rows>'
-- Do the split (most of this is case sensitive as well)
SELECT x.item.value('AccountNumber[1]','NVARCHAR(15)') AS SplitData
FROM @MyXMLData.nodes('/Rows/Row') AS x(item)
--===== Another XML split but usually a tiny bit faster because it's less verbose, yet still easy to read
PRINT '---- XML2 ----'
-- Convert the CSV string into a valid XML string (again, mostly case sensitive)
SET @MyXMLData = '<T><R><C>'
+ REPLACE(@xCSV,' ','</C></R><R><C>')
+ '</C></R></T>'
-- Do the split (most of this is case sensitive as well)
SELECT a.item.value('C[1]','NVARCHAR(15)') AS SplitData
FROM @MyXMLData.nodes('/T/R') AS a(item)
--===== Another XML split but usually a bit faster still because it's much less verbose, yet still easy to read.
PRINT '---- XML3 ----'
-- Convert the CSV string into a valid XML string (again, mostly case sensitive)
SET @MyXMLData = '<X>'
+ REPLACE(@xCSV,' ','</X><X>')
+ '</X>'
-- Do the split (most of this is case sensitive as well)
SELECT x.item.value('.','NVARCHAR(15)') AS SplitData
FROM @MyXMLData.nodes('X') AS x(item)
--===== Traditional Tally table split doesn't require expansion into VARCHAR(MAX).
-- It also handles special characters like "&" which are a bit tough on XML.
-- Further, for splitting up to VARCHAR(8000), it's faster than the other methods.
PRINT '---- TALLY ----'
SELECT SUBSTRING(' ' + @CSV, N+1, CHARINDEX(' ', @CSV+' ', N)-N) AS SplitData
FROM dbo.Tally
WHERE N <= LEN(' ' + @CSV)
AND SUBSTRING(' ' + @CSV, N, 1) = ' '
--===== End of testing
SET STATISTICS TIME OFF
--===== Sidebar... if you want to see how bad XML can be, try inserting the
-- result set into a table
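If you want to try the sidebar without guessing at the setup, here's one way to do it (run in the same batch as the test code above so @MyXMLData is still in scope; the temp table name is just illustrative):

```sql
-- Illustrative sidebar test: materialise the XML3 split into a table
-- and compare the STATISTICS TIME output against the plain SELECT above.
CREATE TABLE #Result (SplitData NVARCHAR(15))

SET STATISTICS TIME ON
PRINT '---- XML3 to table ----'
INSERT INTO #Result (SplitData)
SELECT x.item.value('.','NVARCHAR(15)')
  FROM @MyXMLData.nodes('X') AS x(item)
SET STATISTICS TIME OFF

DROP TABLE #Result
```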
--Jeff Moden
Change is inevitable... Change for the better is not.
July 8, 2009 at 10:05 pm
Jeff Moden (7/8/2009)
If you want a real kick, try the side bar I mention in the very last comment... Surprise! 😉
Well, after testing it, I don't know what to say. I'm astonished at the huge difference. Jeez...
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes