July 6, 2009 at 8:07 am
Mario Garcia (7/6/2009)
I have used a function to parse data. It can be altered some to fit your needs.
You may want to search this site a bit. You may find several split routines that will scale better than your current solution. The biggest issue I can see with your code is the WHILE loop you are using to accomplish the split.
Here is a post with one example of an alternative method as well.
July 6, 2009 at 8:12 am
Lynn Pettis (7/6/2009)
Mario Garcia (7/6/2009)
I have used a function to parse data. It can be altered some to fit your needs. You may want to search this site a bit. You may find several split routines that will scale better than your current solution. The biggest issue I can see with your code is the WHILE loop you are using to accomplish the split.
Here is a post with one example of an alternative method as well.
I'd recommend reading that entire thread... there were several ways brought up, and all of them would be significantly better than any using a while loop.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
July 6, 2009 at 9:12 am
WayneS (7/6/2009)
Lynn Pettis (7/6/2009)
Mario Garcia (7/6/2009)
I have used a function to parse data. It can be altered some to fit your needs. You may want to search this site a bit. You may find several split routines that will scale better than your current solution. The biggest issue I can see with your code is the WHILE loop you are using to accomplish the split.
Here is a post with one example of an alternative method as well.
I'd recommend reading that entire thread... there were several ways brought up, and all of them would be significantly better than any using a while loop.
I concur with Wayne on this as well. I just didn't say it as my previous post was the third attempt at getting a post to work. Not sure, but something is screwy here at work as I don't have these issues at home.
July 6, 2009 at 10:22 am
Here is a 48-page-long topic about various attempts to write a good and decent split function
http://www.sqlservercentral.com/Forums/Topic695508-338-1.aspx
N 56°04'39.16"
E 12°55'05.25"
July 6, 2009 at 1:35 pm
July 6, 2009 at 3:12 pm
sanjeev40084 (7/6/2009)
http://sqlservercode.blogspot.com/2005/09/split-comma-delimited-string-fast.html
Heh... that blog contains the same method we've already talked about in this thread... except it's usually just called a "Numbers" or "Tally" table instead of "NumbersPivot".
Since many demonstrate its usage but few explain how it actually works, please see the following article to find out...
http://www.sqlservercentral.com/articles/T-SQL/62867/
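For anyone who wants the gist before reading the article: the Tally (Numbers) table turns the split into a single set-based query, no loop required. Here's a minimal sketch of the idea, assuming a dbo.Tally table of sequential integers (1 through at least 8000) already exists and the delimiter is a comma:

```sql
-- Minimal Tally-table split sketch (assumes dbo.Tally holds integers 1..8000+)
DECLARE @CSV VARCHAR(8000)
SET @CSV = 'alpha,bravo,charlie'

-- Every position N where the delimiter-prefixed string holds a comma marks
-- the character just before an element; SUBSTRING/CHARINDEX carve it out.
SELECT SUBSTRING(',' + @CSV, N + 1, CHARINDEX(',', @CSV + ',', N) - N) AS SplitData
  FROM dbo.Tally
 WHERE N <= LEN(',' + @CSV)
   AND SUBSTRING(',' + @CSV, N, 1) = ','
```

No WHILE loop and no recursion; the Tally table does the counting for you, so the whole split runs as one set-based scan.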
--Jeff Moden
Change is inevitable... Change for the better is not.
July 6, 2009 at 6:20 pm
I found this a while back. I'm sure you can google the coder's name and see what terms it's released under.
CREATE FUNCTION [dbo].[fn_SplitAndRemovePunctuation](@s VARCHAR(8000))
RETURNS @out TABLE (wrd VARCHAR(8000))
AS
BEGIN
/**********************************************************
Written By : yousef ekhtiari
Email : yousef_ekhtiari@Hotmail.com
Create Date : 24 May 2007
Modified Date :
Description : Extracts the words of a string, removing
unwanted punctuation
USAGE:
SELECT * FROM dbo.fn_SplitAndRemovePunctuation('This is a test.')
SELECT * FROM dbo.fn_SplitAndRemovePunctuation('This,, is a test')
SELECT * FROM dbo.fn_SplitAndRemovePunctuation('Do you need more examples?')
**********************************************************/
DECLARE @pos INT,
        @wrd VARCHAR(50)
DECLARE @UnwantedPunctuations TABLE (punc CHAR(1))
/* Keep the unwanted punctuation marks in a table;
   you may customise them as you wish */
INSERT @UnwantedPunctuations
SELECT '.'
UNION ALL
SELECT ','
UNION ALL
SELECT '?'
UNION ALL
SELECT ':'
UNION ALL
SELECT '-'
-- Replace unwanted punctuation in the input string with spaces
UPDATE @UnwantedPunctuations
SET @s = REPLACE(@s, punc, ' ')
SET @s = LTRIM(RTRIM(@s)) + ' '
WHILE LEN(@s) > 0
BEGIN
    SET @pos = CHARINDEX(' ', @s, 2)
    SET @wrd = LTRIM(LEFT(@s, @pos))
    -- Strip any remaining unwanted punctuation from the word
    UPDATE @UnwantedPunctuations
    SET @wrd = REPLACE(@wrd, punc, '')
    INSERT @out VALUES
    (
        RTRIM(@wrd)
    )
    -- Remove the extracted word from the front of the string
    SET @s = LTRIM(STUFF(@s, 1, @pos, ''))
END
RETURN
END
July 6, 2009 at 8:50 pm
Sorry... wrong thread.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 6, 2009 at 9:02 pm
Heh... dang it... I had the right thread after all. That code above is slow. Where all the other methods take less than 16 ms to split out 534 elements from a single VARCHAR(8000), the one with the WHILE loop and the bad character delimiter takes 250 ms of CPU and over 500 ms in duration.
Here's the run results...
---- XML ----
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 10 ms.
(534 row(s) affected)
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 182 ms.
---- XML2 ----
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 6 ms.
(534 row(s) affected)
SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 130 ms.
---- TALLY ----
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
(534 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 12 ms.
---- other ----
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
(534 row(s) affected)
SQL Server Execution Times:
CPU time = 234 ms, elapsed time = 547 ms.
I suppose you can try to justify it by saying it'll never get used for something like that... right up until the time someone needs it for something like that. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
July 6, 2009 at 9:20 pm
Jeff,
I'm pretty sure I know what your Tally table test is doing... but what is the difference between your "XML" and "XML2" tests? Would you be able to post your code for those tests?
Thanks (and always curious about what Jeff does in the name of performance;-)),
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
July 6, 2009 at 9:25 pm
Sorry, post appeared to hang and accidentally double-posted.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
July 7, 2009 at 11:46 am
Yeah, I noticed that it wasn't the most efficient SQL I've seen. 🙂 Thanks for the info.
July 8, 2009 at 9:05 pm
WayneS (7/6/2009)
Jeff, I'm pretty sure I know what your Tally table test is doing... but what is the difference between your "XML" and "XML2" tests? Would you be able to post your code for those tests?
Thanks (and always curious about what Jeff does in the name of performance;-)),
The XML2 method doesn't always perform better than the XML method. I threw XML3 into the code below as well, but it doesn't always perform better, either. The Tally table, on the other hand, always performs better on strings up to 8K. If you want a real kick, try the sidebar I mention in the very last comment... Surprise! 😉
--===== Declare local variables
DECLARE @CSV VARCHAR(8000),
@xCSV VARCHAR(MAX), -- To allow expansion of XML in tests
@MyXMLData XML -- Holds xml string version
--===== Create space-delimited string with XML PATH('') method
SELECT @CSV = STUFF(
(SELECT ' ' + AccountNumber FROM AdventureWorks.Sales.SalesOrderHeader FOR XML PATH(''))
,1,1,''),
@CSV = REPLACE(@CSV,'-',''), --Remove dashes just to simplify the tests
@xCSV = @CSV --Copy of @CSV for expansion of XML in tests
--===== Start the timers and do the tests...
SET STATISTICS TIME ON
--===== Traditional XML split (verbosity makes it a bit slower than it should be)
PRINT '---- XML ----'
-- Convert the CSV string into a valid XML string (it's mostly case sensitive, too)
SET @MyXMLData = '<Rows><Row><AccountNumber>'
+ REPLACE(@xCSV,' ','</AccountNumber></Row><Row><AccountNumber>')
+ '</AccountNumber></Row></Rows>'
-- Do the split (most of this is case sensitive as well)
SELECT x.item.value('AccountNumber[1]','NVARCHAR(15)') AS SplitData
FROM @MyXMLData.nodes('/Rows/Row') AS x(item)
--===== Another XML split but usually a tiny bit faster because it's less verbose, yet still easy to read
PRINT '---- XML2 ----'
-- Convert the CSV string into a valid XML string (again, mostly case sensitive)
SET @MyXMLData = '<T><R><C>'
+ REPLACE(@xCSV,' ','</C></R><R><C>')
+ '</C></R></T>'
-- Do the split (most of this is case sensitive as well)
SELECT a.item.value('C[1]','NVARCHAR(15)') AS SplitData
FROM @MyXMLData.nodes('/T/R') AS a(item)
--===== Another XML split but usually a bit faster still because it's much less verbose, yet still easy to read.
PRINT '---- XML3 ----'
-- Convert the CSV string into a valid XML string (again, mostly case sensitive)
SET @MyXMLData = '<X>'
+ REPLACE(@xCSV,' ','</X><X>')
+ '</X>'
-- Do the split (most of this is case sensitive as well)
SELECT x.item.value('.','NVARCHAR(15)') AS SplitData
FROM @MyXMLData.nodes('X') AS x(item)
--===== Traditional Tally table split doesn't require expansion into VARCHAR(MAX).
-- It also handles special characters like "&" which are a bit tough on XML.
-- Further, for splitting up to VARCHAR(8000), it's faster than the other methods.
PRINT '---- TALLY ----'
SELECT SUBSTRING(' ' + @CSV, N+1, CHARINDEX(' ', @CSV+' ', N)-N) AS SplitData
FROM dbo.Tally
WHERE N <= LEN(' ' + @CSV)
AND SUBSTRING(' ' + @CSV, N, 1) = ' '
--===== End of testing
SET STATISTICS TIME OFF
--===== Sidebar... if you want to see how bad XML can be, try inserting the
-- result set into a table
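If you want to try the sidebar without guessing at the setup, here's one way to do it (run in the same batch as the test code above so @MyXMLData is still in scope; the temp table name is just illustrative):

```sql
-- Illustrative sidebar test: materialise the XML3 split into a table
-- and compare the STATISTICS TIME output against the plain SELECT above.
CREATE TABLE #Result (SplitData NVARCHAR(15))

SET STATISTICS TIME ON
PRINT '---- XML3 to table ----'
INSERT INTO #Result (SplitData)
SELECT x.item.value('.','NVARCHAR(15)')
  FROM @MyXMLData.nodes('X') AS x(item)
SET STATISTICS TIME OFF

DROP TABLE #Result
```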
--Jeff Moden
Change is inevitable... Change for the better is not.
July 8, 2009 at 10:05 pm
Jeff Moden (7/8/2009)
If you want a real kick, try the side bar I mention in the very last comment... Surprise! 😉
Well, after testing it, I don't know what to say. I'm astonished at the huge difference. Jeez...
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes