Tally OH! An Improved SQL 8K “CSV Splitter” Function

Question

Tally OH! An Improved SQL 8K “CSV Splitter” Function

Forum Etiquette: How to post Reporting Services problems

[/url]

Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]

How to Post Performance Problems - by Gail Shaw
[/url]

SQL Fool SSC Journeyman Points: 92 More actions · Answer 1

First off, this was a great article followed by an awesome thread… great suggestions, dialog, debates and explanations. Even though Jeff was kind enough to update the article on 5/12/2011, it seems like there has been some good suggestions since then. Unfortunately, in the 53 pages of posts, I've lost track of the "latest version". Could someone point me to the best post? Or, is the 5/12/2011 update the best.

Steve Pirazzi
ONEWARE, Inc.
http://www.ONEWARE.com

mister.magoo SSC-Forever Points: 47174 More actions · Answer 2

ErikEckhardt (2/22/2013)
mister.magoo (2/22/2013)
WHERE SUBSTRING(@pString,t.N,1) COLLATE Latin1_General_BIN = @pDelimiter COLLATE Latin1_General_BIN
COLLATE only needs to be on one side of the expression. Either side will do. You don't have to put it on both sides.

Yes, you are right. I added it one time too many, but as it was there when I ran the tests, I thought it best to leave it. I am sure that Jeff would remove the extra one...

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

Usman Butt Hall of Fame Points: 3186 More actions · Answer 3

L' Eomot Inversé (2/22/2013)
mister.magoo (2/22/2013)
What bothered me was that feeling that "someone should have thought of this ages ago, shouldn't they"?
Always makes me nervous when that is the case...
That feeling was bothering me too - I can't understand why I didn't spot the opportunity the first time I looked at the code, let alone why Jeff or Paul or another of the top experts who contribute so much to SQLServerCentral didn't do it long ago.
Looks as if you're the only one who's awake around here, Mr M. 😎

IIRC, Jeff Moden already knows it. But I guess the conversation we had somewhere else was something like that he likes to use the default collation. (I may be wrong. So I sincerely apologize in advance in case I said anything/everything wrong).

But in my case, the binary collated splitter version is already in production for comma delimeted strings 😉

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 4

L' Eomot Inversé (2/22/2013)
mister.magoo (2/22/2013)
What bothered me was that feeling that "someone should have thought of this ages ago, shouldn't they"?
Always makes me nervous when that is the case...
That feeling was bothering me too - I can't understand why I didn't spot the opportunity the first time I looked at the code, let alone why Jeff or Paul or another of the top experts who contribute so much to SQLServerCentral didn't do it long ago.
Looks as if you're the only one who's awake around here, Mr M. 😎

Actually, the answer is embarrasingly simple. I flat out forgot. :blush:

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

mister.magoo SSC-Forever Points: 47174 More actions · Answer 5

Usman Butt (2/22/2013)
L' Eomot Inversé (2/22/2013)
mister.magoo (2/22/2013)
What bothered me was that feeling that "someone should have thought of this ages ago, shouldn't they"?
Always makes me nervous when that is the case...
That feeling was bothering me too - I can't understand why I didn't spot the opportunity the first time I looked at the code, let alone why Jeff or Paul or another of the top experts who contribute so much to SQLServerCentral didn't do it long ago.
Looks as if you're the only one who's awake around here, Mr M. 😎
IIRC, Jeff Moden already knows it. But I guess the conversation we had somewhere else was something like that he likes to use the default collation. (I may be wrong. So I sincerely apologize in advance in case I said anything/everything wrong).
But in my case, the binary collated splitter version is already in production for comma delimeted strings 😉

See, I knew someone must have done it already

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

Usman Butt Hall of Fame Points: 3186 More actions · Answer 6

mister.magoo (2/22/2013)
Usman Butt (2/22/2013)
L' Eomot Inversé (2/22/2013)
mister.magoo (2/22/2013)
What bothered me was that feeling that "someone should have thought of this ages ago, shouldn't they"?
Always makes me nervous when that is the case...
That feeling was bothering me too - I can't understand why I didn't spot the opportunity the first time I looked at the code, let alone why Jeff or Paul or another of the top experts who contribute so much to SQLServerCentral didn't do it long ago.
Looks as if you're the only one who's awake around here, Mr M. 😎
IIRC, Jeff Moden already knows it. But I guess the conversation we had somewhere else was something like that he likes to use the default collation. (I may be wrong. So I sincerely apologize in advance in case I said anything/everything wrong).
But in my case, the binary collated splitter version is already in production for comma delimeted strings 😉

And to supplement my earlier post, please read posts from here onwards. I guess both of our top experts Jeff Moden and Paul White already knows it very well? :hehe:

TomThomson SSC Guru Points: 104768 More actions · Answer 7

Usman Butt (2/22/2013)
And to supplement my earlier post, please read posts from here onwards. I guess both of our top experts Jeff Moden and Paul White already knows it very well? :hehe:

Ouch. Less than 6 months ago and I already forgot!

Tom

peter-757102 SSCertifiable Points: 6877 More actions · Answer 8

Using a binary comparisons is a good speedup for specific cases and good to be aware of and certainly worth mentioning in bold with its benefits. What it is not however is a good in a generic solution that people can copy paste and directly use without unexpected side-effects. Collations do not exist for nothing and most of the searches in a database will be case insensitve for valid reasons.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 9

peter-757102 (2/26/2013)
Using a binary comparisons is a good speedup for specific cases and good to be aware of and certainly worth mentioning in bold with its benefits. What it is not however is a good in a generic solution that people can copy paste and directly use without unexpected side-effects. Collations do not exist for nothing and most of the searches in a database will be case insensitve for valid reasons.

The collation changes to DS8K suggested by Mister Magoo are all internal to the function and are not inherited by (or retained by) the output columns. Good point though - the output of multiple nested REPLACE accelerated by applying a binary collation will retain the new collation cast and could affect subsequent operations.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Dwain Camps SSC Guru Points: 86978 More actions · Answer 10

ChrisM@Work (2/26/2013)
peter-757102 (2/26/2013)
Using a binary comparisons is a good speedup for specific cases and good to be aware of and certainly worth mentioning in bold with its benefits. What it is not however is a good in a generic solution that people can copy paste and directly use without unexpected side-effects. Collations do not exist for nothing and most of the searches in a database will be case insensitve for valid reasons.
The collation changes to DS8K suggested by Mister Magoo are all internal to the function and are not inherited by (or retained by) the output columns. Good point though - the output of multiple nested REPLACE accelerated by applying a binary collation will retain the new collation cast and could affect subsequent operations.

Does this mean though, that internally the BIN collation sequence should be either CS or CI, depending on what the default is for the database?

I always use Latin1_General_BIN and now I'm not sure if that is CS or CI! :w00t::hehe:

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 11

dwain.c (2/26/2013)
ChrisM@Work (2/26/2013)
peter-757102 (2/26/2013)
Using a binary comparisons is a good speedup for specific cases and good to be aware of and certainly worth mentioning in bold with its benefits. What it is not however is a good in a generic solution that people can copy paste and directly use without unexpected side-effects. Collations do not exist for nothing and most of the searches in a database will be case insensitve for valid reasons.
The collation changes to DS8K suggested by Mister Magoo are all internal to the function and are not inherited by (or retained by) the output columns. Good point though - the output of multiple nested REPLACE accelerated by applying a binary collation will retain the new collation cast and could affect subsequent operations.
Does this mean though, that internally the BIN collation sequence should be either CS or CI, depending on what the default is for the database?
I always use Latin1_General_BIN and now I'm not sure if that is CS or CI! :w00t::hehe:

It's neither case nor accent sensitive but there's evidence here that it's smirk-sensitive:

Latin1_General_BIN_:s

Latin1_General_BIN_:]

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

mister.magoo SSC-Forever Points: 47174 More actions · Answer 12

ChrisM@Work (2/26/2013)
dwain.c (2/26/2013)
ChrisM@Work (2/26/2013)
peter-757102 (2/26/2013)
Using a binary comparisons is a good speedup for specific cases and good to be aware of and certainly worth mentioning in bold with its benefits. What it is not however is a good in a generic solution that people can copy paste and directly use without unexpected side-effects. Collations do not exist for nothing and most of the searches in a database will be case insensitve for valid reasons.
The collation changes to DS8K suggested by Mister Magoo are all internal to the function and are not inherited by (or retained by) the output columns. Good point though - the output of multiple nested REPLACE accelerated by applying a binary collation will retain the new collation cast and could affect subsequent operations.
Does this mean though, that internally the BIN collation sequence should be either CS or CI, depending on what the default is for the database?
I always use Latin1_General_BIN and now I'm not sure if that is CS or CI! :w00t::hehe:
It's neither case nor accent sensitive but there's evidence here that it's smirk-sensitive:
Latin1_General_BIN_:s
Latin1_General_BIN_:]

Surely you mean that Latin1_General_BIN is case sensitive (CS) and accent sensitive (AS) ?

Binary collation will definitely find 'a' and 'A' to be different and 'á' to be different to 'a'.

That is part of the reason it is quicker for CHARINDEX and equality testing - it doesn't have to compare 'a' to 'A','á','Á' etc... it is either 'a' or NOT 'a'.

I do agree that it is smirk sensitive though :o) != :O)

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 13

mister.magoo (2/26/2013)
ChrisM@Work (2/26/2013)
dwain.c (2/26/2013)
ChrisM@Work (2/26/2013)
peter-757102 (2/26/2013)
Using a binary comparisons is a good speedup for specific cases and good to be aware of and certainly worth mentioning in bold with its benefits. What it is not however is a good in a generic solution that people can copy paste and directly use without unexpected side-effects. Collations do not exist for nothing and most of the searches in a database will be case insensitve for valid reasons.
The collation changes to DS8K suggested by Mister Magoo are all internal to the function and are not inherited by (or retained by) the output columns. Good point though - the output of multiple nested REPLACE accelerated by applying a binary collation will retain the new collation cast and could affect subsequent operations.
Does this mean though, that internally the BIN collation sequence should be either CS or CI, depending on what the default is for the database?
I always use Latin1_General_BIN and now I'm not sure if that is CS or CI! :w00t::hehe:
It's neither case nor accent sensitive but there's evidence here that it's smirk-sensitive:
Latin1_General_BIN_:s
Latin1_General_BIN_:]
Surely you mean that Latin1_General_BIN is case sensitive (CS) and accent sensitive (AS) ?
Binary collation will definitely find 'a' and 'A' to be different and 'á' to be different to 'a'.
That is part of the reason it is quicker for CHARINDEX and equality testing - it doesn't have to compare 'a' to 'A','á','Á' etc... it is either 'a' or NOT 'a'.
I do agree that it is smirk sensitive though :o) != :O)

Oops! Smirk in me eyes 😉 yes of course. "binary" is a bit of a giveaway.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Dwain Camps SSC Guru Points: 86978 More actions · Answer 14

ChrisM@Work (2/26/2013)
mister.magoo (2/26/2013)
ChrisM@Work (2/26/2013)
dwain.c (2/26/2013)
ChrisM@Work (2/26/2013)
peter-757102 (2/26/2013)
Using a binary comparisons is a good speedup for specific cases and good to be aware of and certainly worth mentioning in bold with its benefits. What it is not however is a good in a generic solution that people can copy paste and directly use without unexpected side-effects. Collations do not exist for nothing and most of the searches in a database will be case insensitve for valid reasons.
The collation changes to DS8K suggested by Mister Magoo are all internal to the function and are not inherited by (or retained by) the output columns. Good point though - the output of multiple nested REPLACE accelerated by applying a binary collation will retain the new collation cast and could affect subsequent operations.
Does this mean though, that internally the BIN collation sequence should be either CS or CI, depending on what the default is for the database?
I always use Latin1_General_BIN and now I'm not sure if that is CS or CI! :w00t::hehe:
It's neither case nor accent sensitive but there's evidence here that it's smirk-sensitive:
Latin1_General_BIN_:s
Latin1_General_BIN_:]
Surely you mean that Latin1_General_BIN is case sensitive (CS) and accent sensitive (AS) ?
Binary collation will definitely find 'a' and 'A' to be different and 'á' to be different to 'a'.
That is part of the reason it is quicker for CHARINDEX and equality testing - it doesn't have to compare 'a' to 'A','á','Á' etc... it is either 'a' or NOT 'a'.
I do agree that it is smirk sensitive though :o) != :O)
Oops! Smirk in me eyes 😉 yes of course. "binary" is a bit of a giveaway.

Doh! (Homer Simpson moment) 😛

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 15

dwain.c (2/26/2013)
ChrisM@Work (2/26/2013)
mister.magoo (2/26/2013)
ChrisM@Work (2/26/2013)
dwain.c (2/26/2013)
ChrisM@Work (2/26/2013)
peter-757102 (2/26/2013)
Using a binary comparisons is a good speedup for specific cases and good to be aware of and certainly worth mentioning in bold with its benefits. What it is not however is a good in a generic solution that people can copy paste and directly use without unexpected side-effects. Collations do not exist for nothing and most of the searches in a database will be case insensitve for valid reasons.
The collation changes to DS8K suggested by Mister Magoo are all internal to the function and are not inherited by (or retained by) the output columns. Good point though - the output of multiple nested REPLACE accelerated by applying a binary collation will retain the new collation cast and could affect subsequent operations.
Does this mean though, that internally the BIN collation sequence should be either CS or CI, depending on what the default is for the database?
I always use Latin1_General_BIN and now I'm not sure if that is CS or CI! :w00t::hehe:
It's neither case nor accent sensitive but there's evidence here that it's smirk-sensitive:
Latin1_General_BIN_:s
Latin1_General_BIN_:]
Surely you mean that Latin1_General_BIN is case sensitive (CS) and accent sensitive (AS) ?
Binary collation will definitely find 'a' and 'A' to be different and 'á' to be different to 'a'.
That is part of the reason it is quicker for CHARINDEX and equality testing - it doesn't have to compare 'a' to 'A','á','Á' etc... it is either 'a' or NOT 'a'.
I do agree that it is smirk sensitive though :o) != :O)
Oops! Smirk in me eyes 😉 yes of course. "binary" is a bit of a giveaway.
Doh! (Homer Simpson moment) 😛

About once a week, mate. Now I'm an old git I can blame it on declining memory. When I was a sprightly youngster it was more embarrassing.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden