Tally OH! An Improved SQL 8K “CSV Splitter” Function

Question

Tally OH! An Improved SQL 8K “CSV Splitter” Function

send separately the original number of elements (the diff between that and the result from the Split() would be the number of NULL elements)
don't allow the source string fields to be NULL
do an ISNULL / COALESCE such that NULLs are converted to an empty string or some other reserved designater that the down-stream code knows how to interpret

Basically, in terms of SQL Server (and not Oracle ;-)), splitting a string can only return a NULL if the input itself is NULL.

Take care,

Solomon..

SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR

Eirikur Eiriksson SSC Guru Points: 182925 More actions · Answer 1

Jason-368451 (8/6/2014)
I have a CSV file that I bulk loaded into a single column staging table.
I need a way to parse proper csv files that are comma delimited with optional quote text qualifiers
as the data may have embedded commas.
i.e. Jason,"900 N. May ST., #5"
using the Jeff's splitter it would get split into 3 columns i.e. Jason as column 1 900 N. May ST. as column 2 and #5 as column 3
I need it to keep the address together when there are quotes text qualifiers .
I was able to get the function from this article to work on my dev machine "Reaping the benefits of the Window functions in T-SQL"
http://www.sqlservercentral.com/articles/SQL+Server+2012/106903
But I need to deploy this to a server running 2008 and that function uses syntax that is only valid in 2012.
I would appreciate any help.

Have you looked at Phil's article, The TSQL of CSV: Comma-Delimited of Errors[/url]? He explores some alternatives there.

I haven't created a 2K8 version of the CSV parsing code in the article as it uses a running total for marking delimiters active/inactive, quite inefficient without the window functions.

Jeff Moden SSC Guru Points: 1004448 More actions · Answer 2

Jason-368451 (8/6/2014)
I have a CSV file that I bulk loaded into a single column staging table.
I need a way to parse proper csv files that are comma delimited with optional quote text qualifiers
as the data may have embedded commas.
i.e. Jason,"900 N. May ST., #5"
using the Jeff's splitter it would get split into 3 columns i.e. Jason as column 1 900 N. May ST. as column 2 and #5 as column 3
I need it to keep the address together when there are quotes text qualifiers .
I was able to get the function from this article to work on my dev machine "Reaping the benefits of the Window functions in T-SQL"
http://www.sqlservercentral.com/articles/SQL+Server+2012/106903
But I need to deploy this to a server running 2008 and that function uses syntax that is only valid in 2012.
I would appreciate any help.

The "easy" way I handle this at work is to run it through Import-CSV and pipe it to Export-CSV to another file and use BCP with a format file to do the imports based on the compound delimiters of "," and the leading and trailing quote. It's nasty fast.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Solomon Rutzky SSCoach Points: 16430 More actions · Answer 3

Jason-368451 (8/6/2014)
I have a CSV file that I bulk loaded into a single column staging table.
I need a way to parse proper csv files that are comma delimited with optional quote text qualifiers
as the data may have embedded commas.
i.e. Jason,"900 N. May ST., #5"
using the Jeff's splitter it would get split into 3 columns i.e. Jason as column 1 900 N. May ST. as column 2 and #5 as column 3
I need it to keep the address together when there are quotes text qualifiers .
I was able to get the function from this article to work on my dev machine "Reaping the benefits of the Window functions in T-SQL"
http://www.sqlservercentral.com/articles/SQL+Server+2012/106903
But I need to deploy this to a server running 2008 and that function uses syntax that is only valid in 2012.
I would appreciate any help.

Hey there. Another option is to use SQL CLR. I wrote a library called SQL#[/url] that contains a Stored Proc named SQL#.String_SplitResultIntoFields. This proc takes a single field from a result set and splits it. The split expression can accept Regular Expressions :-). Meaning, you can follow the example here and split your imported data in a way that accounts for both embedded commas and embedded text-qualifiers. The only thing that RegEx expression doesn't do is remove the text qualifiers from the field, but it might be possible to update the RegEx to do that. To be fair, this proc is in the Full version, not the Free version, but it does handle your situation quite cleanly. There is another proc called File_SplitIntoFields which reads directly from the import file so you can bypass BCP and its format files, but that option does not handle embedded newlines (but that might not be a problem anyway given those would probably break your existing import).

Take care,

Solomon..

EDIT:

The condensed RegEx from the link noted above is:

(?x)\s*,\s*(?=(?:[^"]*"[^"]*")*[^"]*$)

SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR

gbritton1 SSCertifiable Points: 6522 More actions · Answer 4

I'm using the splitter (just downloaded the zip file) and noticed something odd. Dunno if its just me, but check this out:

select * from dbo.[DelimitedSplit8K](' ', ' ') split -- input is one space, delimiter is one space

yields:

ItemNumberItem

1

2

So, I get two rows back from the splitter when I only expect one. What would need to change to handle this situation correctly? Or, is there a better way to split when the delimiter is a blank?

Oh, fudge! I get it. It's two null strings separated by a blank.

My bad!!

Alan Burstein SSC Guru Points: 61141 More actions · Answer 5

gbritton1 (9/3/2014)
I'm using the splitter (just downloaded the zip file) and noticed something odd. Dunno if its just me, but check this out:
select * from dbo.[DelimitedSplit8K](' ', ' ') split -- input is one space, delimiter is one space
yields:
ItemNumberItem
1
2
So, I get two rows back from the splitter when I only expect one. What would need to change to handle this situation correctly? Or, is there a better way to split when the delimiter is a blank?

It's two 0-length values separated by a blank. You will always get the number of delimiters+1.

DECLARE @txt1 varchar(100) = ',,',

@txt2 varchar(10) = ' ',

@txt3 varchar(10) = ' ';

select ItemNumber, Item, item_length = LEN(item)

from dbo.[DelimitedSplit8K](@txt1, ',') split; -- this will get you three blanks

select ItemNumber, Item, item_length = LEN(item)

from dbo.[DelimitedSplit8K](@txt2, ' ') split; -- this will get you 2 blanks

select ItemNumber, Item, item_length = LEN(item)

from dbo.[DelimitedSplit8K](@txt3, ' ') split; -- this will get you 2 3 blanks

Edit: noticed your edit after posting my reply. :w00t: Also fixed a type in my code comments.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Jeff Moden SSC Guru Points: 1004448 More actions · Answer 6

gbritton1 (9/3/2014)
I'm using the splitter (just downloaded the zip file) and noticed something odd. Dunno if its just me, but check this out:
select * from dbo.[DelimitedSplit8K](' ', ' ') split -- input is one space, delimiter is one space
yields:
ItemNumberItem
1
2
So, I get two rows back from the splitter when I only expect one. What would need to change to handle this situation correctly? Or, is there a better way to split when the delimiter is a blank?
Oh, fudge! I get it. It's two null strings separated by a blank.
My bad!!

Z'actly!

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Dwain Camps SSC Guru Points: 86978 More actions · Answer 7

Jeff Moden (9/3/2014)
gbritton1 (9/3/2014)
I'm using the splitter (just downloaded the zip file) and noticed something odd. Dunno if its just me, but check this out:
select * from dbo.[DelimitedSplit8K](' ', ' ') split -- input is one space, delimiter is one space
yields:
ItemNumberItem
1
2
So, I get two rows back from the splitter when I only expect one. What would need to change to handle this situation correctly? Or, is there a better way to split when the delimiter is a blank?
Oh, fudge! I get it. It's two null strings separated by a blank.
My bad!!
Z'actly!

Great Moden's Beard! What an epiphany!

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Eirikur Eiriksson SSC Guru Points: 182925 More actions · Answer 8

gbritton1 (9/3/2014)
I'm using the splitter (just downloaded the zip file) and noticed something odd. Dunno if its just me, but check this out:
select * from dbo.[DelimitedSplit8K](' ', ' ') split -- input is one space, delimiter is one space
yields:
ItemNumberItem
1
2
So, I get two rows back from the splitter when I only expect one. What would need to change to handle this situation correctly? Or, is there a better way to split when the delimiter is a blank?
Oh, fudge! I get it. It's two null strings separated by a blank.
My bad!!

Not entirely your bad as the function doesn't return the implied NULL values, one can argue that the result set should be

ItemNumberItem

1NULL

2NULL

Jeff Moden SSC Guru Points: 1004448 More actions · Answer 9

I guess I'd have to disagree. Nulls mean "Unknown" and it is definitely known that there are no values on either side of the space delimiter so, IMHO, empty strings are the proper thing to return. I also respect disagreement on such a controversial subject so if you want NULLs, just wrap the ITEM being returned from the function in an ISNULL.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Dwain Camps SSC Guru Points: 86978 More actions · Answer 10

Jeff Moden (9/3/2014)
I guess I'd have to disagree. Nulls mean "Unknown" and it is definitely known that there are no values on either side of the space delimiter so, IMHO, empty strings are the proper thing to return. I also respect disagreement on such a controversial subject so if you want NULLs, just wrap the ITEM being returned from the function in an ISNULL.

I assume you meant something like this?

SELECT ItemNumber, Item, NULLIF(Item, '')

FROM dbo.DelimitedSplit8K(' ', ' ');

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Solomon Rutzky SSCoach Points: 16430 More actions · Answer 11

Jeff Moden (9/3/2014)
I guess I'd have to disagree. Nulls mean "Unknown" and it is definitely known that there are no values on either side of the space delimiter so, IMHO, empty strings are the proper thing to return. I also respect disagreement on such a controversial subject so if you want NULLs, just wrap the ITEM being returned from the function in an ISNULL.

I guess I'd have to agree with Jeff's disagreement, but probably disagree with any need to respect disagreement as it shouldn't be a controversial subject ;-).

First, if [space] is the delimiter and it just happens to be the only character in the string to be split, that is no different than splitting a string of ',' with a delimiter of ','; it is a set of two empty strings. A single empty string would not have any delimiter present in the string to split. And a NULL string to split would not return any rows at all (i.e. an empty result set).

Second, this shouldn't be controversial since a NULL concatenated into a non-NULL string will (well, should) result in a NULL. This behavior can be changed via SET CONCAT_NULL_YIELDS_NULL OFF, but the MSDN documentation even says that in a future version it will always be ON. Basically, NULL does not have a string representation since an empty string is a valid string with a length of 0.

Take care,

Solomon...

SQL# — https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
Sql Quantum Lift — https://SqlQuantumLift.com/ ( company )
Sql Quantum Leap — https://SqlQuantumLeap.com/ ( blog )
Info sites — Collations • Module Signing • SQLCLR

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 12

Solomon Rutzky (9/4/2014)
Jeff Moden (9/3/2014)
I guess I'd have to disagree. Nulls mean "Unknown" and it is definitely known that there are no values on either side of the space delimiter so, IMHO, empty strings are the proper thing to return. I also respect disagreement on such a controversial subject so if you want NULLs, just wrap the ITEM being returned from the function in an ISNULL.
I guess I'd have to agree with Jeff's disagreement, but probably disagree with any need to respect disagreement as it shouldn't be a controversial subject ;-).
First, if [space] is the delimiter and it just happens to be the only character in the string to be split, that is no different than splitting a string of ',' with a delimiter of ','; it is a set of two empty strings. A single empty string would not have any delimiter present in the string to split. And a NULL string to split would return a NULL.
Second, this shouldn't be controversial since a NULL concatenated into a non-NULL string will (well, should) result in a NULL. This behavior can be changed via SET CONCAT_NULL_YIELDS_NULL OFF, but the MSDN documentation even says that in a future version it will always be ON. Basically, NULL does not have a string representation since an empty string is a valid string with a length of 0.
Take care,
Solomon...

Unless you happen to also work with Oracle. At this time Oracle treats an empty string as null. Messed me up for awhile until I read that in the documentation. It does say that this is subject to change in future versions of the product. This was Oracle 10 and 11.

Eirikur Eiriksson SSC Guru Points: 182925 More actions · Answer 13

Solomon Rutzky (9/4/2014)
Jeff Moden (9/3/2014)
I guess I'd have to disagree. Nulls mean "Unknown" and it is definitely known that there are no values on either side of the space delimiter so, IMHO, empty strings are the proper thing to return. I also respect disagreement on such a controversial subject so if you want NULLs, just wrap the ITEM being returned from the function in an ISNULL.
I guess I'd have to agree with Jeff's disagreement, but probably disagree with any need to respect disagreement as it shouldn't be a controversial subject ;-).
First, if [space] is the delimiter and it just happens to be the only character in the string to be split, that is no different than splitting a string of ',' with a delimiter of ','; it is a set of two empty strings. A single empty string would not have any delimiter present in the string to split. And a NULL string to split would return a NULL.
Second, this shouldn't be controversial since a NULL concatenated into a non-NULL string will (well, should) result in a NULL. This behavior can be changed via SET CONCAT_NULL_YIELDS_NULL OFF, but the MSDN documentation even says that in a future version it will always be ON. Basically, NULL does not have a string representation since an empty string is a valid string with a length of 0.
Take care,
Solomon...

Lets step back here, the function is used for presenting a delimited string data in a SQL Server tabular format, and the whole of the SQL Server unknown/missing logic (although somewhat thin as it does not differ between unknown, missing, no respond, empty, not applicable etc.) depends on NULLs. Now just think how much is going to fail down the line if suddenly NULLs become empty strings. It is no longer an unknown value but an empty value. This has nothing to do with the general NULL agreement/disagreement, it has everything to do with the logic we normally use. Bottom line is that if the value is not there it is truly unknown and in T-SQL or any SQL for that matter that is a NULL, only NULL and nothing (unintended pun here) but NULL.

Solomon Rutzky SSCoach Points: 16430 More actions · Answer 14

Lynn Pettis (9/4/2014)
Solomon Rutzky (9/4/2014)
Second, this shouldn't be controversial since a NULL concatenated into a non-NULL string will (well, should) result in a NULL. This behavior can be changed via SET CONCAT_NULL_YIELDS_NULL OFF, but the MSDN documentation even says that in a future version it will always be ON. Basically, NULL does not have a string representation since an empty string is a valid string with a length of 0.
Unless you happen to also work with Oracle. At this time Oracle treats an empty string as null. Messed me up for awhile until I read that in the documentation. It does say that this is subject to change in future versions of the product. This was Oracle 10 and 11.

Thanks Lynn. That is interesting. It has been so long since I have worked with Oracle that I either forgot that or I forgot that I never knew it to begin with ;-). I did find this info which sheds a little light on that seemingly odd behavior.

Eirikur Eiriksson (9/4/2014)
Solomon Rutzky (9/4/2014)
Second, this shouldn't be controversial since a NULL concatenated into a non-NULL string will (well, should) result in a NULL. This behavior can be changed via SET CONCAT_NULL_YIELDS_NULL OFF, but the MSDN documentation even says that in a future version it will always be ON. Basically, NULL does not have a string representation since an empty string is a valid string with a length of 0.
Lets step back here, the function is used for presenting a delimited string data in a SQL Server tabular format, and the whole of the SQL Server unknown/missing logic (although somewhat thin as it does not differ between unknown, missing, no respond, empty, not applicable etc.) depends on NULLs. Now just think how much is going to fail down the line if suddenly NULLs become empty strings. It is no longer an unknown value but an empty value. This has nothing to do with the general NULL agreement/disagreement, it has everything to do with the logic we normally use. Bottom line is that if the value is not there it is truly unknown and in T-SQL or any SQL for that matter that is a NULL, only NULL and nothing (unintended pun here) but NULL.

Eirikur, I completely agree with the statements of "just think how much is going to fail down the line if suddenly NULLs become empty strings" and "if the value is not there it is truly unknown and ... that is a NULL, only NULL and nothing ... but NULL". This is why I agree with Jeff in disagreeing with the idea that the split function could, under some other interpretation, return NULL for any element. A NULL, as you stated, means "not there", hence could never have been part of the set that was joined together into a single, delimited string in the first place. If a set contains:

Bob

NULL

Sally

then the result of that set joined (as in String.Join(",")) together would be:

Bob,Sally

On the other hand, the following set:

Bob

{empty string}

Sally

would result in:

Bob,,Sally

Splitting either of those results on the comma would give you back the original set, with the exception, of course, of the missing element in the first set: the known unknown. If the reduced number of elements (due to NULLs not being represented in the CSV list) presents a problem for the operation, then either:

Sean Lange SSC Guru Points: 286590 More actions · Answer 15

Solomon Rutzky (9/4/2014)
Lynn Pettis (9/4/2014)
Solomon Rutzky (9/4/2014)
Second, this shouldn't be controversial since a NULL concatenated into a non-NULL string will (well, should) result in a NULL. This behavior can be changed via SET CONCAT_NULL_YIELDS_NULL OFF, but the MSDN documentation even says that in a future version it will always be ON. Basically, NULL does not have a string representation since an empty string is a valid string with a length of 0.
Unless you happen to also work with Oracle. At this time Oracle treats an empty string as null. Messed me up for awhile until I read that in the documentation. It does say that this is subject to change in future versions of the product. This was Oracle 10 and 11.
Thanks Lynn. That is interesting. It has been so long since I have worked with Oracle that I either forgot that or I forgot that I never knew it to begin with ;-). I did find this info which sheds a little light on that seemingly odd behavior.
Eirikur Eiriksson (9/4/2014)
Solomon Rutzky (9/4/2014)
Second, this shouldn't be controversial since a NULL concatenated into a non-NULL string will (well, should) result in a NULL. This behavior can be changed via SET CONCAT_NULL_YIELDS_NULL OFF, but the MSDN documentation even says that in a future version it will always be ON. Basically, NULL does not have a string representation since an empty string is a valid string with a length of 0.
Lets step back here, the function is used for presenting a delimited string data in a SQL Server tabular format, and the whole of the SQL Server unknown/missing logic (although somewhat thin as it does not differ between unknown, missing, no respond, empty, not applicable etc.) depends on NULLs. Now just think how much is going to fail down the line if suddenly NULLs become empty strings. It is no longer an unknown value but an empty value. This has nothing to do with the general NULL agreement/disagreement, it has everything to do with the logic we normally use. Bottom line is that if the value is not there it is truly unknown and in T-SQL or any SQL for that matter that is a NULL, only NULL and nothing (unintended pun here) but NULL.
Eirikur, I completely agree with the statements of "just think how much is going to fail down the line if suddenly NULLs become empty strings" and "if the value is not there it is truly unknown and ... that is a NULL, only NULL and nothing ... but NULL". This is why I agree with Jeff in disagreeing with the idea that the split function could, under some other interpretation, return NULL for any element. A NULL, as you stated, means "not there", hence could never have been part of the set that was joined together into a single, delimited string in the first place. If a set contains:
Bob
NULL
Sally
then the result of that set joined (as in String.Join(",")) together would be:
Bob,Sally
On the other hand, the following set:
Bob
{empty string}
Sally
would result in:
Bob,,Sally
Splitting either of those results on the comma would give you back the original set, with the exception, of course, of the missing element in the first set: the known unknown. If the reduced number of elements (due to NULLs not being represented in the CSV list) presents a problem for the operation, then either:
send separately the original number of elements (the diff between that and the result from the Split() would be the number of NULL elements)
don't allow the source string fields to be NULL
do an ISNULL / COALESCE such that NULLs are converted to an empty string or some other reserved designater that the down-stream code knows how to interpret
Basically, in terms of SQL Server (and not Oracle ;-)), splitting a string can only return a NULL if the input itself is NULL.
Take care,
Solomon..

According to your logic what would a splitter return if the input string had a trailing comma?

Bob,Sally,

By no means am I saying I disagree with your sentiments just wondering. Is this a 2 element list or 3? Is the NULL there or is it not? Is the third element an empty string, a NULL or not returned?

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Tally OH! An Improved SQL 8K “CSV Splitter” Function

Cookies on SQLServerCentral