Tally OH! An Improved SQL 8K “CSV Splitter” Function

Question

Tally OH! An Improved SQL 8K “CSV Splitter” Function

Viewing 15 posts - 946 through 960 (of 990 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004421 More actions · Answer 1

Eirikur Eiriksson - Tuesday, March 19, 2019 7:18 AM
Quick question, are there any multi-line entries or can one use the new line bytes as a row delimiter?

To whom are you directing that question?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Eirikur Eiriksson SSC Guru Points: 182917 More actions · Answer 2

Jeff Moden - Tuesday, March 19, 2019 7:26 AM
Eirikur Eiriksson - Tuesday, March 19, 2019 7:18 AM
Quick question, are there any multi-line entries or can one use the new line bytes as a row delimiter?
To whom are you directing that question?

The OP

The spanner in the works is the multi row entries, big difference from handling single row entries

Jeff Moden SSC Guru Points: 1004421 More actions · Answer 3

Eirikur Eiriksson - Tuesday, March 19, 2019 8:52 AM
Jeff Moden - Tuesday, March 19, 2019 7:26 AM
Eirikur Eiriksson - Tuesday, March 19, 2019 7:18 AM
Quick question, are there any multi-line entries or can one use the new line bytes as a row delimiter?
To whom are you directing that question?
The OP
The spanner in the works is the multi row entries, big difference from handling single row entries

Agreed but not much of one if each "record" has the same pattern of multi-line entries. I do use format files for such a thing if the multi-line entries are consistent.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

corey lawson Hall of Fame Points: 3773 More actions · Answer 4

Hey, Jeff, will you update this yet again to include the new split_string() function in SQL Server 2016+?

Jeff Moden SSC Guru Points: 1004421 More actions · Answer 5

corey lawson wrote:

Hey, Jeff, will you update this yet again to include the new split_string() function in SQL Server 2016+?

Probably not. The STRING_SPLIT function is relatively crippled for those that need to know the position of the split out elements. That's been pretty well covered by Wayne Sheffield when it first came out. Here's the link...

https://www.sqlservercentral.com/articles/splitting-strings-in-sql-server-2016

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Steve Collins SSCrazy Points: 2176 More actions · Answer 6

Jeff Moden wrote:

corey lawson wrote:
Hey, Jeff, will you update this yet again to include the new split_string() function in SQL Server 2016+?
Probably not. The STRING_SPLIT function is relatively crippled for those that need to know the position of the split out elements. That's been pretty well covered by Wayne Sheffield when it first came out. Here's the link...
https://www.sqlservercentral.com/articles/splitting-strings-in-sql-server-2016

Long story short, here's a schemabound ordinal splitter tvf which uses the string_split function as its base and then uses charindex with the "double separator append" trick. Maybe this is an efficient approach?

drop function if exists dbo.string_split_kvp;
go
create function dbo.string_split_kvp(
  @string           varchar(8000),
  @separator        char(1))
returns table with schemabinding as return
select row_number() over (order by 
            charindex(concat(@separator, [value], @separator),
                      concat(@separator, @string, @separator))) ,
       [value]
from string_split(@string, @separator);

Example

select * from dbo.string_split_kvp('this is a space separated string', ' ');

Output

keyvalue
1this
2is
3a
4space
5separated
6string

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

frederico_fonseca SSCoach Points: 16167 More actions · Answer 7

Steve Collins wrote:

Jeff Moden wrote:
corey lawson wrote:
Hey, Jeff, will you update this yet again to include the new split_string() function in SQL Server 2016+?
Probably not. The STRING_SPLIT function is relatively crippled for those that need to know the position of the split out elements. That's been pretty well covered by Wayne Sheffield when it first came out. Here's the link...
https://www.sqlservercentral.com/articles/splitting-strings-in-sql-server-2016
Long story short, here's a schemabound ordinal splitter tvf which uses the string_split function as its base and then uses charindex with the "double separator append" trick. Maybe this is an efficient approach?
drop function if exists dbo.string_split_kvp;
go
create function dbo.string_split_kvp(
  @string           varchar(8000),
  @separator        char(1))
returns table with schemabinding as return
select row_number() over (order by 
            charindex(concat(@separator, [value], @separator),
                      concat(@separator, @string, @separator))) ,
       [value]
from string_split(@string, @separator);
Example
select * from dbo.string_split_kvp('this is a space separated string', ' ');
Output
keyvalue
1this
2is
3a
4space
5separated
6string

apart from the performance that is likely to be quite bad it has errors on its construct.

try it with

select * from dbo.string_split_kvp('this has 2 repeated has', ' ');

Alan Burstein SSC Guru Points: 61141 More actions · Answer 8

apart from the performance that is likely to be quite bad has errors on its construct

The Construct is a problem but I suspect the performance should be fine, STRING_SPLIT is quite fast. The logic will force a sort but you will only sort up to 8001 rows which is quite low.

That said, the logic is broke. STRING_SPLIT is great when you don't need the items in a specific order. I use it for things like parsing multi-select strings in SSRS. If order is important then STRING_SPLIT is not the solution.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Steve Collins SSCrazy Points: 2176 More actions · Answer 9

Yes the logic is broke. I agree with all of Alan's comments. It seems possible to take corrective steps using cte(s) and additional windowing functions where the cardinality of the set(s) is the word count and not the character count. Maybe that doesn't get any performance gains though. I'll keep trying things. This issue of ordering split strings seems to come up over and over.

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Sergiy SSC Guru Points: 110208 More actions · Answer 10

How about "This string is a space separated string"?

will this function work?

_____________
Code for TallyGenerator

Steve Collins SSCrazy Points: 2176 More actions · Answer 11

Sergiy wrote:

How about "This string is a space separated string"?
will this function work?

By "work" if you mean "give the right answer" then no. It doesn't return any error though. But I do have another idea to try. Once I'm back at home I'll test it out.

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Jeff Moden SSC Guru Points: 1004421 More actions · Answer 12

@ Steve Collins,

Great try/good thinking. I'm thinking, though, that just about anything you add to it is going to slow it way down... especially since, if you read the article, the original performance problem was concatenation.

@ Alan,

I'm honestly a little bit surprised that you're justifying a sort because it has a low row count. We know how that usually works out.

@Everyone,

The function is this article served well for a long time but it has soundly been beaten for performance after 2012 came out by Eirikur Eiriksson. His good code still avoids concatenation and his change was super simple... Basically, he replaced CHAR index with LEAD. Prior to him posting his solution, he asked me to test it for him and I tested for accurate functionality and performance... It IS ACCURATE and it IS TWICE AS FAST as the original (getting real close to CLR speeds).

With that I'll say that if you're using 2012 or above, I strongly recommend you upgrade your function to his code, with due credit, of course.

Here's the link to Eirikur's article (the code is in Part 1 of his fine article).

https://www.sqlservercentral.com/articles/reaping-the-benefits-of-the-window-functions-in-t-sql-2

If you want to performance test against something, you should performance test against Eirikur's code and I should have Steve Jones update this article to point to Eirikur's.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Alan Burstein SSC Guru Points: 61141 More actions · Answer 13

@ Alan,
I'm honestly a little bit surprised that you're justifying a sort because it has a low row count. We know how that usually works out.

I was a little rushed in my response. I'll clarify when I get time

This reply was modified 4 years ago by Alan Burstein.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

tellery_frmt SSC Rookie Points: 42 More actions · Answer 14

Hi Jeff.

your tally table approach to parsing is the best i've seen. i have a real world problem; wondering if you have any thoughts.

Sean Lange SSC Guru Points: 286590 More actions · Answer 15

tellery_frmt wrote:

Hi Jeff.
your tally table approach to parsing is the best i've seen. i have a real world problem; wondering if you have any thoughts.
i need to parse a large clob/text field using VARIABLE delimiters! each segment of the clob is delimited by one of the following 'tags': |system| ,|user|, |note|,|manager|,|contact| . the end of the segment is then delimited by the NEXT occurrence of one of the delimiters .
for example: |system| bunch of system data...|user| some data about users xyz.....|contact| more data for the contact...

Jeff's delimiter is one of the best tools for doing what it designed for. However, Jeff's delimiter is 1000000% the wrong tool for this job. It sounds like you would have far better luck using either XML or JSON, likely utilizing CLR. Posting some consumable sample data would make this a lot easier.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Tally OH! An Improved SQL 8K “CSV Splitter” Function

Cookies on SQLServerCentral