March 19, 2019 at 7:26 am
Eirikur Eiriksson - Tuesday, March 19, 2019 7:18 AMQuick question, are there any multi-line entries or can one use the new line bytes as a row delimiter?
To whom are you directing that question?
--Jeff Moden
Change is inevitable... Change for the better is not.
March 19, 2019 at 8:52 am
Jeff Moden - Tuesday, March 19, 2019 7:26 AMEirikur Eiriksson - Tuesday, March 19, 2019 7:18 AMQuick question, are there any multi-line entries or can one use the new line bytes as a row delimiter?To whom are you directing that question?
The OP
The spanner in the works is the multi row entries, big difference from handling single row entries
March 20, 2019 at 5:33 am
Eirikur Eiriksson - Tuesday, March 19, 2019 8:52 AMJeff Moden - Tuesday, March 19, 2019 7:26 AMEirikur Eiriksson - Tuesday, March 19, 2019 7:18 AMQuick question, are there any multi-line entries or can one use the new line bytes as a row delimiter?To whom are you directing that question?
The OP
The spanner in the works is the multi row entries, big difference from handling single row entries
Agreed but not much of one if each "record" has the same pattern of multi-line entries. I do use format files for such a thing if the multi-line entries are consistent.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 8, 2020 at 12:54 am
Hey, Jeff, will you update this yet again to include the new split_string() function in SQL Server 2016+?
January 8, 2020 at 3:33 am
Hey, Jeff, will you update this yet again to include the new split_string() function in SQL Server 2016+?
Probably not. The STRING_SPLIT function is relatively crippled for those that need to know the position of the split out elements. That's been pretty well covered by Wayne Sheffield when it first came out. Here's the link...
https://www.sqlservercentral.com/articles/splitting-strings-in-sql-server-2016
--Jeff Moden
Change is inevitable... Change for the better is not.
December 25, 2020 at 5:23 pm
corey lawson wrote:Hey, Jeff, will you update this yet again to include the new split_string() function in SQL Server 2016+?
Probably not. The STRING_SPLIT function is relatively crippled for those that need to know the position of the split out elements. That's been pretty well covered by Wayne Sheffield when it first came out. Here's the link...
https://www.sqlservercentral.com/articles/splitting-strings-in-sql-server-2016
Long story short, here's a schemabound ordinal splitter tvf which uses the string_split function as its base and then uses charindex with the "double separator append" trick. Maybe this is an efficient approach?
drop function if exists dbo.string_split_kvp;
go
create function dbo.string_split_kvp(
@string varchar(8000),
@separator char(1))
returns table with schemabinding as return
select row_number() over (order by
charindex(concat(@separator, [value], @separator),
concat(@separator, @string, @separator))) ,
[value]
from string_split(@string, @separator);
Example
select * from dbo.string_split_kvp('this is a space separated string', ' ');
Output
keyvalue
1this
2is
3a
4space
5separated
6string
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
December 26, 2020 at 12:32 am
Jeff Moden wrote:corey lawson wrote:Hey, Jeff, will you update this yet again to include the new split_string() function in SQL Server 2016+?
Probably not. The STRING_SPLIT function is relatively crippled for those that need to know the position of the split out elements. That's been pretty well covered by Wayne Sheffield when it first came out. Here's the link...
https://www.sqlservercentral.com/articles/splitting-strings-in-sql-server-2016
Long story short, here's a schemabound ordinal splitter tvf which uses the string_split function as its base and then uses charindex with the "double separator append" trick. Maybe this is an efficient approach?
drop function if exists dbo.string_split_kvp;
go
create function dbo.string_split_kvp(
@string varchar(8000),
@separator char(1))
returns table with schemabinding as return
select row_number() over (order by
charindex(concat(@separator, [value], @separator),
concat(@separator, @string, @separator))) ,
[value]
from string_split(@string, @separator);Example
select * from dbo.string_split_kvp('this is a space separated string', ' ');Output
keyvalue
1this
2is
3a
4space
5separated
6string
apart from the performance that is likely to be quite bad it has errors on its construct.
try it with
select * from dbo.string_split_kvp('this has 2 repeated has', ' ');
December 26, 2020 at 1:21 am
apart from the performance that is likely to be quite bad has errors on its construct
The Construct is a problem but I suspect the performance should be fine, STRING_SPLIT is quite fast. The logic will force a sort but you will only sort up to 8001 rows which is quite low.
That said, the logic is broke. STRING_SPLIT is great when you don't need the items in a specific order. I use it for things like parsing multi-select strings in SSRS. If order is important then STRING_SPLIT is not the solution.
-- Itzik Ben-Gan 2001
December 26, 2020 at 2:47 am
Yes the logic is broke. I agree with all of Alan's comments. It seems possible to take corrective steps using cte(s) and additional windowing functions where the cardinality of the set(s) is the word count and not the character count. Maybe that doesn't get any performance gains though. I'll keep trying things. This issue of ordering split strings seems to come up over and over.
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
December 26, 2020 at 3:17 am
How about "This string is a space separated string"?
will this function work?
_____________
Code for TallyGenerator
December 26, 2020 at 1:15 pm
How about "This string is a space separated string"?
will this function work?
By "work" if you mean "give the right answer" then no. It doesn't return any error though. But I do have another idea to try. Once I'm back at home I'll test it out.
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
December 27, 2020 at 5:18 pm
@ Steve Collins,
Great try/good thinking. I'm thinking, though, that just about anything you add to it is going to slow it way down... especially since, if you read the article, the original performance problem was concatenation.
@ Alan,
I'm honestly a little bit surprised that you're justifying a sort because it has a low row count. We know how that usually works out.
@Everyone,
The function is this article served well for a long time but it has soundly been beaten for performance after 2012 came out by Eirikur Eiriksson. His good code still avoids concatenation and his change was super simple... Basically, he replaced CHAR index with LEAD. Prior to him posting his solution, he asked me to test it for him and I tested for accurate functionality and performance... It IS ACCURATE and it IS TWICE AS FAST as the original (getting real close to CLR speeds).
With that I'll say that if you're using 2012 or above, I strongly recommend you upgrade your function to his code, with due credit, of course.
Here's the link to Eirikur's article (the code is in Part 1 of his fine article).
https://www.sqlservercentral.com/articles/reaping-the-benefits-of-the-window-functions-in-t-sql-2
If you want to performance test against something, you should performance test against Eirikur's code and I should have Steve Jones update this article to point to Eirikur's.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 27, 2020 at 8:17 pm
@ Alan,
I'm honestly a little bit surprised that you're justifying a sort because it has a low row count. We know how that usually works out.
I was a little rushed in my response. I'll clarify when I get time
-- Itzik Ben-Gan 2001
August 10, 2021 at 9:02 pm
Hi Jeff.
your tally table approach to parsing is the best i've seen. i have a real world problem; wondering if you have any thoughts.
i need to parse a large clob/text field using VARIABLE delimiters! each segment of the clob is delimited by one of the following 'tags': |system| ,|user|, |note|,|manager|,|contact| . the end of the segment is then delimited by the NEXT occurrence of one of the delimiters .
for example: |system| bunch of system data...|user| some data about users xyz.....|contact| more data for the contact...
August 10, 2021 at 9:49 pm
Hi Jeff.
your tally table approach to parsing is the best i've seen. i have a real world problem; wondering if you have any thoughts.
i need to parse a large clob/text field using VARIABLE delimiters! each segment of the clob is delimited by one of the following 'tags': |system| ,|user|, |note|,|manager|,|contact| . the end of the segment is then delimited by the NEXT occurrence of one of the delimiters .
for example: |system| bunch of system data...|user| some data about users xyz.....|contact| more data for the contact...
Jeff's delimiter is one of the best tools for doing what it designed for. However, Jeff's delimiter is 1000000% the wrong tool for this job. It sounds like you would have far better luck using either XML or JSON, likely utilizing CLR. Posting some consumable sample data would make this a lot easier.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
Viewing 15 posts - 946 through 960 (of 990 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy