March 17, 2018 at 11:21 am
Comments posted to this topic are about the item Set-Based String Splitting table function
March 19, 2018 at 3:05 am
http://www.sqlservercentral.com/articles/Tally+Table/72993/ written by Jeff Moden (and others).
I would imaging that storing the results into a table variable then querying that for the output will cost some time, is it really necessary?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
March 19, 2018 at 3:28 am
bevanward - Saturday, March 17, 2018 11:21 AMComments posted to this topic are about the item Set-Based String Splitting table function
You can't use the function with very large string because of the tally table with only 1000 numbers.
insert into dbo.tNUM (NUM)
SELECT TOP (1000) n FROM Nums ORDER BY n;
March 19, 2018 at 4:19 am
Carlo Romagnano - Monday, March 19, 2018 3:28 AMbevanward - Saturday, March 17, 2018 11:21 AMComments posted to this topic are about the item Set-Based String Splitting table functionYou can't use the function with very large string because of the tally table with only 1000 numbers.
insert into dbo.tNUM (NUM)
SELECT TOP (1000) n FROM Nums ORDER BY n;
Hi Carlo
I have a few million in my table however did not want to add so much for the example. Just increase the number of records or update the function to use an existing if you have one?
Thanks
Bevan
March 19, 2018 at 4:21 am
ChrisM@Work - Monday, March 19, 2018 3:05 AMThanks for taking the time to write this up. Interesting - you've arrived at a few of the shortcuts of the house splitterhttp://www.sqlservercentral.com/articles/Tally+Table/72993/ written by Jeff Moden (and others).
I would imaging that storing the results into a table variable then querying that for the output will cost some time, is it really necessary?
Hi Chris
I had not seen this before so will have to have a look - from a brief look it seems to be very comprehensive.
I had not tried without the table variable so will have to experiment.
Thanks for the link!
Cheers
Bevan
March 19, 2018 at 6:35 am
ChrisM@Work - Monday, March 19, 2018 3:05 AMThanks for taking the time to write this up. Interesting - you've arrived at a few of the shortcuts of the house splitterhttp://www.sqlservercentral.com/articles/Tally+Table/72993/ written by Jeff Moden (and others).
I would imaging that storing the results into a table variable then querying that for the output will cost some time, is it really necessary?
Yes, this is the gold standard for string splitting in pure SQL. Surprised the author was not aware of it
March 19, 2018 at 10:09 am
gbritton1 - Monday, March 19, 2018 6:35 AMChrisM@Work - Monday, March 19, 2018 3:05 AMThanks for taking the time to write this up. Interesting - you've arrived at a few of the shortcuts of the house splitterhttp://www.sqlservercentral.com/articles/Tally+Table/72993/ written by Jeff Moden (and others).
I would imaging that storing the results into a table variable then querying that for the output will cost some time, is it really necessary?Yes, this is the gold standard for string splitting in pure SQL. Surprised the author was not aware of it
Hi gbritton1
Have just read through Jeff's work and it does seem to be VERY complete and I'm sure anything I've done is a subset of this!
I have been using my function for splitting million character files based on various patterns however need to compare to what Jeff has done as I'm sure it will improve performance. In the meantime was a useful learning exercise but certainly doesn't extend what is out there!
Thanks for the comment
Bevan
March 19, 2018 at 2:21 pm
bevanward - Monday, March 19, 2018 10:09 AMgbritton1 - Monday, March 19, 2018 6:35 AMChrisM@Work - Monday, March 19, 2018 3:05 AMThanks for taking the time to write this up. Interesting - you've arrived at a few of the shortcuts of the house splitterhttp://www.sqlservercentral.com/articles/Tally+Table/72993/ written by Jeff Moden (and others).
I would imaging that storing the results into a table variable then querying that for the output will cost some time, is it really necessary?Yes, this is the gold standard for string splitting in pure SQL. Surprised the author was not aware of it
Hi gbritton1
Have just read through Jeff's work and it does seem to be VERY complete and I'm sure anything I've done is a subset of this!
I have been using my function for splitting million character files based on various patterns however need to compare to what Jeff has done as I'm sure it will improve performance. In the meantime was a useful learning exercise but certainly doesn't extend what is out there!Thanks for the comment
Bevan
Just for the sake of curiosity, how many "fields" per "line",how many "lines", and how many average characters per line? I'm always up for different ideas, especially on this subject, and would like to knock off a test or two.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 19, 2018 at 3:30 pm
Jeff Moden - Monday, March 19, 2018 2:21 PMbevanward - Monday, March 19, 2018 10:09 AMgbritton1 - Monday, March 19, 2018 6:35 AMChrisM@Work - Monday, March 19, 2018 3:05 AMThanks for taking the time to write this up. Interesting - you've arrived at a few of the shortcuts of the house splitterhttp://www.sqlservercentral.com/articles/Tally+Table/72993/ written by Jeff Moden (and others).
I would imaging that storing the results into a table variable then querying that for the output will cost some time, is it really necessary?Yes, this is the gold standard for string splitting in pure SQL. Surprised the author was not aware of it
Hi gbritton1
Have just read through Jeff's work and it does seem to be VERY complete and I'm sure anything I've done is a subset of this!
I have been using my function for splitting million character files based on various patterns however need to compare to what Jeff has done as I'm sure it will improve performance. In the meantime was a useful learning exercise but certainly doesn't extend what is out there!Thanks for the comment
BevanJust for the sake of curiosity, how many "fields" per "line",how many "lines", and how many average characters per line? I'm always up for different ideas, especially on this subject, and would like to knock off a test or two.
Hi Jeff
To be open I'm sure I should be doing a lot of these tasks in Python but first tend to try and use SQL as a challenge.
I would be interested in any of your comments however can see your method is well tested!
I don't have a normal as such - I've started using it in lots of places.
For example finding the last sentence after a .(space), !(space), or ?(space) of 10 thousand records, maybe 10 sentences, less than 2000 characters each line.
One other example at the moment is to remove Google stop words for example
use tempdb
go
drop table if exists dbo.tStop
drop function if exists dbo.fnt_remove_stopwords
create table dbo.tStop (Word nvarchar(50) unique)
insert into dbo.tStop values ('a'),('about'),('above'),('after'),('again'),('against'),('all'),('am'),('an'),('and'),('any'),('are'),('as'),('at'),('be'),('because'),('been'),('before'),('being'),('below'),('between'),('both'),('but'),('by'),('could'),('did'),('do'),('does'),('doing'),('down'),('during'),('each'),('few'),('for'),('from'),('further'),('had'),('has'),('have'),('having'),('he'),('he''d'),('he''ll'),('he''s'),('her'),('here'),('here''s'),('hers'),('herself'),('him'),('himself'),('his'),('how'),('how''s'),('i'),('i''d'),('i''ll'),('i''m'),('i''ve'),('if'),('in'),('into'),('is'),('it'),('it''s'),('its'),('itself'),('let''s'),('me'),('more'),('most'),('my'),('myself'),('nor'),('of'),('on'),('once'),('only'),('or'),('other'),('ought'),('our'),('ours'),('ourselves'),('out'),('over'),('own'),('same'),('she'),('she''d'),('she''ll'),('she''s'),('should'),('so'),('some'),('such'),('than'),('that'),('that''s'),('the'),('their'),('theirs'),('them'),('themselves'),('then'),('there'),('there''s'),('these'),('they'),('they''d'),('they''ll'),('they''re'),('they''ve'),('this'),('those'),('through'),('to'),('too'),('under'),('until'),('up'),('very'),('was'),('we'),('we''d'),('we''ll'),('we''re'),('we''ve'),('were'),('what'),('what''s'),('when'),('when''s'),('where'),('where''s'),('which'),('while'),('who'),('who''s'),('whom'),('why'),('why''s'),('with'),('would'),('you'),('you''d'),('you''ll'),('you''re'),('you''ve'),('your'),('yours'),('yourself'),('yourselves')
go
create function dbo.fnt_remove_stopwords (@Value nvarchar(max))
returns nvarchar(max)
as
begin
declare @Out nvarchar(max)
set @Value = replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(@Value, '&',''),',',''),';',''),'.',''),'(',''),')',''),'?',''),':',''),'[',''),']',''),'!','')
select @Out = (select ' '+String from
(select top 100 percent Ordinal, String
from dbo.ufnt_split(@Value, ' ')
where String not in
(select Word
from tStop)
order by Ordinal) subqry
for xml path,TYPE).value('.[1]','nvarchar(max)')
return @out
end
go
select dbo.fnt_remove_stopwords('The shortbeard codling can grow to a maximum standard length of about 33 cm (13 in) and takes the form of a somewhat spindle-shaped cylinder. The snout is fairly blunt and the upper jaw is slightly longer than the lower jaw. There is a short barbel on the chin and a flattened spine on the operculum. There are two dorsal fins with no spines and a total of from 69 to 76 soft rays. The first dorsal fin has 6 rays, the second of which is the longest, and the second dorsal fin originates immediately behind the first fin. Each pectoral fin has 17 to 23 soft rays. The pelvic fins are widely separated and each has two elongated soft rays; these fins originate to the front of the pectoral fins. The anal fin has no spines and has 57 to 63 soft rays. The caudal fin is small and rounded. The general colour of this fish is tan, the dorsal and anal fins having dark edges.')
Thanks for taking the time to comment
Bevan
March 19, 2018 at 7:06 pm
Cool feedback. Thanks, Bevan.
I think the concern of some of the other poster's is having to do with performance. Generally speaking and with few exceptions, an mTVF (Multi-statement Table Valued Function) is going to be slower than an iTVF (Inline Table Valued Function). Your use of LEAD may help overcome that but, if you can use lead, take a look at a modernized version of DelimitedSplit8K that a good online friend, Eirikur Eiriksson , wrote. I've test it and it's twice as fast as DelimitedSplit8k and approaches CLR speeds very closely. Of course, it requires 2012+.
http://www.sqlservercentral.com/articles/SQL+Server+2012/106903/
--Jeff Moden
Change is inevitable... Change for the better is not.
March 20, 2018 at 12:56 am
gbritton1 - Monday, March 19, 2018 6:35 AMChrisM@Work - Monday, March 19, 2018 3:05 AMThanks for taking the time to write this up. Interesting - you've arrived at a few of the shortcuts of the house splitterhttp://www.sqlservercentral.com/articles/Tally+Table/72993/ written by Jeff Moden (and others).
I would imaging that storing the results into a table variable then querying that for the output will cost some time, is it really necessary?Yes, this is the gold standard for string splitting in pure SQL. Surprised the author was not aware of it
While I've seen (and am using) Jeff's approach noted above (thanks Jeff!), I'm not going to ding @bevanward for not being aware of this previous article.
I say kudos to bevan for posting an approach that, while perhaps not as comprehensive as Jeff's, was clearly arrived at independently of Jeff's work. And just taking the time to share some ideas is a cool thing.
March 20, 2018 at 1:50 am
Jeff Moden - Monday, March 19, 2018 7:06 PMCool feedback. Thanks, Bevan.I think the concern of some of the other poster's is having to do with performance. Generally speaking and with few exceptions, an mTVF (Multi-statement Table Valued Function) is going to be slower than an iTVF (Inline Table Valued Function). Your use of LEAD may help overcome that but, if you can use lead, take a look at a modernized version of DelimitedSplit8K that a good online friend, Eirikur Eiriksson , wrote. I've test it and it's twice as fast as DelimitedSplit8k and approaches CLR speeds very closely. Of course, it requires 2012+.
http://www.sqlservercentral.com/articles/SQL+Server+2012/106903/
Hi Jeff
Thanks for this - I had never thought through, but makes complete sense the multi-statement versus the single statement return. Since I have been using this more performance is going to start becoming an issue so will have a look at Eirikur's version once I've got my head around all of your work! Thanks for this is highly valued feedback!!
Take care
Bevan
March 20, 2018 at 1:55 am
Scott In Sydney - Tuesday, March 20, 2018 12:56 AMgbritton1 - Monday, March 19, 2018 6:35 AMChrisM@Work - Monday, March 19, 2018 3:05 AMThanks for taking the time to write this up. Interesting - you've arrived at a few of the shortcuts of the house splitterhttp://www.sqlservercentral.com/articles/Tally+Table/72993/ written by Jeff Moden (and others).
I would imaging that storing the results into a table variable then querying that for the output will cost some time, is it really necessary?Yes, this is the gold standard for string splitting in pure SQL. Surprised the author was not aware of it
While I've seen (and am using) Jeff's approach noted above (thanks Jeff!), I'm not going to ding @bevanward for not being aware of this previous article.
I say kudos to bevan for posting an approach that, while perhaps not as comprehensive as Jeff's, was clearly arrived at independently of Jeff's work. And just taking the time to share some ideas is a cool thing.
Hi Scott
Thanks for your message! It is hard to know how long to look for something before starting to write. What Jeff wrote looks perfect.
Had you seen the modernised version that Jeff mentions above? Seems to be worth considering from the looks of the performance.
Thanks again keep well
Bevan
March 20, 2018 at 3:04 pm
Jeff Moden - Monday, March 19, 2018 7:06 PMCool feedback. Thanks, Bevan.I think the concern of some of the other poster's is having to do with performance. Generally speaking and with few exceptions, an mTVF (Multi-statement Table Valued Function) is going to be slower than an iTVF (Inline Table Valued Function). Your use of LEAD may help overcome that but, if you can use lead, take a look at a modernized version of DelimitedSplit8K that a good online friend, Eirikur Eiriksson , wrote. I've test it and it's twice as fast as DelimitedSplit8k and approaches CLR speeds very closely. Of course, it requires 2012+.
http://www.sqlservercentral.com/articles/SQL+Server+2012/106903/
Hi Jeff
Have modified Eirikur's version for multiple character delimiters and longer fields. Certainly runs fast from what I've seen so far have to test more.
The only issue I have stumbled into so far is where delimiters overlap and it splits multiple times - for example when splitting with three dots and try and pass through a area with lots of dots. Any thoughts appreciated. Thanks for pointing this all out.
Thanks
Bevan
March 20, 2018 at 3:10 pm
bevanward - Tuesday, March 20, 2018 3:04 PMJeff Moden - Monday, March 19, 2018 7:06 PMCool feedback. Thanks, Bevan.I think the concern of some of the other poster's is having to do with performance. Generally speaking and with few exceptions, an mTVF (Multi-statement Table Valued Function) is going to be slower than an iTVF (Inline Table Valued Function). Your use of LEAD may help overcome that but, if you can use lead, take a look at a modernized version of DelimitedSplit8K that a good online friend, Eirikur Eiriksson , wrote. I've test it and it's twice as fast as DelimitedSplit8k and approaches CLR speeds very closely. Of course, it requires 2012+.
http://www.sqlservercentral.com/articles/SQL+Server+2012/106903/
Hi Jeff
Have modified Eirikur's version for multiple character delimiters and longer fields. Certainly runs fast from what I've seen so far have to test more.
The only issue I have stumbled into so far is where delimiters overlap and it splits multiple times - for example when splitting with three dots and try and pass through a area with lots of dots. Any thoughts appreciated. Thanks for pointing this all out.
Thanks
Bevan
Post the modifications you made.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 25 total)
You must be logged in to reply to this topic. Login to reply