November 5, 2014 at 7:26 am
I've been using Jeff Moden's excellent Tally OH splitter for a while
http://www.sqlservercentral.com/articles/Tally+Table/72993/
In the comments it says:
Optimized for single character delimiter. Multi-character delimiters should be resolved externally from this
function.
However, I know I ran across a similarly optimized, inline function that supports multi-character delimiters. I just can't find it!
Can someone please reply with the link?
Gerald Britton, Pluralsight courses
November 5, 2014 at 7:46 am
Found it here:
http://www.sqlservercentral.com/articles/String+Manipulation/94365/
Silly me, I was searching for "multi-character delimiters" Had I searched for "patterns" I would have hit it right away!
Gerald Britton, Pluralsight courses
November 5, 2014 at 9:53 am
Note that those functions aren't intended for multi-character delimiters.
November 5, 2014 at 10:32 am
Luis Cazares (11/5/2014)
Note that those functions aren't intended for multi-character delimiters.
Well....a multi-character delimiter is just a simple pattern without wildcards. So even though that's not the design goal, it is usable for that goal I believe.
Gerald Britton, Pluralsight courses
November 5, 2014 at 12:23 pm
g.britton (11/5/2014)
I've been using Jeff Moden's excellent Tally OH splitter for a whilehttp://www.sqlservercentral.com/articles/Tally+Table/72993/
In the comments it says:
Optimized for single character delimiter. Multi-character delimiters should be resolved externally from this
function.
However, I know I ran across a similarly optimized, inline function that supports multi-character delimiters. I just can't find it!
Can someone please reply with the link?
I don't have a link but perhaps this will help... I re-factored delimitedSplit8k some time back for this purpose. I call it LiteralSplit8K. Please be warned - this is a hacked version of Jeff Moden's splitter and has not been tested as thoroughly by anyone on this forum.
ALTER FUNCTION [dbo].[LiteralSplit8K](@pString VARCHAR(8000), @pDelimiter VARCHAR(50))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+(len(@pDelimiter)) FROM cteTally t WHERE SUBSTRING(@pString,t.N,(len(@pDelimiter))) = @pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter COLLATE Latin1_General_BIN,@pString COLLATE Latin1_General_BIN,s.N1) ,0)-s.N1,8000)
--ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l;
/*
Example:
SELECT *
FROM dbo.LiteralSplit8K('xxx<br/>rrr<br/>ttt','<br/>')
*/
GO
-- Itzik Ben-Gan 2001
November 5, 2014 at 2:18 pm
emph. mine
g.britton (11/5/2014)
Luis Cazares (11/5/2014)
Note that those functions aren't intended for multi-character delimiters.Well....a multi-character delimiter is just a simple pattern without wildcards. So even though that's not the design goal, it is usable for that goal I believe.
That is true. I think Luis' point was that you might not get the performance that you want. If you are not dealing with a pattern you could do something like this:
SELECT item
FROM sqltoolbox.dbo.DelimitedSplit8K
(
replace('blah1 \r\ blah2 \r\ blah3','\r\',char(2)),
char(2)
);
... which will get you the same results as this but with better performance:
SELECT item
FROM sqltoolbox.dbo.PatternSplitCM('blah1 \r\ blah2 \r\ blah3','%[\r\]%')
WHERE [Matched] = 0
Both DelimitedSplit8K and PatternSplitCM are the best at what they were intended for but they were not intended for the same thing.
Edit: minor typo in my PatternSplitCM code.
-- Itzik Ben-Gan 2001
November 6, 2014 at 7:56 am
Alan.B (11/5/2014)
emph. mineg.britton (11/5/2014)
Luis Cazares (11/5/2014)
Note that those functions aren't intended for multi-character delimiters.Well....a multi-character delimiter is just a simple pattern without wildcards. So even though that's not the design goal, it is usable for that goal I believe.
That is true. I think Luis' point was that you might not get the performance that you want. If you are not dealing with a pattern you could do something like this:
Actually I don't think that was Luis' point at all. He merely pointed out that the pattern splitter expects a pattern in the form of a PATINDEX or LIKE pattern. He said nothing about performance.
In my case, I want to split a string based on a series of recurring characters, e.g.
blah1[series of recurring characters]blah2[series of recurring characters]blah3[series of recurring characters]
I proposed leveraging the pattern splitter function to do that, specifying '%[series of recurring characters]%' as the pattern
Gerald Britton, Pluralsight courses
November 6, 2014 at 8:24 am
You're right, it wasn't about performance. It's because the splitter works on a single char basis. Modifying Alan's example could give us undesired results.
SELECT *
FROM dbo.PatternSplitCM('blrah1 \r\ bl\ah2 \r\ blah3','%[\r\]%')
WHERE [Matched] = 0
Results:
ItemNumber Item Matched
------------ ----------------- -----------
1 bl 0
3 ah1 0
5 bl 0
7 ah2 0
9 blah3 0
November 6, 2014 at 8:32 am
yes, of course, but with those caveats it should still be useful
Edit: I'm wrong. It can't be used in the way I imagined.
e.g. try to find hrefs in some html:
SELECT *
FROM dbo.PatternSplitCM('blrah1 <a href="foo"> bl\ah2</a> <a href="bar"> blah3 </a>','%<a%>%')
WHERE [Matched] = 0
returns
ItemNumberItemMatched
1 blrah1 <a href="foo"> bl\ah2</a> <a href="bar"> blah3 </a>0
So I'm going back to Alan's modified splitter.
Gerald Britton, Pluralsight courses
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply