strings, substring, patterns (or lack of them)

Question

Post reply

strings, substring, patterns (or lack of them)

happiestgirl

SSC-Addicted

Points: 440
More actions
September 16, 2011 at 2:03 pm

#257541

Hello,
I have this really nasty string that needs to be broken up. I've created some monstrum so far with left, len, patindex and substring and it takes care of exactly 2 first parts of the string. So tiring... :doze:
Is there anyway to use e.g. left() that would search for 2nd, 3rd, etc occurence of the deliminator? there can be variable length of characters between deliminators... I've never worked with strings on such scale before. :unsure:

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Lowell SSC Guru Points: 323518 More actions · Answer 1

i've got a couple of suggestions for you;

here's a function CHARINDEX2, which allows you to find the nth occurrance of a string within-a-string:

/*

Example:

SELECT dbo.CHARINDEX2('a', 'abbabba', 3)

returns the location of the third occurrence of 'a'

which is 7

*/

CREATE FUNCTION CHARINDEX2(

@TargetStr varchar(8000),

@SearchedStr varchar(8000),

@Occurrence int)

RETURNS int

AS

BEGIN

DECLARE @pos int, @counter int, @ret int

SET @pos = CHARINDEX(@TargetStr, @SearchedStr)

SET @counter = 1

IF @Occurrence = 1

SET @ret = @pos

ELSE

BEGIN

WHILE (@counter < @Occurrence)

BEGIN

SELECT @ret = CHARINDEX(@TargetStr, @SearchedStr, @pos + 1)

SET @counter = @counter + 1

SET @pos = @ret

END

RETURN(@ret)

END

GO

another option is to Split() the string into rows, and work witht eh data that way, here is one of the better versions of a split function out there:

--usage: SELECT * FROM dbo.DelimitedSplit8K('alpha,beta,gamma,,more',',')

CREATE FUNCTION dbo.DelimitedSplit8K

--===== Define I/O parameters

(@pString VARCHAR(8000), @pDelimiter CHAR(1))

RETURNS TABLE WITH SCHEMABINDING AS

RETURN

--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...

-- enough to cover VARCHAR(8000)

WITH E1(N) AS (

SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

), --10E+1 or 10 rows

E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front

-- for both a performance gain and prevention of accidental "overruns"

SELECT 0 UNION ALL

SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

),

cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

SELECT t.N+1

FROM cteTally t

WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0)

)

--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.

SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),

Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000))

FROM cteStart s

;

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

happiestgirl SSC-Addicted Points: 440 More actions · Answer 2

Thanks so much. I'll probably go with the split idea. I think this is going to be fun incorporating this into one query (can't create functions). 🙂

Lowell SSC Guru Points: 323518 More actions · Answer 3

happiestgirl (9/16/2011)
Thanks so much. I'll probably go with the split idea. I think this is going to be fun incorporating this into one query (can't create functions). 🙂

be sneaky and create it in tempdb database itself, then you can still use it:

USE TEMPDB

CREATE FUNCTION DelimitedSplit8K

--===== Define I/O parameters

(@pString VARCHAR(8000), @pDelimiter CHAR(1))

RETURNS TABLE WITH SCHEMABINDING AS

RETURN

--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...

-- enough to cover VARCHAR(8000)

WITH E1(N) AS (

SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

), --10E+1 or 10 rows

E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front

-- for both a performance gain and prevention of accidental "overruns"

SELECT 0 UNION ALL

SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

),

cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

SELECT t.N+1

FROM cteTally t

WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0)

)

--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.

SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),

Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000))

FROM cteStart s

;

GO

SELECT * FROM tempdb..DelimitedSplit8K('alpha,beta,gamma,,more',',')

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

happiestgirl SSC-Addicted Points: 440 More actions · Answer 4

Oh wow :hehe: I'll play with this one in my spare time :w00t:

It's puzzling to me why SQL Server doesn't have a builtin function to take care of splits....

ColdCoffee SSC-Dedicated Points: 39972 More actions · Answer 5

Lowell (9/16/2011)
be sneaky and create it in tempdb database itself, then you can still use it:

Wont objects in tempdb get dropped when the server restarts ?

< i dint read thro the entire thread, so it may just be a dumb question>

Lowell SSC Guru Points: 323518 More actions · Answer 6

ColdCoffee you are right of course, but at least the function would be available for the duration of the migration/parsing she was doing;

i kind of assumed she was splitting data to stick into another table as a one time load, but the original poster would know better if my best guess will help;

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

happiestgirl SSC-Addicted Points: 440 More actions · Answer 7

Ok, I just found the parsename function. It's very close to what I'm looking for, but you know, if I have variable number of delimiters per string and there is more than 4, that will cause a problem for me. The number of delimiters is anything between 0 and 6, so I may have up to 7 columns in my result...

I could ignore (using replace) the first two parts of my string xxx.xxx. - but that leaves me with one extra part that will give me an error, and also everything gets shifted to either left of right, and I need to keep my tree order, if you know what I mean.

I'm not expecting code from anyone, i want to learn, but anyone has an idea what built-in function i could use here?

Lowell (9/17/2011)
ColdCoffee you are right of course, but at least the function would be available for the duration of the migration/parsing she was doing;
i kind of assumed she was splitting data to stick into another table as a one time load, but the original poster would know better if my best guess will help;

It's one time thing, I need to keep the tree structure as such

xx.yyy.zzz. www.

xx.yyy

xx.yyy.zzz.www.ddd

ttt.hh.yyy.zzz etc

each period here closes a column, so you see it's a variable length.

Lowell SSC Guru Points: 323518 More actions · Answer 8

ok here's an enhanced version of a parsename function, which takes any number of items, and an optional delimiter to boot:

i think this is from a Jeff Moden post ....

--usage:

with MySAmpleData

AS

(

SELECT 'xx.yyy.zzz. www.' AS MyColumn UNION ALL

SELECT 'xx.yyy' UNION ALL

SELECT 'xx.yyy.zzz.www.ddd' UNION ALL

SELECT 'ttt.hh.yyy.zzz.etc'

)

select

dbo.fn_parsename(MyColumn,'.',1),

dbo.fn_parsename(MyColumn,'.',2),

dbo.fn_parsename(MyColumn,'.',3),

dbo.fn_parsename(MyColumn,'.',4),

dbo.fn_parsename(MyColumn,'.',5),

dbo.fn_parsename(MyColumn,'.',6),

dbo.fn_parsename(MyColumn,'.',7),

MyColumn

FROM MySAmpleData

the function:

CREATE FUNCTION dbo.fn_parsename

(

@pString VARCHAR(7999),

@pDelimiter CHAR(1),

@Occurrance int

)

RETURNS VARCHAR(8000)

AS

BEGIN

DECLARE @Results VARCHAR(8000)

--===== "Inline" CTE Driven "Tally Table” produces values up to

-- 10,000... enough to cover VARCHAR(8000)

;WITH

E1(N) AS ( --=== Create Ten 1's

SELECT 1 UNION ALL SELECT 1 UNION ALL

SELECT 1 UNION ALL SELECT 1 --10

),

E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100

E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000

cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT N)) FROM E4) ,

--===== Do the split

InterResults

AS

(

SELECT ROW_NUMBER() OVER (ORDER BY N) AS ItemNumber,

SUBSTRING(@pString, N, CHARINDEX(@pDelimiter, @pString + @pDelimiter, N) - N) AS Item

FROM cteTally

WHERE N < LEN(@pString) + 2

AND SUBSTRING(@pDelimiter + @pString, N, 1) = @pDelimiter

)

SELECT @Results = Item FROM InterResults WHERE ItemNumber = @Occurrance

return @Results

END --FUNCTION

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

happiestgirl SSC-Addicted Points: 440 More actions · Answer 9

I see no way of doing it without an udf, that's OK I'll ask for permission, thank you so much for tour help 🙂