strings, substring, patterns (or lack of them)

  • Hello,

    I have this really nasty string that needs to be broken up. I've created some monstrum so far with left, len, patindex and substring and it takes care of exactly 2 first parts of the string. So tiring... :doze:

    Is there anyway to use e.g. left() that would search for 2nd, 3rd, etc occurence of the deliminator? there can be variable length of characters between deliminators... I've never worked with strings on such scale before. :unsure:

  • i've got a couple of suggestions for you;

    here's a function CHARINDEX2, which allows you to find the nth occurrance of a string within-a-string:

    /*

    Example:

    SELECT dbo.CHARINDEX2('a', 'abbabba', 3)

    returns the location of the third occurrence of 'a'

    which is 7

    */

    CREATE FUNCTION CHARINDEX2(

    @TargetStr varchar(8000),

    @SearchedStr varchar(8000),

    @Occurrence int)

    RETURNS int

    AS

    BEGIN

    DECLARE @pos int, @counter int, @ret int

    SET @pos = CHARINDEX(@TargetStr, @SearchedStr)

    SET @counter = 1

    IF @Occurrence = 1

    SET @ret = @pos

    ELSE

    BEGIN

    WHILE (@counter < @Occurrence)

    BEGIN

    SELECT @ret = CHARINDEX(@TargetStr, @SearchedStr, @pos + 1)

    SET @counter = @counter + 1

    SET @pos = @ret

    END

    END

    RETURN(@ret)

    END

    GO

    another option is to Split() the string into rows, and work witht eh data that way, here is one of the better versions of a split function out there:

    --usage: SELECT * FROM dbo.DelimitedSplit8K('alpha,beta,gamma,,more',',')

    CREATE FUNCTION dbo.DelimitedSplit8K

    --===== Define I/O parameters

    (@pString VARCHAR(8000), @pDelimiter CHAR(1))

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN

    --===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...

    -- enough to cover VARCHAR(8000)

    WITH E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

    cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front

    -- for both a performance gain and prevention of accidental "overruns"

    SELECT 0 UNION ALL

    SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

    ),

    cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

    SELECT t.N+1

    FROM cteTally t

    WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0)

    )

    --===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.

    SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),

    Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000))

    FROM cteStart s

    ;

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thanks so much. I'll probably go with the split idea. I think this is going to be fun incorporating this into one query (can't create functions). πŸ™‚

  • happiestgirl (9/16/2011)


    Thanks so much. I'll probably go with the split idea. I think this is going to be fun incorporating this into one query (can't create functions). πŸ™‚

    be sneaky and create it in tempdb database itself, then you can still use it:

    USE TEMPDB

    CREATE FUNCTION DelimitedSplit8K

    --===== Define I/O parameters

    (@pString VARCHAR(8000), @pDelimiter CHAR(1))

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN

    --===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...

    -- enough to cover VARCHAR(8000)

    WITH E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

    cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front

    -- for both a performance gain and prevention of accidental "overruns"

    SELECT 0 UNION ALL

    SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

    ),

    cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

    SELECT t.N+1

    FROM cteTally t

    WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0)

    )

    --===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.

    SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),

    Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000))

    FROM cteStart s

    ;

    GO

    SELECT * FROM tempdb..DelimitedSplit8K('alpha,beta,gamma,,more',',')

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Oh wow :hehe: I'll play with this one in my spare time :w00t:

    It's puzzling to me why SQL Server doesn't have a builtin function to take care of splits....

  • Lowell (9/16/2011)


    be sneaky and create it in tempdb database itself, then you can still use it:

    Wont objects in tempdb get dropped when the server restarts ?

    < i dint read thro the entire thread, so it may just be a dumb question>

  • ColdCoffee you are right of course, but at least the function would be available for the duration of the migration/parsing she was doing;

    i kind of assumed she was splitting data to stick into another table as a one time load, but the original poster would know better if my best guess will help;

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Ok, I just found the parsename function. It's very close to what I'm looking for, but you know, if I have variable number of delimiters per string and there is more than 4, that will cause a problem for me. The number of delimiters is anything between 0 and 6, so I may have up to 7 columns in my result...

    I could ignore (using replace) the first two parts of my string xxx.xxx. - but that leaves me with one extra part that will give me an error, and also everything gets shifted to either left of right, and I need to keep my tree order, if you know what I mean.

    I'm not expecting code from anyone, i want to learn, but anyone has an idea what built-in function i could use here?

    Lowell (9/17/2011)


    ColdCoffee you are right of course, but at least the function would be available for the duration of the migration/parsing she was doing;

    i kind of assumed she was splitting data to stick into another table as a one time load, but the original poster would know better if my best guess will help;

    It's one time thing, I need to keep the tree structure as such

    xx.yyy.zzz. www.

    xx.yyy

    xx.yyy.zzz.www.ddd

    ttt.hh.yyy.zzz etc

    each period here closes a column, so you see it's a variable length.

  • ok here's an enhanced version of a parsename function, which takes any number of items, and an optional delimiter to boot:

    i think this is from a Jeff Moden post ....

    --usage:

    with MySAmpleData

    AS

    (

    SELECT 'xx.yyy.zzz. www.' AS MyColumn UNION ALL

    SELECT 'xx.yyy' UNION ALL

    SELECT 'xx.yyy.zzz.www.ddd' UNION ALL

    SELECT 'ttt.hh.yyy.zzz.etc'

    )

    select

    dbo.fn_parsename(MyColumn,'.',1),

    dbo.fn_parsename(MyColumn,'.',2),

    dbo.fn_parsename(MyColumn,'.',3),

    dbo.fn_parsename(MyColumn,'.',4),

    dbo.fn_parsename(MyColumn,'.',5),

    dbo.fn_parsename(MyColumn,'.',6),

    dbo.fn_parsename(MyColumn,'.',7),

    MyColumn

    FROM MySAmpleData

    the function:

    CREATE FUNCTION dbo.fn_parsename

    (

    @pString VARCHAR(7999),

    @pDelimiter CHAR(1),

    @Occurrance int

    )

    RETURNS VARCHAR(8000)

    AS

    BEGIN

    DECLARE @Results VARCHAR(8000)

    --===== "Inline" CTE Driven "Tally Table” produces values up to

    -- 10,000... enough to cover VARCHAR(8000)

    ;WITH

    E1(N) AS ( --=== Create Ten 1's

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 --10

    ),

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000

    cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT N)) FROM E4) ,

    --===== Do the split

    InterResults

    AS

    (

    SELECT ROW_NUMBER() OVER (ORDER BY N) AS ItemNumber,

    SUBSTRING(@pString, N, CHARINDEX(@pDelimiter, @pString + @pDelimiter, N) - N) AS Item

    FROM cteTally

    WHERE N < LEN(@pString) + 2

    AND SUBSTRING(@pDelimiter + @pString, N, 1) = @pDelimiter

    )

    SELECT @Results = Item FROM InterResults WHERE ItemNumber = @Occurrance

    return @Results

    END --FUNCTION

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • I see no way of doing it without an udf, that's OK I'll ask for permission, thank you so much for tour help πŸ™‚

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply