Append characters in a sentence after certain length

  • Hi,

    Need your help in accomplishing this one. We run on SQL Server 2008. I've a string that can be upto 200 characters max and after every 32 characters a pile (|) got to be inserted. if that 32 characters come in midword, then that pipe has to be placed in the beginning of that word, not in the middle of the word.

    Example:

    Original String: ABCDEFGH IJKLMNOPQRS TUVWXYZ ABCDEFGHIJKLMN OPQRSTUVWXYZA BCDEFGHIJKLMNOP

    Want this way: ABCDEFGH IJKLMNOPQRS TUVWXYZ |ABCDEFGHIJKLMN OPQRSTUVWXYZA |BCDEFGHIJKLMNOP

    since these strings are in a table.column and this need to operate on the entire data set (about 200k records), performance is also important.

    thanks in advance..

  • So are you talking about updating the table once-off, or whenever a client app retrieves the data? Plus what if the previous "word" is less than 32 characters long and a pipe has been inserted, what do you do?

    --------------------------------------------------------------------------
    A little knowledge is a dangerous thing (Alexander Pope)
    In order for us to help you as efficiently as possible, please read this before posting (courtesy of Jeff Moden)[/url]

  • Thanks for your response. Its kind of when the client retrieves the data.

    again, the 32 character count should start from where the pipe was placed.

  • I'd strongly suggest you do this on the client side. SQL isn't geared towards manipulating strings.

    --------------------------------------------------------------------------
    A little knowledge is a dangerous thing (Alexander Pope)
    In order for us to help you as efficiently as possible, please read this before posting (courtesy of Jeff Moden)[/url]

  • Venkata-433090 (6/19/2013)


    Thanks for your response. Its kind of when the client retrieves the data.

    again, the 32 character count should start from where the pipe was placed.

    Shouldn't take too much effort to tailor this to your requirements:

    -- sample data

    ;WITH SampleData AS (

    SELECT

    MyString = RTRIM(STUFF(STUFF(MyString,32,0,'|'),64,0,'|'))

    FROM (

    SELECT MyString = CAST('ABCDEFGH IJKLMNOPQRS TUVWXYZ ABCDEFGHIJKLMN OPQRSTUVWXYZA BCDEFGHIJKLMNOP' AS VARCHAR(200))

    ) d

    )

    -- solution

    SELECT

    MyString,

    NewString =

    STUFF(

    STUFF(

    STUFF(

    STUFF(

    STUFF(

    STUFF(

    CleanString

    ,Pos1,0,CASE WHEN Pos1 > 1 THEN '|' ELSE '' END)

    ,Pos2,0,CASE WHEN Pos2 > 1 THEN '|' ELSE '' END)

    ,Pos3,0,CASE WHEN Pos3 > 1 THEN '|' ELSE '' END)

    ,Pos4,0,CASE WHEN Pos4 > 1 THEN '|' ELSE '' END)

    ,Pos5,0,CASE WHEN Pos5 > 1 THEN '|' ELSE '' END)

    ,Pos6,0,CASE WHEN Pos6 > 1 THEN '|' ELSE '' END)

    FROM SampleData

    CROSS APPLY (SELECT CleanString = REPLACE(MyString,'|','')) y

    CROSS APPLY ( -- get the position of the last ' ' in each 32 character 'line'

    SELECT

    Pos1 = MAX(CASE WHEN line = 1 THEN LastSpaceInLine+1 ELSE 1 END),

    Pos2 = MAX(CASE WHEN line = 2 THEN LastSpaceInLine+1 ELSE 1 END),

    Pos3 = MAX(CASE WHEN line = 3 THEN LastSpaceInLine+1 ELSE 1 END),

    Pos4 = MAX(CASE WHEN line = 4 THEN LastSpaceInLine+1 ELSE 1 END),

    Pos5 = MAX(CASE WHEN line = 5 THEN LastSpaceInLine+1 ELSE 1 END),

    Pos6 = MAX(CASE WHEN line = 6 THEN LastSpaceInLine+1 ELSE 1 END)

    FROM (

    SELECT

    x.line,

    LastSpaceInLine = MAX(n)

    FROM dbo.InlineTally (200) t

    CROSS APPLY (SELECT line = 1+n/32) x

    WHERE n <= LEN(MyString)

    AND SUBSTRING(MyString,t.n,1) = ' '

    GROUP BY x.line

    ) d

    ) x1

    -- Function dbo.InlineTally

    CREATE FUNCTION [dbo].[InlineTally]

    (@RowCount INT)

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN

    (

    WITH E1(N) AS ( -- 10 rows

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ),

    E2(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c), -- 10 x 10 x 10 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b) --1000 x 1000 rows max

    SELECT TOP (@RowCount) n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

    )

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Now why does this InlineTally function look so familiar? 😉

    --------------------------------------------------------------------------
    A little knowledge is a dangerous thing (Alexander Pope)
    In order for us to help you as efficiently as possible, please read this before posting (courtesy of Jeff Moden)[/url]

  • Jan Van der Eecken (6/21/2013)


    Now why does this InlineTally function look so familiar? 😉

    Same reason as Feynman diagrams are now so familiar, Jan 😉

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I like the Penguin one. 🙂

    --------------------------------------------------------------------------
    A little knowledge is a dangerous thing (Alexander Pope)
    In order for us to help you as efficiently as possible, please read this before posting (courtesy of Jeff Moden)[/url]

  • Thanks you all guys for your responses. I'll implement this and let you know..

  • Venkata-433090 (6/24/2013)


    Thanks you all guys for your responses. I'll implement this and let you know..

    My question would be, what do you want to do with "words" that are more than 32 characters. Heh... don't say it won't happen.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ChrisM@Work (6/20/2013)


    Venkata-433090 (6/19/2013)


    Thanks for your response. Its kind of when the client retrieves the data.

    again, the 32 character count should start from where the pipe was placed.

    Shouldn't take too much effort to tailor this to your requirements:

    -- sample data

    ;WITH SampleData AS (

    SELECT

    MyString = RTRIM(STUFF(STUFF(MyString,32,0,'|'),64,0,'|'))

    FROM (

    SELECT MyString = CAST('ABCDEFGH IJKLMNOPQRS TUVWXYZ ABCDEFGHIJKLMN OPQRSTUVWXYZA BCDEFGHIJKLMNOP' AS VARCHAR(200))

    ) d

    )

    -- solution

    SELECT

    MyString,

    NewString =

    STUFF(

    STUFF(

    STUFF(

    STUFF(

    STUFF(

    STUFF(

    CleanString

    ,Pos1,0,CASE WHEN Pos1 > 1 THEN '|' ELSE '' END)

    ,Pos2,0,CASE WHEN Pos2 > 1 THEN '|' ELSE '' END)

    ,Pos3,0,CASE WHEN Pos3 > 1 THEN '|' ELSE '' END)

    ,Pos4,0,CASE WHEN Pos4 > 1 THEN '|' ELSE '' END)

    ,Pos5,0,CASE WHEN Pos5 > 1 THEN '|' ELSE '' END)

    ,Pos6,0,CASE WHEN Pos6 > 1 THEN '|' ELSE '' END)

    FROM SampleData

    CROSS APPLY (SELECT CleanString = REPLACE(MyString,'|','')) y

    CROSS APPLY ( -- get the position of the last ' ' in each 32 character 'line'

    SELECT

    Pos1 = MAX(CASE WHEN line = 1 THEN LastSpaceInLine+1 ELSE 1 END),

    Pos2 = MAX(CASE WHEN line = 2 THEN LastSpaceInLine+1 ELSE 1 END),

    Pos3 = MAX(CASE WHEN line = 3 THEN LastSpaceInLine+1 ELSE 1 END),

    Pos4 = MAX(CASE WHEN line = 4 THEN LastSpaceInLine+1 ELSE 1 END),

    Pos5 = MAX(CASE WHEN line = 5 THEN LastSpaceInLine+1 ELSE 1 END),

    Pos6 = MAX(CASE WHEN line = 6 THEN LastSpaceInLine+1 ELSE 1 END)

    FROM (

    SELECT

    x.line,

    LastSpaceInLine = MAX(n)

    FROM dbo.InlineTally (200) t

    CROSS APPLY (SELECT line = 1+n/32) x

    WHERE n <= LEN(MyString)

    AND SUBSTRING(MyString,t.n,1) = ' '

    GROUP BY x.line

    ) d

    ) x1

    -- Function dbo.InlineTally

    CREATE FUNCTION [dbo].[InlineTally]

    (@RowCount INT)

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN

    (

    WITH E1(N) AS ( -- 10 rows

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ),

    E2(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c), -- 10 x 10 x 10 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b) --1000 x 1000 rows max

    SELECT TOP (@RowCount) n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

    )

    Very cool but try it with the following replacement and see what happens with "zz".

    SELECT MyString = CAST('aa bb cc dd ee ff gg hh ii jj kk ll mm nn oo pp qq rr ss tt uu vv ww xx yy zz' AS VARCHAR(200))

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    Create FUNCTION String_Insert ( @String VARCHAR(MAX) )

    RETURNS VARCHAR(MAX)

    AS

    BEGIN

    DECLARE @i INT= 0

    DECLARE @x INT= 0

    DECLARE @max-2 INT = LEN(LTRIM(RTRIM(@String)))

    DECLARE @REV VARCHAR(32)

    SET @i = 32;

    WHILE @i <= @max-2

    BEGIN

    IF SUBSTRING(@String, @i, 1) = ' '

    BEGIN

    SET @String = STUFF(@String, @i, 0, '|')

    END

    ELSE

    BEGIN

    SET @x = CHARINDEX(' ',

    REVERSE(SUBSTRING(@String, 1, @i)))

    SET @i = @i - @x + 2;

    SET @String = STUFF(@String, @i, 0, '|')

    END

    SET @i = @i + 32;

    END

    -- Return the result of the function

    RETURN @String

    END

    GO

    Here I created a temp table and tested on it.

    IF object_ID('tempDB..#temp') IS NOT NULL

    DROP TABLE #temp

    CREATE TABLE #temp

    (

    String VARCHAR(200)

    )

    INSERT #temp (String) VALUES ( 'ABCDEFGH IJKLMNOPQRS TUVWXYZ ABCDEFGHIJKLMN OPQRSTUVWXYZA BCDEFGHIJKLMNOP'

    ),('ABCDEFGH IJKLMNOPQRS TUVWXYZ ABCDEFGHIJKLMN OPQRSTUVWXYZAB CDEFGHIJKLMNOP')

    SELECT String_Insert(String) FROM #temp

    Thanks to my friend Lakshmi who helped me in writing the code.This is one way of doing it! Hope it works.

  • SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    Create FUNCTION String_Insert ( @String VARCHAR(MAX) )

    RETURNS VARCHAR(MAX)

    AS

    BEGIN

    DECLARE @i INT= 0

    DECLARE @x INT= 0

    DECLARE @max-2 INT = LEN(LTRIM(RTRIM(@String)))

    DECLARE @REV VARCHAR(32)

    SET @i = 32;

    WHILE @i <= @max-2

    BEGIN

    IF SUBSTRING(@String, @i, 1) = ' '

    BEGIN

    SET @String = STUFF(@String, @i, 0, '|')

    END

    ELSE

    BEGIN

    SET @x = CHARINDEX(' ',

    REVERSE(SUBSTRING(@String, 1, @i)))

    SET @i = @i - @x + 2;

    SET @String = STUFF(@String, @i, 0, '|')

    END

    SET @i = @i + 32;

    END

    -- Return the result of the function

    RETURN @String

    END

    GO

    Here I created a temp table and tested on it.

    IF object_ID('tempDB..#temp') IS NOT NULL

    DROP TABLE #temp

    CREATE TABLE #temp

    (

    String VARCHAR(200)

    )

    INSERT #temp (String) VALUES ( 'ABCDEFGH IJKLMNOPQRS TUVWXYZ ABCDEFGHIJKLMN OPQRSTUVWXYZA BCDEFGHIJKLMNOP'

    ),('ABCDEFGH IJKLMNOPQRS TUVWXYZ ABCDEFGHIJKLMN OPQRSTUVWXYZAB CDEFGHIJKLMNOP')

    SELECT String_Insert(String) FROM #temp

    This is one way of doing it! Hope it works.

  • Here is an iTVF for it...

    CREATE FUNCTION SplitWithDelimiter (@MyString VARCHAR(200), @MaxLength INT, @Delimiter CHAR(1), @Breakers varchar(256))

    RETURNS TABLE

    WITH SCHEMABINDING

    AS

    RETURN

    SELECT substring(base.value, 1, Pos1) + @Delimiter + substring(base.value, Pos1 + 1, Pos2 - Pos1) + @Delimiter + substring(base.value, Pos2 + 1, Pos3 - Pos2) + @Delimiter + substring(base.value, Pos3 + 1, Pos4 - Pos3) + @Delimiter + substring(base.value, Pos4 + 1, Pos5 - Pos4) + @Delimiter + substring(base.value, Pos5 + 1, Pos6 - Pos5) + @Delimiter + substring(base.value, Pos6 + 1, Pos7 - Pos6) AS result

    FROM (

    VALUES (@MyString)

    ) AS base(value)

    CROSS APPLY (

    SELECT ISNULL(NULLIF((@MaxLength + 1) - PATINDEX('%'+@Breakers+'%', REVERSE(SUBSTRING(base.value, 1, @MaxLength))), (@MaxLength + 1)), @MaxLength)

    ) a(Pos1)

    CROSS APPLY (

    SELECT ISNULL(NULLIF((@MaxLength + 1) - PATINDEX('%'+@Breakers+'%', REVERSE(SUBSTRING(base.value, Pos1 + 1, @MaxLength))), (@MaxLength + 1)), @MaxLength) + Pos1

    ) b(Pos2)

    CROSS APPLY (

    SELECT ISNULL(NULLIF((@MaxLength + 1) - PATINDEX('%'+@Breakers+'%', REVERSE(SUBSTRING(base.value, Pos2 + 1, @MaxLength))), (@MaxLength + 1)), @MaxLength) + Pos2

    ) c(Pos3)

    CROSS APPLY (

    SELECT ISNULL(NULLIF((@MaxLength + 1) - PATINDEX('%'+@Breakers+'%', REVERSE(SUBSTRING(base.value, Pos3 + 1, @MaxLength))), (@MaxLength + 1)), @MaxLength) + Pos3

    ) d(Pos4)

    CROSS APPLY (

    SELECT ISNULL(NULLIF((@MaxLength + 1) - PATINDEX('%'+@Breakers+'%', REVERSE(SUBSTRING(base.value, Pos4 + 1, @MaxLength))), (@MaxLength + 1)), @MaxLength) + Pos4

    ) e(Pos5)

    CROSS APPLY (

    SELECT ISNULL(NULLIF((@MaxLength + 1) - PATINDEX('%'+@Breakers+'%', REVERSE(SUBSTRING(base.value, Pos5 + 1, @MaxLength))), (@MaxLength + 1)), @MaxLength) + Pos5

    ) f(Pos6)

    CROSS APPLY (

    SELECT ISNULL(NULLIF((@MaxLength + 1) - PATINDEX('%'+@Breakers+'%', REVERSE(SUBSTRING(base.value, Pos6 + 1, @MaxLength))), (@MaxLength + 1)), @MaxLength) + Pos6

    ) g(Pos7)

    I don't claim it is the quickest, but it is quite flexible:

    Parameters:

    @MyString : The data to be manipulated

    @MaxLength : The maximum length of each new "chunk" of data

    @Delimiter : The character used to seperate the new "chunks"

    @Breakers : Pattern used to find a suitable break in the data - can be as simple as ' ' or more complicated like '[ .,?!:;)]'

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Here's a recursive CTE method as a iTVF that could work.

    I would like to come up with a none recursive method, but brain fade is setting in and I'm not seeing a way to do it currently :crying:

    One thing to note is that it will break if there is a word longer than the @maxlen specified. The character at that point will be replaced.

    CREATE FUNCTION splitAt(@str varchar(8000), @maxlen int, @splitOn varchar(1), @replaceWith varchar(1)) RETURNS TABLE AS

    RETURN

    WITH rcte as (

    SELECT substring(@str,0,@maxlen - charindex(@spliton,reverse(substring(@str,0,@maxlen)))) s,

    substring(@str,@maxlen - charindex(@spliton,reverse(substring(@str,0,@maxlen))) + 1,8000) r

    UNION ALL

    SELECT substring(r,0,@maxlen - charindex(@spliton,reverse(substring(r,0,@maxlen)))) s,

    substring(r,@maxlen - charindex(@spliton,reverse(substring(r,0,@maxlen))) + 1,8000) r

    FROM rcte

    WHERE len(r) > 0

    )

    SELECT SUBSTRING(S,2,8000) splitResult

    FROM (

    SELECT @replacewith + s AS [text()]

    FROM rCte

    ORDER BY LEN(r) DESC

    FOR XML PATH ('')) A (S)

    On my machine with 10000 address rows with an average length of 55 (max 121, min 21) it took around a second.

  • Viewing 15 posts - 1 through 15 (of 15 total)

    You must be logged in to reply to this topic. Login to reply