REPLACE Multiple Spaces with One

  • Michael Meierruth (11/23/2009)


    The problem is the where clause in your function.

    It gets interpreted only once.

    It took me a while to realise why - I can see now though that the only replace rows that are included are those where there is a charindex hit on the original string. The where clause not being re-evaluated.

    If I add spaces to the data with tabs in :

    SELECT 'A' + SPACE(10)+ REPLICATE(CHAR(9), 3) etc

    [Edit : changed spaces to SPACE() so you see in post]

    Then the tab string is cleaned up ok - as the original string matches the 'space replace' records.

    So now I understand, thankyou!!

    Shame really as I added the WHERE clause to try and cut down on workload...

    Thanks

  • Just put WHERE clause back:-) and added tabs to your pattern, of course!!:

    CREATE FUNCTION dbo.fn_CleanString(@MyString VARCHAR(max))

    RETURNS VarChar(MAX)

    AS

    BEGIN

    -- Remove unwanted strings

    SELECT @MyString = REPLACE(@MyString, RemovePattern, ReplaceWith)

    FROM tPatterns

    WHERE CHARINDEX(RemovePattern, @MyString) > 0

    ORDER BY ReplaceOrder

    RETURN @MyString

    END

    go

    -- Create a table of all the rotten stuff we want to remove

    SELECT 'tablescan' RemovePattern, 'seek' ReplaceWith, 1 AS ReplaceOrder

    INTO tPatterns

    UNION ALL SELECT 'RBAR', 'r***[censored word]',3 AS ReplaceOrder

    UNION ALL SELECT 'screw', 's***[censored word]' ,4 AS ReplaceOrder

    UNION ALL SELECT replicate(' ',32+1),' ',5 AS ReplaceOrder

    UNION ALL SELECT replicate(' ',16+1),' ',6 AS ReplaceOrder

    UNION ALL SELECT replicate(' ',8+1), ' ',7 AS ReplaceOrder

    UNION ALL SELECT replicate(' ',4+1), ' ',8 AS ReplaceOrder

    UNION ALL SELECT replicate(' ',2+1), ' ',9 AS ReplaceOrder

    UNION ALL SELECT replicate(' ',1+1), ' ',10 AS ReplaceOrder

    UNION ALL SELECT replicate(' ',1+1), ' ',11 AS ReplaceOrder

    UNION ALL SELECT replicate(CHAR(9),32+1),CHAR(9),12 AS ReplaceOrder

    UNION ALL SELECT replicate(CHAR(9),16+1),CHAR(9),13 AS ReplaceOrder

    UNION ALL SELECT replicate(CHAR(9),8+1), CHAR(9),14 AS ReplaceOrder

    UNION ALL SELECT replicate(CHAR(9),4+1), CHAR(9),15 AS ReplaceOrder

    UNION ALL SELECT replicate(CHAR(9),2+1), CHAR(9),16 AS ReplaceOrder

    UNION ALL SELECT replicate(CHAR(9),1+1), CHAR(9),17 AS ReplaceOrder

    UNION ALL SELECT replicate(CHAR(9),1+1), CHAR(9),18 AS ReplaceOrder

    -- Create some dirty test data

    SELECT 'a b c d e f g h rbar rbar scscrewrew you i tablescan j k' myString

    INTO tMyStrings

    UNION ALL

    SELECT ' tablescan rbar x scscrewrew you b c d e f g h i j k'

    -- and a string with duplicate tabs

    UNION ALL

    SELECT 'A' + REPLICATE(CHAR(9), 3) + 'B' + REPLICATE(CHAR(9), 7) + 'C' + REPLICATE(CHAR(9), 6)

    -- Clean & output the our test data

    SELECT dbo.fn_CleanString(myString) FROM tMyStrings

    go

    -- Cleanup

    DROP TABLE tPatterns

    DROP TABLE tMyStrings

    DROP FUNCTION dbo.fn_CleanString

    Edit : I found a text file of bad words that could be downloaded and inserted:

    bad words (wordpress.org)

    please dont follow the link if you are easliy offended!

    Enjoy;-)

  • Jeff Moden (11/22/2009)


    C# Screw (11/22/2009)


    :satisfied:!!!!Wait a minute - I thought I was onto something!!!!!!:satisfied:

    This works nicely without the Nested replace:-):-):

    Very cool revelation but I'd suggest using the same spacing scheme that Michael used for two reasons... it's faster and it will handle up to 65536 spaces.

    As a side bar, it's interesting that some folks in my group at work recently received some 3rd party files with (you won't believe this) a freakin' million spaces between some words. That's not an exaggeration... there were over 1 million contiguous spaces in several spots in the file. I'm thinking "road trip" to find out in person if those kids like pork chops. 😛

    Jeff,

    If you read my comments on http://www.sqlservercentral.com/Forums/Topic819042-203-15.aspx you'll find out that, by just increasing to 64+1 the binary stack of spaces, you can handle almost 10 million spaces.

    Those kids might like pork chops but they know nothing about the power of binary approaches to quickly clean up their mess. If this is not enough you can always move to 128+1 which will handle a string of spaces 1.27GB big.

  • Understood but thanks for the reminder...

    The real problem is that this type of thing should never happen... it's simple and easy to prevent. Data providers that cough up that kind of garbage should be forced to eat their own computers. (Just venting a little).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/23/2009)


    Data providers that cough up that kind of garbage should be forced to eat their own computers. (Just venting a little).

    Without apple sauce 😎

  • Paul White (11/23/2009)


    Jeff Moden (11/23/2009)


    Data providers that cough up that kind of garbage should be forced to eat their own computers. (Just venting a little).

    Without apple sauce 😎

    With Chips 😀

  • DECLARE @STRVARCHAR(200)

    SELECT

    @STR ='Some day I will know everything. I hope that day never comes.'

    select @STR

    select @STR=REPLACE(REPLACE(REPLACE(@Str,' ','$#$'),'$$#',''),'$#$',' ')

    SELECT @STR

    Regards,
    Mitesh OSwal
    +918698619998

  • DECLARE @STR VARCHAR(1000)

    SET @STR= 'ALKSDKLKJ LSJD ASD LSD S D DJD D D D D D D'

    select @STR

    select @STR=REPLACE(REPLACE(REPLACE(@Str,' ','$#$'),'$$#',''),'$#$',' ')

    SELECT @STR

    Regards,
    Mitesh OSwal
    +918698619998

  • Interesting idea. And it's collation insensitive too.

    But this approach will lose data!

    Thus try:

    DECLARE @STR VARCHAR(8000)

    set @STR='*' + replicate(' ',7998) + '*'

    select @STR

    select @STR = REPLACE(REPLACE(REPLACE(@Str,' ','$#$'),'$$#',''),'$#$',' ')

    SELECT @STR

    select len(@str)

    It loses my trailing asterisk.

  • Michael Meierruth (11/24/2009)


    Interesting idea. And it's collation insensitive too.

    But this approach will lose data!

    Thus try:

    DECLARE @STR VARCHAR(8000)

    set @STR='*' + replicate(' ',7998) + '*'

    select @STR

    select @STR = REPLACE(REPLACE(REPLACE(@Str,' ','$#$'),'$$#',''),'$#$',' ')

    SELECT @STR

    select len(@str)

    It loses my trailing asterisk.

    Mind you in 2005 VarChar(max) it would seem pretty cool..

    Send it down to Performance Lab?

  • Could be used to (maybe) speed up Generic Cleanup function - but WHERE clause has to be excluded so may offset any gain

    CREATE FUNCTION dbo.fn_CleanString(@MyString VARCHAR(max))

    RETURNS VarChar(MAX)

    AS

    BEGIN

    -- Remove unwanted strings

    SELECT @MyString = REPLACE(@MyString, RemovePattern, ReplaceWith)

    FROM tPatterns

    --WHERE CHARINDEX(RemovePattern, @MyString) > 0

    ORDER BY ReplaceOrder

    RETURN @MyString

    END

    go

    -- Create a table of all the rotten stuff we want to remove

    SELECT 'tablescan' RemovePattern, 'seek' ReplaceWith, 1 AS ReplaceOrder

    INTO tPatterns

    UNION ALL SELECT 'RBAR', 'r***[censored word]',3 AS ReplaceOrder

    UNION ALL SELECT 'screw', 's***[censored word]' ,4 AS ReplaceOrder

    UNION ALL SELECT ' ','$#$',5 AS ReplaceOrder -- Space cleanup

    UNION ALL SELECT '$$#','',6 AS ReplaceOrder

    UNION ALL SELECT '$#$', ' ',7 AS ReplaceOrder

    UNION ALL SELECT CHAR(9),'$#$',8 AS ReplaceOrder -- TAB cleanup

    UNION ALL SELECT '$$#','',9 AS ReplaceOrder

    UNION ALL SELECT '$#$', CHAR(9),10 AS ReplaceOrder

    -- Create some dirty test data

    SELECT 'a b c d e f g h rbar rbar scscrewrew you i tablescan j k' myString

    INTO tMyStrings

    UNION ALL

    SELECT ' tablescan rbar x scscrewrew you b c d e f g h i j k'

    -- and a string with duplicate tabs

    UNION ALL

    SELECT 'A' + REPLICATE(CHAR(9), 3) + 'B' + REPLICATE(CHAR(9), 7) + 'C' + REPLICATE(CHAR(9), 6)

    -- Clean & output the our test data

    SELECT dbo.fn_CleanString(myString) FROM tMyStrings

    go

    -- Cleanup

    DROP TABLE tPatterns

    DROP TABLE tMyStrings

    DROP FUNCTION dbo.fn_CleanString

  • Very strange this MAX stuff on SS2005. The code below behaves very strange. I.e. I set a string to 8004 characters and yet the LEN function shows it to be 8002.

    But the code to remove the extra blanks correctly returns * *.

    Strange indeed!

    declare @STR varchar(max)

    declare @s2 varchar(max)

    set @s2 = replicate(' ',8002)

    set @STR = '*' + @s2 + '*'

    select @STR

    select len(@Str)

    select @STR = REPLACE(REPLACE(REPLACE(@Str,' ','$#$'),'$$#',''),'$#$',' ')

    select @STR

    select len(@Str)

  • Michael Meierruth (11/24/2009)


    Very strange this MAX stuff on SS2005. The code below behaves very strange. I.e. I set a string to 8004 characters and yet the LEN function shows it to be 8002.

    But the code to remove the extra blanks correctly returns * *.

    Strange indeed!

    declare @STR varchar(max)

    declare @s2 varchar(max)

    set @s2 = replicate(' ',8002)

    set @STR = '*' + @s2 + '*'

    select @STR

    select len(@Str)

    select @STR = REPLACE(REPLACE(REPLACE(@Str,' ','$#$'),'$$#',''),'$#$',' ')

    select @STR

    select len(@Str)

    hmmm and this shows length of 0!

    declare @s2 varchar(max)

    set @s2 = replicate(space(1),8002)

    SELECT LEN (@s2)

  • len() removes trailing spaces. Use Datalength()

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • jcrawf02 (11/24/2009)


    len() removes trailing spaces. Use Datalength()

    solves mine - but not Michaels observation

Viewing 15 posts - 271 through 285 (of 425 total)

You must be logged in to reply to this topic. Login to reply