REPLACE Multiple Spaces with One

  • Ok

    1/ modified so will cope with multiple tabs

    2/ -1 issue resolved! [Edit : too much switching between SQL/C# !]

    3/ Performance won't be an issue as I am suggesting this function is called on after update trigger.

    After update trigger would have something like:

    If INSERTED record count > 1 exit trigger.

    Edit

    4/ Changed to use Michaels space system

    Of course I agree with your comments - just trying to find a neat solution to handle clean up of all sorts of junk.

    CREATE FUNCTION dbo.fn_CleanString(@MyString VARCHAR(max))

    RETURNS VarChar(MAX)

    AS

    BEGIN

    -- Remove unwanted strings

    SELECT @MyString = REPLACE(@MyString, RemovePattern, ReplaceWith)

    FROM tPatterns

    WHERE CHARINDEX(RemovePattern, @MyString) > 0

    RETURN @MyString

    END

    go

    -- Create a table of all the rotten stuff we want to remove

    SELECT 'tablescan' RemovePattern, 'seek' ReplaceWith

    INTO tPatterns

    UNION ALL SELECT 'RBAR', 'R***'

    UNION ALL SELECT 'anyword', 'R*******'

    UNION ALL SELECT CHAR(9), ' ' -- Get rid of all tabs

    UNION ALL SELECT replicate(' ',32+1),' '

    UNION ALL SELECT replicate(' ',16+1),' '

    UNION ALL SELECT replicate(' ',8+1), ' '

    UNION ALL SELECT replicate(' ',4+1), ' '

    UNION ALL SELECT replicate(' ',2+1), ' '

    UNION ALL SELECT replicate(' ',1+1), ' '

    -- Create some dirty test data

    SELECT 'a b c d e f g h rbar i tablescan j k' myString

    INTO tMyStrings

    UNION ALL

    SELECT ' tablescan rbar x b c d e f g h i j k'

    -- and a string with duplicate tabs

    UNION ALL

    SELECT 'A' + REPLICATE(CHAR(9), 3) + 'B' + REPLICATE(CHAR(9), 7) + 'C' + REPLICATE(CHAR(9), 6)

    -- Clean & output the our test data

    SELECT dbo.fn_CleanString(myString) FROM tMyStrings

    go

    -- Cleanup

    DROP TABLE tPatterns

    DROP TABLE tMyStrings

    DROP FUNCTION fn_CleanString

    Allow me a little excitement in my hour of discovery 🙂 I must go sleep now is 1:30 am!

  • C# Screw,

    You are certainly allowed your hour of excitement! Sorry if I come across as beating your code up, it's not intentional - I've been most impressed by the ideas behind your submissions 🙂

    Paul

  • Jeff Moden (11/22/2009)


    Paul White (11/22/2009)


    5. Actually, don't do data access at all in functions!! Unless you really really really know what you are doing, and even then be sure to test it to death!

    Fixed 😉

  • Paul White (11/22/2009)


    C# Screw,

    You are certainly allowed your hour of excitement! Sorry if I come across as beating your code up, it's not intentional - I've been most impressed by the ideas behind your submissions 🙂

    Paul

    Cheers Paul,

    this forum is great, its like working with real cool dudes - you guys really know your stuff.

    Have to say good night from UK now :doze:

  • C# Screw (11/22/2009)


    Of course I agree with your comments - just trying to find a neat solution to handle clean up of all sorts of junk.

    Heh... I'm sure that what I'm about to say will surely knock some folks right out of their chair because everyone that knows me know that I believe that speed/performance is only secondary to accuracy... but...

    Even if your good function turns out to be a bit slower than some other method that may pop up, having such a thing in a function and being a bit slower is a whole lot better than someone trying to solve the problem on their own with a While Loop or worse... they might get things wrong. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Paul White (11/22/2009)


    Jeff Moden (11/22/2009)


    Paul White (11/22/2009)


    5. Actually, don't do data access at all in functions!! Unless you really really really know what you are doing, and even then be sure to test it to death!

    Fixed 😉

    LMAO!!! Well done! 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Paul White (11/22/2009)


    (which I still admire BTW)

    Heh... The odd thing is, I don't deserve that... like I said in the article, it's an old method and I didn't come up with it. I've just never seen anyone explain it before (kind of like the basis of the Tally table article... everyone said to use it but never explained it). The only thing I did was get rid of a 4th replace and, I'm sure if you look for it, someone probably beat me there, as well.

    I do appreciate the feedback, though... I guess I can start backing out the screws on some of the handrails. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Performance:

    This test run was done having 10 replace strings in the Data Driven table, as in the T-SQL sample above.

    Not really fair to compare as this is the only solution offering multi-pattern replace so far.

    But still it seems quite respectable and time wise is on the line with Jeff original (in a function)..

    New Data Driven solution comes in :

  • Jeff Moden (11/22/2009)


    I'll also agree that nested replaces will still beat the RBAR of doing single replaces even if they are table driven in a single query... we've already seen that in testing.

    NopeNot sure on that one, in the test results I provided, looping is 20% faster than the In-Line Replace in the artcle. BUT I agreee it was only 10k rows with 2GB RAM.

    Jeff Moden (11/22/2009)


    Just in case someone is confused (not you Paul), not all seemingly "loopless" code is Set Based... you also have to look at how many times each row or item is "touched". For example, a "Triangular Join" to do a running total on a lousy 10,000 rows will produce more than 50 MILLION rows of internal RBAR which is why it's so very slow compared to even a Cursor.

    In my opinion every single solution posted on this thread is doing RBAR!

    The only way you will get away from RBAR is by creating a calculated, peristed column of :

    CHARINDEX(SPACE(2),yourcol). Then Create an INDEX on this column.

    Say the column was called 'HasSpace', then adding WHERE HasSpace > 0

    Only then can you say good bye RBAR, as SQL will be able to use the index and you will have the fastest solution possible [Edit: apart from cleaning data on a trigger of course].

    :-P:-)

  • C#Screw

    My compliments on this neat cleanup engine. Here are some comments:

    In the 32+1, 16+1, etc part of the code, you need to add an extra replace of two spaces with one space. See my earlier comments on why this is needed.

    I thought, and Paul mentioned this earlier, that you do not have control over the order the rows are read from a table if there is no explicit order by. With a small table like yours the rows seems to get read in the order you load them. But that's just luck. Thus I added a rownum column which forces the 32+1, 16+1,... strings to be loaded and processed in reverse order. This causes a string like '*'+replicate(' ',7998)+'*' to end up as '*' + 5 spaces + '*'.

    Now for some laughter. I noticed you were trying to clean up obscenities (including RBAR:w00t:). So here is my version of 'obscenity injection' which your code cannot handle.

    Let's say your table wants to remove from all strings any parts containing 'screw you':w00t:. You do this by doing a

    union all 'screw you','' into you cleanup table.

    However, I'm an obscenity obsessed hacker and will insert into your data table the following:

    'scscrew yourew you'

    And here you thought you were all done!:-D

  • Michael Meierruth (11/23/2009)


    'scscrewrew you'

    lol !

    I wonder if could change to LIKE

    then table entry would be '%screw%you%'

    SELECT @MyString = REPLACE(@MyString, RemovePattern, ReplaceWith)

    FROM tPatterns

    WHERE MyString LIKE RemovePattern

    RETURN @MyString

    🙂

  • C# Screw (11/23/2009)


    Michael Meierruth (11/23/2009)


    'scscrewrew you'

    lol !

    I wonder if could change to LIKE

    then table entry would be '%screw%you%'

    SELECT @MyString = REPLACE(@MyString, RemovePattern, ReplaceWith)

    FROM tPatterns

    WHERE MyString LIKE RemovePattern

    RETURN @MyString

    🙂

    Actually no that won't work... hey Michael this is a new game altogether!!

    But maybe [Edit: replace string table structure]:

    SearchString | CSVReplaceBits | CSVReplaceBitsWith

    '%screw%you%' | Screw, You | S***,Y**

    Lots more RBAR- ing though to get that to work 🙁

  • Any maintenance screen would have to look after / allow change of the ReplaceOrder column...

    CREATE FUNCTION dbo.fn_CleanString(@MyString VARCHAR(max))

    RETURNS VarChar(MAX)

    AS

    BEGIN

    -- Remove unwanted strings

    SELECT @MyString = REPLACE(@MyString, RemovePattern, ReplaceWith)

    FROM tPatterns

    WHERE CHARINDEX(RemovePattern, @MyString) > 0

    ORDER BY ReplaceOrder

    RETURN @MyString

    END

    go

    -- Create a table of all the rotten stuff we want to remove

    SELECT 'tablescan' RemovePattern, 'seek' ReplaceWith, 1 AS ReplaceOrder

    INTO tPatterns

    UNION ALL SELECT CHAR(9), ' ',2 AS ReplaceOrder -- Get rid of all tabs

    UNION ALL SELECT 'RBAR', 'r***[censored word]',3 AS ReplaceOrder

    UNION ALL SELECT 'screw', 's****[censored word]' ,4 AS ReplaceOrder

    UNION ALL SELECT replicate(' ',32+1),' ',5 AS ReplaceOrder

    UNION ALL SELECT replicate(' ',16+1),' ',6 AS ReplaceOrder

    UNION ALL SELECT replicate(' ',8+1), ' ',7 AS ReplaceOrder

    UNION ALL SELECT replicate(' ',4+1), ' ',8 AS ReplaceOrder

    UNION ALL SELECT replicate(' ',2+1), ' ',9 AS ReplaceOrder

    UNION ALL SELECT replicate(' ',1+1), ' ',10 AS ReplaceOrder

    UNION ALL SELECT replicate(' ',1+1), ' ',11 AS ReplaceOrder

    -- Create some dirty test data

    SELECT 'a b c d e f g h rbar scscrewrew you i tablescan j k' myString

    INTO tMyStrings

    UNION ALL

    SELECT ' tablescan rbar x scscrewrew you b c d e f g h i j k'

    -- and a string with duplicate tabs

    UNION ALL

    SELECT 'A' + REPLICATE(CHAR(9), 3) + 'B' + REPLICATE(CHAR(9), 7) + 'C' + REPLICATE(CHAR(9), 6)

    -- Clean & output the our test data

    SELECT dbo.fn_CleanString(myString) FROM tMyStrings

    go

    -- Cleanup

    DROP TABLE tPatterns

    DROP TABLE tMyStrings

    DROP FUNCTION fn_CleanString

  • Don't know if you noticed, but the TAB replace doesn't seem to work:

    output:

    a b c d e f g h r***[censored word] scs****[censored word]rew you i seek j k

    seek r***[censored word] x scs****[censored word]rew you b c d e f g h i j k

    A---B-------C--- << where --- are spaces, I would have expected single space?

    Any ideas?

  • The problem is the where clause in your function.

    It gets interpreted only once.

Viewing 15 posts - 256 through 270 (of 425 total)

You must be logged in to reply to this topic. Login to reply