Scalar function string replacement

  • Comments posted to this topic are about the item Scalar function string replacement

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • That's really very clever. I can see making some code much cleaner using this instead of nested REPLACE operators.

  • IndyMike wrote:

    That's really very clever. I can see making some code much cleaner using this instead of nested REPLACE operators.

    How?

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • This feels almost like an undocumented "feature" (similar to quirky updates)... I know I would be a bit hesitant in how I used this and would probably continue to stick with my nested REPLACE's as I know how those work today and in the future.

    But I did learn something as I expected a different answer on this one than was the actual answer!  Hopefully it is consistent across all SQL versions (tested it on 2017 and it worked as described).  Would suck to implement this only to find the behavior being different on different versions though or having it change in a future version.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Michael L John wrote:

    IndyMike wrote:

    That's really very clever. I can see making some code much cleaner using this instead of nested REPLACE operators.

    How?

    The way I came across this functionality was as a way to remove unacceptable words from user entered content.  Some words should not be written down or shared because it's just filth.   The replacements done are in descending string length order so first 'a--hole' and then 'a--' etc.

    drop function if exists dbo.scalar_string_replace;
    go
    create function dbo.scalar_string_replace(
    @stringnvarchar(max))
    returns nvarchar(max) as
    begin
    select @string = replace(@string, v.str_key, v.str_val)
    from (values (N'idiotic', N'not a good decision maker'),
    (N'idiot', N'fool'),
    (N'fool', N'foot stool')) v(str_key, str_val);

    return @string;
    end
    go

    select dbo.scalar_string_replace(N'You are idiotic and a fool');
    (No column name)
    You are not a good decision maker and a foot stool

     

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Mr. Brian Gale wrote:

    This feels almost like an undocumented "feature" (similar to quirky updates)... I know I would be a bit hesitant in how I used this and would probably continue to stick with my nested REPLACE's as I know how those work today and in the future.

    But I did learn something as I expected a different answer on this one than was the actual answer!  Hopefully it is consistent across all SQL versions (tested it on 2017 and it worked as described).  Would suck to implement this only to find the behavior being different on different versions though or having it change in a future version.

    It's not a quirk although it's not specifically documented in the MS documentation.  It's sometimes referred to as "overloading a variable" or just "overloading".  This is one of the things that T-SQL does very well where Oracle will give you an error.

    I use this method at work to "normalize" addresses according to postal standards in-so-far as abbreviations go and it's nasty fast for what it does even though it's used in a Scalar Function especially if you use a binary collation.

    Also, and to clarify, the "Nested Replace" method is even faster and can be used in iTVFs.  The reason why I used it in a Scalar Function was to use it in a persisted computed column in the days before I understood that using Scalar Functions in computed columns causes any and all queries to go single threaded even if they don't refer to the column.  See Brent Ozar's article on that subject, which I've individually confirmed as being true and am in the process of replacing.

    https://www.brentozar.com/archive/2020/10/using-triggers-to-replace-scalar-udfs-on-computed-columns/

    It also turns out to be an even bigger problem in other areas, as well... like parallel index rebuilds and DBCC CHECKDB. See Erik Darling's post on that.

    https://www.brentozar.com/archive/2016/01/another-reason-why-scalar-functions-in-computed-columns-is-a-bad-idea/

    And the "recently patched" thing Eric is talking about is for 2012 and 2014 where online rebuilds could sometime cause corruption of your clustered indexes.  Here's the info on that.

    https://support.microsoft.com/en-us/topic/fix-data-corruption-occurs-in-clustered-index-when-you-run-online-index-rebuild-in-sql-server-2012-or-sql-server-2014-075b0c04-1260-1716-6b9d-737ada4f8686

    And people wonder why migrating to 2019 has me concerned even though it's been out for 3 years. 🙁

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It's a good solution as long as you recognize that you're creating an internal  Cartesian product between rows in the table value constructor (tvc) and the rows in the outer query. Depending on the number of rows in the outer query and the number of replace values in the tvc, this method may perform poorly compared to other solutions.

    The fact that the function is written as a scalar will mask this fact in the execution plan.

    If you're using SQL Server 2017 or later, the TRANSLATE() function would be a better option.

    • This reply was modified 2 years, 7 months ago by  Jason A. Long.
  • Jason A. Long wrote:

    It's a good solution as long as you recognize that you're creating an internal  Cartesian product between rows in the table value constructor (tvc) and the rows in the outer query. Depending on the number of rows in the outer query and the number of replace values in the tvc, this method may perform poorly compared to other solutions.

    The fact that the function is written as a scalar will mask this fact in the execution plan.

    If you're using SQL Server 2017 or later, the TRANSLATE() function would be a better option.

    Remember that things like DELIMITEDSPLIT8K() and fnTally() also create such Cartesian products not only externally but internally.  ')

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • @steve-2 Collins,

    I also mean to say, GREAT question!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Literally just the way it's presented in the question. Either with a scalar function, or in a longer script with a pre-built table of replacements.

  • Jason A. Long wrote:

    If you're using SQL Server 2017 or later, the TRANSLATE() function would be a better option.

    I missed this.  Considering what they did to the FORMAT function for performance, have you or do you know of anyone that has done a performance test?

    The other thing is that, while TRANSLATE and REPLACE do have a bit of an intersection in individual character replacement, TRANSLATE doesn't seem to have the ability to replace combinations of letters as in words, etc.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Mr. Brian Gale wrote:

    This feels almost like an undocumented "feature" (similar to quirky updates)... I know I would be a bit hesitant in how I used this and would probably continue to stick with my nested REPLACE's as I know how those work today and in the future.

    But I did learn something as I expected a different answer on this one than was the actual answer!  Hopefully it is consistent across all SQL versions (tested it on 2017 and it worked as described).  Would suck to implement this only to find the behavior being different on different versions though or having it change in a future version.

    It's not a quirk although it's not specifically documented in the MS documentation.

    It's definitely not documented well.  I recall reading a blog post about it years ago but can't find it now.  It must be intentional functionality because the replacements happen in reverse string length order.  So replacements of short fragments doesn't mess up replacements of full compound words, i.e. 'ass' , 'asshat', and 'asshats' can each be replaced without messing up the longer word with too early of a replacement.  In the case of profanity we replaced the strings with empty ''

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply