Trying to rewrite RBAR Function

  • Hi all,

    I'm trying to rewrite the following function that we use to trim certain catalog numbers in the company. The business rules are easy, if a catalog contains incorrect characters, we trim them and return a correct string containing only correct characters.

    It is the only way I found of doing this, but it contains a while loop that I do not actually like.

    I'm thinking adding the "RBAR" tag to the title of this article should awake a certain Moden to this article, and see if there is an easy way of doing this.

    By the way, it might stop swarming my profiler with that damn while loop, since it is used against roughly a few million rows.

    IF EXISTS (SELECT *

    FROM sys.objects

    WHERE object_id = OBJECT_ID(N'[dbo].[fnGICSPFParseGICCatalog]')

    AND TYPE IN (N'FN',N'IF',N'TF',N'FS',

    N'FT'))

    DROP FUNCTION [dbo].[fnGICSPFParseGICCatalog]

    GO

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE FUNCTION dbo.fnGICSPFParseGICCatalog

    (@string VARCHAR(8000),

    @TrimAll BIT)

    RETURNS VARCHAR(8000)

    AS

    BEGIN

    DECLARE @IncorrectCharLoc SMALLINT

    IF @TrimAll = 1

    BEGIN

    SET @IncorrectCharLoc = PATINDEX('%[^0-9A-Za-z]%',@string)

    WHILE @IncorrectCharLoc > 0

    BEGIN

    SET @string = STUFF(@string,@IncorrectCharLoc,1,'')

    SET @IncorrectCharLoc = PATINDEX('%[^0-9A-Za-z]%',@string)

    END

    SET @string = @string

    END

    ELSE

    BEGIN

    SET @IncorrectCharLoc = PATINDEX('%[^-0-9A-Za-z&.$/]%',@string)

    WHILE @IncorrectCharLoc > 0

    BEGIN

    SET @string = STUFF(@string,@IncorrectCharLoc,1,'')

    SET @IncorrectCharLoc = PATINDEX('%[^-0-9A-Za-z&.$/]%',@string)

    END

    SET @string = @string

    END

    RETURN @string

    END

    GO

    --=== When we send the TRIMALL Parameter a 1, it means we want to trim all weird characters

    -- that are not numeric or letters ===--

    select dbo.[fnGICSPFParseGICCatalog] ('QO115',1)

    -- Gives QO115

    select dbo.[fnGICSPFParseGICCatalog] ('QO115&-$?',1)

    -- Gives QO115

    --=== When we dont, it will trim only the parameters not in the list. ===--

    select dbo.[fnGICSPFParseGICCatalog] ('QO115&-$?*',0)

    -- Gives QO115&-$ since these 3 characters are permitted in the PAtIndex List.

    -- The ? and * were trimmed from the string to avoid problems with the old Cobol system.

    Thanks all,

    Cheers,

    J-F

  • How about this?

    [font="Courier New"]

    CREATE FUNCTION dbo.fnGICSPFParseGICCatalog_Test(

       @string  VARCHAR(8000),

        @TrimAll BIT)

    RETURNS VARCHAR(8000)

    AS

    BEGIN

    DECLARE @B VARCHAR(8000)

    SET @B = '' -- Initialize @b-2

    IF @TrimAll = 1

       SELECT @B = @B + SUBSTRING(@String,N,1)

       FROM Tally

       WHERE N <= DATALENGTH(@String)

           AND

               SUBSTRING(@String,N,1) LIKE '%[0-9A-Za-z]%' -- Remove everything but these.

    ELSE

       SELECT @B = @B + SUBSTRING(@String,N,1)

       FROM Tally

       WHERE N <= DATALENGTH(@String)

           AND

               SUBSTRING(@String,N,1) LIKE '%[-0-9A-Za-z&.$/]%' -- Remove everything but these.

    RETURN @B

    END[/font]

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. πŸ˜‰

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Hi Seth,

    Thanks for the quick reply,

    I've tested the code, and I'm disapointed to see that the code is not performant. It takes twice the time to run against 200 000 rows. (10 secs compared to 5 secs).

    I thought it would improve the speed, but it does not... Any idea?

    Thanks,

    Cheers,

    J-F

  • Unfortunately, not right away, no. I know that normally doing any kind of data access in an inline function is a "bad idea", in the same respect that playing darts in a crowded room while blindfolded and drunk is a bad idea... but for some reason I believed this one was the exception to the rule. I just created a very similar function here for the same purpose but my rowsets are much much smaller and I didn't notice any performance hit.

    I'll have to do some testing on this and see.

    Thanks for the info J-F

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. πŸ˜‰

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Thanks for the feedback Seth,

    I'm still in need of a quicker function to replace mine, anyone can do it set based? I'll test it to see if it outperforms mine!

    Thanks,

    Cheers,

    J-F

  • I'm surprised to see Jeff did not throw in something, I thought RBAR was the trigger for an answer! πŸ˜‰

    Cheers,

    J-F

  • Jeff mentioned on another thread yesterday that he's been really busy. There's a few he hasn't had time to reply to yet. Give it time :P.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. πŸ˜‰

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • I have found that the best solution for this is to use persisted calculated columns!


    * Noel

  • noeld (1/27/2009)


    I have found that the best solution for this is to use persisted calculated columns!

    Haven't tried those yet, Noel... can you give us a quick "how to" on something like this? I'd seriously like to know.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jean-FranΓ§ois Bergeron (1/27/2009)


    I'm surprised to see Jeff did not throw in something, I thought RBAR was the trigger for an answer! πŸ˜‰

    Heh... what... you think I do a search for these things? Nah... I go through 250+ titles a day. Or, I should say "night"... I don't mix my work time with answering questions on a forum... wouldn't be ethical.

    Obviously, I found it... lemme see what's up. I'm also very much looking forward to a response from NoelD... he normally comes up with some good stuff and, I believe, he's been around longer than even me. πŸ™‚

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • p.s. Wrapping the problem up in a couple of "Free Pizza" coupons and wrapping that in a cold case of beer would give me better incentive to look for these things. πŸ˜€

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I actually heard about persisted computed the other day and haven't had a chance to look into exactly how one sets them up yet. I'd be interested to see that too.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. πŸ˜‰

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • ...and Jean-FranΓ§ois came up with one of the very few places where a memory-only While Loop actually squeaks past the Tally table (at least on my machine, it does).

    CREATE FUNCTION dbo.fnGICSPFParseGICCatalog

    (

    @String VARCHAR(8000),

    @TrimAll BIT

    )

    RETURNS VARCHAR(8000)

    AS

    BEGIN

    --===== Declare and preset local variables

    DECLARE @Len INT

    SELECT @Len = LEN(@String)+1

    --===== Remove bad characters according to @TrimAll

    IF @TrimAll = 0

    SELECT @String = STUFF(@String,@Len-t.N,1,'')

    FROM dbo.Tally t

    WHERE t.N < @Len

    AND SUBSTRING(@String,@Len-t.N,1) LIKE '[^0-9A-Z]'

    ORDER BY t.N ASC

    ELSE

    SELECT @String = STUFF(@String,@Len-t.N,1,'')

    FROM dbo.Tally t

    WHERE t.N < @Len

    AND SUBSTRING(@String,@Len-t.N,1) LIKE '[^-0-9A-Z&.$/]'

    ORDER BY t.N ASC

    --===== Return the cleaned value and exit

    RETURN @String

    END

    I've got a pretty old (but ever dependable) 7 year old 1.8Ghz 1GB desktop computer where the while loop and the code above takes between 23 seconds (yours) and 27 seconds (code above) on 200 k rows of a 69 character column to clean instead of the 5 seconds you got on your machine. So, I've gotta ask, why type of machine do you have?

    And, yes, just like your code, the code runs a bit faster with two separate queries and constants for the pattern than it does with a variable for the pattern assigned a value through a decision.

    p.s. Make sure your Tally table has a Clustered Primary Key Constraint on "N".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    Can you supply the code necessary to generate the test data you used? I have a quad-core 2.83GHz machine with SQL 2005 DE SP2 that I'd like to test this scenario against, so I can start to learn how to generate test data, as well as start to see how to test these kinds of things. Thanks!

    Steve

    (aka smunson)

    :):):)

    Steve (aka sgmunson) πŸ™‚ πŸ™‚ πŸ™‚
    Rent Servers for Income (picks and shovels strategy)

  • Well, I got a simple 3.0Ghz Dev machine with 4GB of Ram. It's not that quick, trust me. And I ran the query on 200k rows that has around 30 characters in it.

    Don't worry, the tally has the clustered key on N, πŸ˜‰ Thanks for the advice,

    I'm looking forward to know about persisted calculated columns, but I do not think it will help in this case. This validation is within a huge procedure that checks for vendors price files, and we try to match on various columns, catalog, UPC, VndCatalog. So, we need to generate a "Home" catalog from the VndCatalog, which can have really weird characters at times.

    I wouldn't want to calculate a whole column of GICCatalog from a VndCatalog on a Working table to actually validate the sent data. I was really looking forward a quicker way then a while, but seems it's the best way.

    I've been following your posts, articles, and everything made sense to actually remove those damn cursors or while loops, can you tell me why the loop beats the tally in this case? Do I have to test the cursor, and the tally everytime I develop a new solution, to see which one can be quicker?

    Thanks in advance,

    Cheers,

    J-F

Viewing 15 posts - 1 through 15 (of 36 total)

You must be logged in to reply to this topic. Login to reply