how to remove zero from my string

  • tyson.price (8/24/2012)


    I get your point but, that is why we have comments. I will take performance over easy to read any day of the week. Comments are easy to type and can make some incredibly complicated code easy to understand.

    I can't argue that. Do you think this:

    SELECT SUBSTRING(@Source,PATINDEX('%[^'+@Char+'X]%',@Source),1000)

    out performs this:

    select REPLACE(LTRIM(REPLACE('000temp001', '0', ' ')), ' ', '0')

    I'm also not arguing learning regular expressions helps on the resume. My point is very general and not really specific to anyone in particular here. I don't think anyone really disagrees with what I'm saying. Which is "when all other things are equal, keep it simple".

    I'm coming from an IBM mainframe background, long ago, and when you come in and the middle of the night to fix an issue that someone made complicated just because they could it makes for a longer night. You also don't have time to stop and learn when the on-lines need to be up in a couple of hours.

    I see the same thing in day to day support now that I support Windows applications.

    Comments are great and really help. In most shops I've been in they are as common as documentation:w00t:

    The same type of programmers that generate complicated esoteric code usually make statements like "you shouldn't be programming if you can't read code". Usually they are excellant programmers that have forgotten not everyone is at the same skill level.

    In a business envronment with frequent turnover it is very important to use sructured easy to understand code. I'm not saying inefficient code.

    1. Performance of both will be similar, I think SUBSTRING with PATINDEX, will outperform double-REPLACE in most if not all of cases (REPLACE speed will depend on number of leading and trailing zeros). Will need to be tested on 1,000,000 run at least.

    2. Me too. I also came from Mainframe (Assembler and FORTRAN) long ago... Code which uses RegularExpresssion is way easier to read than "noodles"-like code of recursive CTE and just on a pair with double REPLACE. I cannot see how simple pattern may be more difficult to understand than logic behind REPLACE,LTRIM,REPLACE. '%' - used in LIKE very often by most of T-SQL devs - it's a wildchar, ^ is logical NOT, EXCLUDE etc. Very simple, clearly explained in BoL.

    And at the end, using SUBSTRING and PATINDEX is a perfectly structured and easy to understand code if you know T-SQL (especially in comparison to use of Tally table or even the worse - recursive CTE).

    If you don't know SQL, I wouldn't try to resolve any issue, which sometimes will happen "In a business envronment with frequent turnover"...

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • tyson.price (8/24/2012)


    You also don't have time to stop and learn when the on-lines need to be up in a couple of hours.

    Again, I full heartedly agree that's probably a bad time to learn but, if someone is troubleshooting SQL code, then they really should know these simple and very common methods beforehand as well as some of the performance ramifications.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • tyson.price (8/24/2012)


    I get your point but, that is why we have comments. I will take performance over easy to read any day of the week. Comments are easy to type and can make some incredibly complicated code easy to understand.

    I can't argue that. Do you think this:

    SELECT SUBSTRING(@Source,PATINDEX('%[^'+@Char+']%',@Source),1000)

    out performs this:

    select REPLACE(LTRIM(REPLACE('000temp001', '0', ' ')), ' ', '0')

    I'm also not arguing learning regular expressions helps on the resume. My point is very general and not really specific to anyone in particular here. I don't think anyone really disagrees with what I'm saying. Which is "when all other things are equal, keep it simple".

    I'm coming from an IBM mainframe background, long ago, and when you come in and the middle of the night to fix an issue that someone made complicated just because they could it makes for a longer night. You also don't have time to stop and learn when the on-lines need to be up in a couple of hours.

    I see the same thing in day to day support now that I support Windows applications.

    Comments are great and really help. In most shops I've been in they are as common as documentation:w00t:

    The same type of programmers that generate complicated esoteric code usually make statements like "you shouldn't be programming if you can't read code". Usually they are excellant programmers that have forgotten not everyone is at the same skill level.

    In a business envronment with frequent turnover it is very important to use sructured easy to understand code. I'm not saying inefficient code.

    I think that what is simple is not easy to define.

    It seems that what you find easier to read is different than me. I find the version with PATINDEX easier to read and understand than the other while it seems you find the other one easier to read.

    I think you are correct that few people around here would choose to use complicated code when a more simple version will work. If the shops you work in don't comment, fight the system and do it yourself. Others who look at your code will appreciate it and may even start doing it themselves.

    To demonstrate how simple it can be:

    --Get the substring starting with first occurrence of any character other than @Char

    SELECT SUBSTRING(@Source,PATINDEX('%[^'+@Char+'X]%',@Source),1000)


    The same type of programmers that generate complicated esoteric code usually make statements like "you shouldn't be programming if you can't read code". Usually they are excellant programmers that have forgotten not everyone is at the same skill level.

    I disagree with this statement entirely. In my experience those who write really complicated and clever code do so to stroke their own ego, or they can't figure out a different way. They tend to be arrogant and do little to help anybody else to unravel their bowl of broken spaghetti. One of the best ways any developer can offer to others is to comment their code. IMHO anything else is lazy and shows a disrespect for the industry and the company they work for in addition to the people who have to follow them.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • tyson.price (8/24/2012)


    Do you think this:

    SELECT SUBSTRING(@Source,PATINDEX('%[^'+@Char+'X]%',@Source),1000)

    out performs this:

    select REPLACE(LTRIM(REPLACE('000temp001', '0', ' ')), ' ', '0')

    "A Developer must not guess... a Developer must KNOW!" 😉 Test it. Here's the code to build a million row test table for the problem at hand.

    SELECT TOP 1000000

    SomeString = REPLICATE('0',ABS(CHECKSUM(NEWID()))%5) -- 0 to 4 leading zeros

    + 'temp'+RIGHT('0000000'+CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(10)),7)

    INTO #TestTable

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    The code above takes just a second or two to run on a decent machine.... something less than 7 seconds to run on a 10 year old single CPU desktop box.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sean Lange (8/24/2012)


    The same type of programmers that generate complicated esoteric code usually make statements like "you shouldn't be programming if you can't read code". Usually they are excellant programmers that have forgotten not everyone is at the same skill level.

    I disagree with this statement entirely. In my experience those who write really complicated and clever code do so to stroke their own ego, or they can't figure out a different way. They tend to be arrogant and do little to help anybody else to unravel their bowl of broken spaghetti. One of the best ways any developer can offer to others is to comment their code. IMHO anything else is lazy and shows a disrespect for the industry and the company they work for in addition to the people who have to follow them.

    +100000000000000000000000000000

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (8/24/2012)


    tyson.price (8/24/2012)


    Do you think this:

    SELECT SUBSTRING(@Source,PATINDEX('%[^'+@Char+'X]%',@Source),1000)

    out performs this:

    select REPLACE(LTRIM(REPLACE('000temp001', '0', ' ')), ' ', '0')

    "A Developer must not guess... a Developer must KNOW!" 😉 Test it. Here's the code to build a million row test table for the problem at hand.

    SELECT TOP 1000000

    SomeString = REPLICATE('0',ABS(CHECKSUM(NEWID()))%5) -- 0 to 4 leading zeros

    + 'temp'+RIGHT('0000000'+CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(10)),7)

    INTO #TestTable

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    The code above takes just a second or two to run on a decent machine.... something less than 7 seconds to run on a 10 year old single CPU desktop box.

    Based on your quote from Sergiy, I ran the following:

    SELECT TOP 1000000

    SomeString = REPLICATE('0',ABS(CHECKSUM(NEWID()))%5) -- 0 to 4 leading zeros

    + 'temp'+RIGHT('0000000'+CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(10)),7)

    INTO #TestTable

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2;

    go

    declare @StartDate datetime2(7) = sysdatetime(),

    @HoldString varchar(128),

    @Char char(1) = '0';

    select

    @HoldString = SUBSTRING(SomeString,PATINDEX('%[^'+@Char+'X]%',SomeString),1000)

    from

    #TestTable;

    declare @EndDate datetime2(7) = sysdatetime();

    select datediff(ms,@StartDate, @EndDate);

    go

    declare @StartDate datetime2(7) = sysdatetime(),

    @HoldString varchar(128),

    @Char char(1) = '0';

    select

    @HoldString = REPLACE(LTRIM(REPLACE(SomeString, '0', ' ')), ' ', '0')

    from

    #TestTable;

    declare @EndDate datetime2(7) = sysdatetime();

    select datediff(ms,@StartDate, @EndDate);

    go

    declare @StartDate datetime2(7) = sysdatetime(),

    @HoldString varchar(128),

    @Char char(1) = '0';

    select

    @HoldString = right(SomeString,datalength(SomeString)-PATINDEX('%[^'+@Char+']%',SomeString)+1)

    from

    #TestTable;

    declare @EndDate datetime2(7) = sysdatetime();

    select datediff(ms,@StartDate, @EndDate);

    go

    drop table #TestTable;

    go

    The double replace is approximately 2 times slower than the substring/patindex and the right/datalength/patindex versions (I added another version to my testing).

    The VM I used has 2 processors x64 processors, 20 GB RAM.

    EDIT: Run using SQL Server 2008 R2 as well.

  • For anybody else still following along I put together a test harness to check this out. Thanks to Jeff for the sample data.

    CREATE FUNCTION RemoveLeftCharPatIndex

    (

    @Source VARCHAR(1000)

    ,@Char CHAR(1)

    )

    RETURNS VARCHAR(1000)

    AS

    BEGIN

    DECLARE @NewValue VARCHAR(1000)

    SELECT @NewValue = SUBSTRING(@Source, PATINDEX('%[^' + @Char + ']%', @Source), 1000)

    return @NewValue

    end

    go

    create FUNCTION RemoveLeftCharTrimReplace

    (

    @Source VARCHAR(1000)

    ,@Char CHAR(1)

    )

    RETURNS VARCHAR(1000)

    AS

    BEGIN

    DECLARE @NewValue VARCHAR(1000)

    SELECT @NewValue = REPLACE(LTRIM(REPLACE(@Source, @Char, ' ')), ' ', @Char)

    return @NewValue

    end

    go

    --generate 1,000,000 rows of test data

    SELECT TOP 1000000

    SomeString = REPLICATE('0',ABS(CHECKSUM(NEWID()))%5) -- 0 to 4 leading zeros

    + 'temp'+RIGHT('0000000'+CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(10)),7)

    INTO #TestTable

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    SET STATISTICS TIME ON

    print 'PatIndex Version'

    select SomeString, dbo.RemoveLeftCharPatIndex(SomeString, '0')

    from #TestTable

    --print '----------------------------------------'

    --print ''

    print 'TrimReplace Version'

    select SomeString, dbo.RemoveLeftCharTrimReplace(SomeString, '0')

    from #TestTable

    SET STATISTICS TIME OFF

    drop table #TestTable

    drop function RemoveLeftCharPatIndex

    drop function RemoveLeftCharTrimReplace

    I ran this several times on my desktop running Win7 Ent with SQL 2008R2 (Not super machine but not horrible either). It seems that the PATINDEX has a slight edge on performance.

    Here are my results:

    PatIndex Version

    SQL Server Execution Times:

    CPU time = 5039 ms, elapsed time = 12791 ms.

    -----------------------------------

    TrimReplace Version

    SQL Server Execution Times:

    CPU time = 5881 ms, elapsed time = 12651 ms.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Jeff Moden (8/24/2012)


    tyson.price (8/24/2012)


    Do you think this:

    SELECT SUBSTRING(@Source,PATINDEX('%[^'+@Char+'X]%',@Source),1000)

    out performs this:

    select REPLACE(LTRIM(REPLACE('000temp001', '0', ' ')), ' ', '0')

    "A Developer must not guess... a Developer must KNOW!" 😉 Test it. Here's the code to build a million row test table for the problem at hand.

    SELECT TOP 1000000

    SomeString = REPLICATE('0',ABS(CHECKSUM(NEWID()))%5) -- 0 to 4 leading zeros

    + 'temp'+RIGHT('0000000'+CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(10)),7)

    INTO #TestTable

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    The code above takes just a second or two to run on a decent machine.... something less than 7 seconds to run on a 10 year old single CPU desktop box.

    Sorry Jeff, Right now I'm on PC without SQL Server installed, so can only guess.

    If you can run quick test it would be great.

    Again, saying that, SUBSTRING and PATINDEX performance should be quite constant and number of leading characters to remove will be irrelevant. REPLACE will depend on how many leading and trailing zeros are in the string.

    Not saying that SUBSTRING & PATINDEX method is much more flexible in what kind of leading things it can find and remove (again the example of removing any leading digit '%[^0-9]%')...

    Oops, looks like the tests were run! Thanks!

    As I rightly guessed... 🙂

    And I'm sure than more zero will be added at front and at the end of string it will be even greater difference.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • The following code on my laptop running SQL Server 2005, 8 GB RAM, 8 cores, x64 processor:

    SELECT TOP 1000000

    SomeString = REPLICATE('0',ABS(CHECKSUM(NEWID()))%5) -- 0 to 4 leading zeros

    + 'temp'+RIGHT('0000000'+CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(10)),7)

    INTO #TestTable

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2;

    go

    declare @StartDate datetime,

    @HoldString varchar(128),

    @Char char(1);

    SET @StartDate = GETDATE();

    SET @Char = '0';

    select

    @HoldString = SUBSTRING(SomeString,PATINDEX('%[^'+@Char+'X]%',SomeString),1000)

    from

    #TestTable;

    declare @EndDate datetime;

    SET @EndDate = GETDATE();

    select datediff(ms,@StartDate, @EndDate);

    go

    declare @StartDate datetime,

    @HoldString varchar(128),

    @Char char(1);

    SET @StartDate = GETDATE();

    SET @Char = '0';

    select

    @HoldString = REPLACE(LTRIM(REPLACE(SomeString, '0', ' ')), ' ', '0')

    from

    #TestTable;

    declare @EndDate datetime;

    SET @EndDate = GETDATE();

    select datediff(ms,@StartDate, @EndDate);

    go

    declare @StartDate datetime,

    @HoldString varchar(128),

    @Char char(1);

    SET @StartDate = GETDATE();

    SET @Char = '0';

    select

    @HoldString = right(SomeString,datalength(SomeString)-PATINDEX('%[^'+@Char+']%',SomeString)+1)

    from

    #TestTable;

    declare @EndDate datetime;

    SET @EndDate = GETDATE();

    select datediff(ms,@StartDate, @EndDate);

    go

    drop table #TestTable;

    go

    The substring method was still about 2 times faster than the multiple replace and about 1.5 times faster than my right() solution.

    Really comes down to it depends on the system that you are running on which is better. The best thing to do here is test, test, and test again.

  • Don't trust me, but using Lynn's code to test performance with more leading zeros (100 more) the results are consistent.

    All the methods took longer to run and the double REPLACE took twice as long as the other two. SUBSTRING and RIGHT seem to have similar results for elapsed time.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Sean Lange (8/24/2012)


    For anybody else still following along I put together a test harness to check this out. Thanks to Jeff for the sample data.

    CREATE FUNCTION RemoveLeftCharPatIndex

    (

    @Source VARCHAR(1000)

    ,@Char CHAR(1)

    )

    RETURNS VARCHAR(1000)

    AS

    BEGIN

    DECLARE @NewValue VARCHAR(1000)

    SELECT @NewValue = SUBSTRING(@Source, PATINDEX('%[^' + @Char + ']%', @Source), 1000)

    return @NewValue

    end

    go

    create FUNCTION RemoveLeftCharTrimReplace

    (

    @Source VARCHAR(1000)

    ,@Char CHAR(1)

    )

    RETURNS VARCHAR(1000)

    AS

    BEGIN

    DECLARE @NewValue VARCHAR(1000)

    SELECT @NewValue = REPLACE(LTRIM(REPLACE(@Source, @Char, ' ')), ' ', @Char)

    return @NewValue

    end

    go

    --generate 1,000,000 rows of test data

    SELECT TOP 1000000

    SomeString = REPLICATE('0',ABS(CHECKSUM(NEWID()))%5) -- 0 to 4 leading zeros

    + 'temp'+RIGHT('0000000'+CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(10)),7)

    INTO #TestTable

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    SET STATISTICS TIME ON

    print 'PatIndex Version'

    select SomeString, dbo.RemoveLeftCharPatIndex(SomeString, '0')

    from #TestTable

    --print '----------------------------------------'

    --print ''

    print 'TrimReplace Version'

    select SomeString, dbo.RemoveLeftCharTrimReplace(SomeString, '0')

    from #TestTable

    SET STATISTICS TIME OFF

    drop table #TestTable

    drop function RemoveLeftCharPatIndex

    drop function RemoveLeftCharTrimReplace

    I ran this several times on my desktop running Win7 Ent with SQL 2008R2 (Not super machine but not horrible either). It seems that the PATINDEX has a slight edge on performance.

    Here are my results:

    PatIndex Version

    SQL Server Execution Times:

    CPU time = 5039 ms, elapsed time = 12791 ms.

    -----------------------------------

    TrimReplace Version

    SQL Server Execution Times:

    CPU time = 5881 ms, elapsed time = 12651 ms.

    My recommendation is to never use SET STATISTICS to measure code when scalar functions are involved. Please see the following article for why...

    http://www.sqlservercentral.com/articles/T-SQL/91724/

    That same article also shows how to convert scalar UDFs to higher performance "iSFs" (Inline Scalar Functions).

    Also, returning data to the display is the "great equalizer". Instead, dump the results to a throw away variable. When I get home, I'll show you what I mean.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (8/24/2012)


    Sean Lange (8/24/2012)


    For anybody else still following along I put together a test harness to check this out. Thanks to Jeff for the sample data.

    CREATE FUNCTION RemoveLeftCharPatIndex

    (

    @Source VARCHAR(1000)

    ,@Char CHAR(1)

    )

    RETURNS VARCHAR(1000)

    AS

    BEGIN

    DECLARE @NewValue VARCHAR(1000)

    SELECT @NewValue = SUBSTRING(@Source, PATINDEX('%[^' + @Char + ']%', @Source), 1000)

    return @NewValue

    end

    go

    create FUNCTION RemoveLeftCharTrimReplace

    (

    @Source VARCHAR(1000)

    ,@Char CHAR(1)

    )

    RETURNS VARCHAR(1000)

    AS

    BEGIN

    DECLARE @NewValue VARCHAR(1000)

    SELECT @NewValue = REPLACE(LTRIM(REPLACE(@Source, @Char, ' ')), ' ', @Char)

    return @NewValue

    end

    go

    --generate 1,000,000 rows of test data

    SELECT TOP 1000000

    SomeString = REPLICATE('0',ABS(CHECKSUM(NEWID()))%5) -- 0 to 4 leading zeros

    + 'temp'+RIGHT('0000000'+CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(10)),7)

    INTO #TestTable

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    SET STATISTICS TIME ON

    print 'PatIndex Version'

    select SomeString, dbo.RemoveLeftCharPatIndex(SomeString, '0')

    from #TestTable

    --print '----------------------------------------'

    --print ''

    print 'TrimReplace Version'

    select SomeString, dbo.RemoveLeftCharTrimReplace(SomeString, '0')

    from #TestTable

    SET STATISTICS TIME OFF

    drop table #TestTable

    drop function RemoveLeftCharPatIndex

    drop function RemoveLeftCharTrimReplace

    I ran this several times on my desktop running Win7 Ent with SQL 2008R2 (Not super machine but not horrible either). It seems that the PATINDEX has a slight edge on performance.

    Here are my results:

    PatIndex Version

    SQL Server Execution Times:

    CPU time = 5039 ms, elapsed time = 12791 ms.

    -----------------------------------

    TrimReplace Version

    SQL Server Execution Times:

    CPU time = 5881 ms, elapsed time = 12651 ms.

    My recommendation is to never use SET STATISTICS to measure code when scalar functions are involved. Please see the following article for why...

    http://www.sqlservercentral.com/articles/T-SQL/91724/

    That same article also shows how to convert scalar UDFs to higher performance "iSFs" (Inline Scalar Functions).

    Also, returning data to the display is the "great equalizer". Instead, dump the results to a throw away variable. When I get home, I'll show you what I mean.

    Like I did in my code?

  • Lynn Pettis (8/24/2012)


    Like I did in my code?

    Exactly. I was going to comment on that but ran out of time. Nice going, Lynn.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sorry for the delay with the coded response.

    Here's what I was talking about. Currently, there's no such thing as an "Inline Scalar Function" (iSF for short) but you can make one that comes real close using an iTVF or "Inline Table Valued Function" to return a scalar value.

    Here's the self contained code that includes the 2 functions tested so far and a function to emulate an iSF. The difference in performance is pretty large if you consider that the first function and the iSF do identical things.

    RAISERROR('Setting up the test environment...',0,1) WITH NOWAIT;

    --=======================================================================================

    -- Conditionally drop the functions and use a nice safe place that everyone has.

    --=======================================================================================

    --===== Identify the database to use.

    USE tempdb;

    --===== Conditionally drop all of the functions to make reruns in SSMS easier.

    IF OBJECT_ID('tempdb.dbo.RemoveLeftCharPatIndex') IS NOT NULL

    DROP FUNCTION dbo.RemoveLeftCharPatIndex

    ;

    IF OBJECT_ID('tempdb.dbo.RemoveLeftCharTrimReplace') IS NOT NULL

    DROP FUNCTION dbo.RemoveLeftCharTrimReplace

    ;

    IF OBJECT_ID('tempdb.dbo.isfRemoveLeadingCharacter') IS NOT NULL

    DROP FUNCTION dbo.isfRemoveLeadingCharacter

    ;

    GO

    --=======================================================================================

    -- Conditionally drop the functions and use a nice safe place that everyone has.

    --=======================================================================================

    CREATE FUNCTION dbo.RemoveLeftCharPatIndex

    (

    @Source VARCHAR(1000),

    @Char CHAR(1)

    )

    RETURNS VARCHAR(1000)

    AS

    BEGIN

    DECLARE @NewValue VARCHAR(1000)

    SELECT @NewValue = SUBSTRING(@Source, PATINDEX('%[^' + @Char + ']%', @Source), 1000)

    RETURN @NewValue

    END

    ;

    GO

    CREATE FUNCTION dbo.RemoveLeftCharTrimReplace

    (

    @Source VARCHAR(1000),

    @Char CHAR(1)

    )

    RETURNS VARCHAR(1000)

    AS

    BEGIN

    DECLARE @NewValue VARCHAR(1000)

    SELECT @NewValue = REPLACE(LTRIM(REPLACE(@Source, @Char, ' ')), ' ', @Char)

    RETURN @NewValue

    END

    ;

    GO

    --===== This is like the RemoveLeftCharPatIndex funtion except it's an "Inline Scalar Function".

    CREATE FUNCTION dbo.isfRemoveLeadingCharacter

    (

    @SomeString VARCHAR(8000),

    @RemoveChar CHAR(1)

    )

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN SELECT CleanString = SUBSTRING(@SomeString,PATINDEX('%[^'+@RemoveChar+']%',@SomeString),8000)

    ;

    GO

    --=======================================================================================

    -- Conditionally drop and rebuild the test data.

    --=======================================================================================

    --===== Conditionally drop the test table to make reruns in SSMS easier.

    IF OBJECT_ID('tempdb..#TestTable') IS NOT NULL

    DROP TABLE #TestTable

    ;

    --===== Create and populate the test with 1,000,000 rows of test data

    SELECT TOP 1000000

    SomeString = REPLICATE('0',ABS(CHECKSUM(NEWID()))%5) -- 0 to 4 leading zeros

    + 'temp'+RIGHT('0000000'+CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(10)),7)

    INTO #TestTable

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    ;

    --=======================================================================================

    -- Run the tests using a duration timer because SET STATISTICS TIME ON induces

    -- large delay times because of the RBAR nature of Scalar UDFs.

    --=======================================================================================

    --===== Declare some obviously named variables.

    DECLARE @Bitbucket VARCHAR(8000),

    @StartTime DATETIME,

    @DurationMS INT

    ;

    --===== Run the tests on each function

    RAISERROR('=======================================',0,1) WITH NOWAIT;

    RAISERROR('============ Running Tests ============',0,1) WITH NOWAIT;

    RAISERROR('=======================================',0,1) WITH NOWAIT;

    -- Note that @Bitbucket allows us to take display and disk time out of the equation.

    -----------------------------------------------------------------------------------------

    SELECT @StartTime = GETDATE();

    RAISERROR('============ PatIndex Version',0,1) WITH NOWAIT;

    SELECT @Bitbucket = dbo.RemoveLeftCharPatIndex(SomeString, '0')

    FROM #TestTable;

    SELECT @DurationMS = DATEDIFF(ms,@StartTime,GETDATE())

    RAISERROR('Duration ms: %u',0,1,@DurationMS) WITH NOWAIT;

    -----------------------------------------------------------------------------------------

    SELECT @StartTime = GETDATE();

    RAISERROR('============ TrimReplace Version',0,1) WITH NOWAIT;

    SELECT @Bitbucket = dbo.RemoveLeftCharTrimReplace(SomeString, '0')

    FROM #TestTable;

    SELECT @DurationMS = DATEDIFF(ms,@StartTime,GETDATE())

    RAISERROR('Duration ms: %u',0,1,@DurationMS) WITH NOWAIT;

    -----------------------------------------------------------------------------------------

    SELECT @StartTime = GETDATE();

    RAISERROR('============ isf Substring/PatIndex',0,1) WITH NOWAIT;

    SELECT @Bitbucket = ca.CleanString

    FROM #TestTable tt

    CROSS APPLY dbo.isfRemoveLeadingCharacter(tt.SomeString,'0') ca;

    SELECT @DurationMS = DATEDIFF(ms,@StartTime,GETDATE())

    RAISERROR('Duration ms: %u',0,1,@DurationMS) WITH NOWAIT;

    -----------------------------------------------------------------------------------------

    RAISERROR('======================================',0,1) WITH NOWAIT;

    RAISERROR('============ RUN COMPLETE ============',0,1) WITH NOWAIT;

    RAISERROR('======================================',0,1) WITH NOWAIT;

    Here are the run results from my ten year old single 32 bit 1.8GHz CPU desktop box...

    Setting up the test environment...

    (1000000 row(s) affected)

    =======================================

    ============ Running Tests ============

    =======================================

    ============ PatIndex Version

    Duration ms: 10793

    ============ TrimReplace Version

    Duration ms: 12843

    ============ isf Substring/PatIndex

    Duration ms: 1466

    ======================================

    ============ RUN COMPLETE ============

    ======================================

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 14 posts - 16 through 28 (of 28 total)

You must be logged in to reply to this topic. Login to reply