Performance issue with tally solution

  • Flo, are you getting your reads from SET STATISTICS IO ON? IIRC UDFs don't report IO correctly in this manner. Try getting them with Profiler.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (5/1/2009)


    Flo, are you getting your reads from SET STATISTICS IO ON? IIRC UDFs don't report IO correctly in this manner. Try getting them with Profiler.

    Depending on how they are used, aren't In-Line TVF expanded before execution? This particular In-Line TVF doesn't even access any tables, and everything should be small enough that it could (should) easily be done in memory. Please note, I'm not an expert on the internals of SQL Server, in fact I'm still learning a lot.

  • Lynn Pettis (5/1/2009)


    Ah, padawan, set-based solution to populate your table I created, hmm.

    Look for the good in the bad, you may see it.

    Amusing!

    Yes I did get that - but I was hoping to avoid drawing attention to the fact that the CPU and IO involved in setting up my 'custom tally' was separated out from the core of the routine I posted. Including the CPU, even through cunning use of a constant scan, would make it look slightly less impressive :hehe:

    I went on to think how I might expand it to a set-based solution.

    Paul

  • TheSQLGuru (5/1/2009)


    Flo, are you getting your reads from SET STATISTICS IO ON? IIRC UDFs don't report IO correctly in this manner. Try getting them with Profiler.

    Good point. Thanks!

    Just tried with Profiler:

    Description CPU Reads Duration

    ----------- --- ----- --------

    Paul 0 3 0

    Jeff 0 4 0

    Lynn 0 0 0

    ... not kidding

    Greets

    Flo

  • Lynn Pettis (5/1/2009)


    TheSQLGuru (5/1/2009)


    Flo, are you getting your reads from SET STATISTICS IO ON? IIRC UDFs don't report IO correctly in this manner. Try getting them with Profiler.

    Depending on how they are used, aren't In-Line TVF expanded before execution? This particular In-Line TVF doesn't even access any tables, and everything should be small enough that it could (should) easily be done in memory.

    Exactly correct, though I can't think of a time when an in-line TVF wasn't inserted directly into the query plan.

    There may be examples, but I got (0 row(s) affected) from my internal query.

    The biggest cost with UNION ALL constants is compilation time - scale the constant scan up and compilation time can run into the minutes.

  • Dear oh dear.

    I have *just* realized the point of Lynn's solution!

    It avoids the I/O on the custom tally table completely.

    Not my finest moment.

    Luckily it won't be available on the internet for all eternity...oh wait

  • I've just been running some tests on the original problem and hit a curiosity of the Replace function. (I was unsuccessfully trying to get a winning UPDATE solution)

    It is used a lot for getting the number of instances of a substring within a string. God knows, I use it myself for doing line counts. It doesn't work in some instances. I feel sure I'm doing something stupid, but what? (Surely, The result should always have no rows? I get five)

    [font="Courier New"]SELECT [name] FROM

       (

       SELECT TOP(200) o.name, m.definition

       FROM MASTER.sys.all_objects o

          JOIN MASTER.sys.all_sql_modules m ON o.OBJECT_ID = m.OBJECT_ID

       WHERE TYPE = 'P'

       ) f

    WHERE --get the count using the Replace trick

       (LEN(definition)- LEN(REPLACE(definition,CHAR(13)+CHAR(10),'')))/2

    <>     --use a tally table

       (SELECT COUNT(*) FROM (SELECT SUBSTRING(definition,number,2)AS 'ch' FROM numbers WHERE number<=LEN(definition)) f

    WHERE ch=CHAR(13)+CHAR(10))

    [/font]

    Best wishes,
    Phil Factor

  • I'll take a look, Paul. Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Florian Reischl (5/1/2009)


    TheSQLGuru (5/1/2009)


    Flo, are you getting your reads from SET STATISTICS IO ON? IIRC UDFs don't report IO correctly in this manner. Try getting them with Profiler.

    Good point. Thanks!

    Just tried with Profiler:

    Description CPU Reads Duration

    ----------- --- ----- --------

    Paul 0 3 0

    Jeff 0 4 0

    Lynn 0 0 0

    ... not kidding

    Greets

    Flo

    Ok... I've lost it... what are you measuring here? The code to gen 256 characters?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/1/2009)


    Florian Reischl (5/1/2009)


    TheSQLGuru (5/1/2009)


    Flo, are you getting your reads from SET STATISTICS IO ON? IIRC UDFs don't report IO correctly in this manner. Try getting them with Profiler.

    Good point. Thanks!

    Just tried with Profiler:

    Description CPU Reads Duration

    ----------- --- ----- --------

    Paul 0 3 0

    Jeff 0 4 0

    Lynn 0 0 0

    ... not kidding

    Greets

    Flo

    Ok... I've lost it... what are you measuring here? The code to gen 256 characters?

    Who knows, Jeff. To me it just looks like we went off on some kind of tangent. But then that seems to happen all the time here on SSC, but it generates a lot of interesting conversations and tidbits of knowledge (useful or not is still to be determined).

  • Jeff Moden (5/1/2009)


    Ok... I've lost it... what are you measuring here? The code to gen 256 characters?

    The 'A & B Cleaners' thing you posted.

    It was a hyperbole-ic tangent.

  • Jeff Moden (5/1/2009)


    Florian Reischl (5/1/2009)


    TheSQLGuru (5/1/2009)


    Flo, are you getting your reads from SET STATISTICS IO ON? IIRC UDFs don't report IO correctly in this manner. Try getting them with Profiler.

    Good point. Thanks!

    Just tried with Profiler:

    Description CPU Reads Duration

    ----------- --- ----- --------

    Paul 0 3 0

    Jeff 0 4 0

    Lynn 0 0 0

    ... not kidding

    Greets

    Flo

    Ok... I've lost it... what are you measuring here? The code to gen 256 characters?

    Oups, sorry. :Whistling:

    Just tried the three versions (I removed mine) to clean-up a 256 char string.

    [font="Courier New"]--== Helper table for Pauls version

    ----DROP TABLE dbo.UnwantedChars

    IF (OBJECT_ID('dbo.UnwantedChars') IS NULL)

    BEGIN

       CREATE TABLE dbo.UnwantedChars (BadChar VARCHAR(1) NOT NULL);

       -- Populate a table with the chars we want to remove

       DECLARE   @char_code   INT,

             @char      VARCHAR(1);

       SET      @char_code = -1

       WHILE   @char_code <= 255

       BEGIN

             SELECT   @char_code = @char_code + 1,

                   @char = CHAR(@char_code);

             INSERT   dbo.UnwantedChars(BadChar)

             SELECT   @char

             WHERE   PATINDEX('[^A-Z]', @char) = 1;

       END;

    END

    GO

    --==== Paul ======================

    PRINT '-- Paul''s Version'

    DECLARE @CompanyName VARCHAR(256)

    SET @CompanyName = 'A & B Cleaners';

    UPDATE   dbo.UnwantedChars

    SET      @CompanyName = CONVERT(VARCHAR(256), REPLACE(@CompanyName, BadChar, ''))

    WHERE   CHARINDEX(BadChar, @CompanyName) != 0;

    PRINT   @CompanyName

    GO

    --==== Jeff ======================

    PRINT '-- Jeff''s Version'

    DECLARE @CompanyName VARCHAR(256)

    SELECT @CompanyName = 'A & B Cleaners'

    -- Do the clean-up

    SELECT @CompanyName = STUFF(@CompanyName,PATINDEX('%[^A-Z]%',@CompanyName),1,'')

       FROM dbo.Tally t

      WHERE t.N <= LEN(@CompanyName)

        AND SUBSTRING(@CompanyName,t.N,1) LIKE '[^A-Z]'

    PRINT @CompanyName

    GO

    --==== Lynn ======================

    PRINT '-- Lynn''s Version'

    DECLARE @CompanyName VARCHAR(256)

    SET      @CompanyName = 'A & B Cleaners';

    ; WITH

    Tally2 (N) AS ( SELECT 1 UNION ALL SELECT 2 ),

    Tally4 (N) AS ( SELECT t1.N FROM Tally2 t1, Tally2 t2 ),

    Tally16 (N) AS ( SELECT t1.N FROM Tally4 t1, Tally4 t2 ),

    Tally256 (N) AS ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) FROM Tally16 t1, Tally16 t2 )

    SELECT @CompanyName = STUFF(@CompanyName,PATINDEX('%[^A-Z]%',@CompanyName),1,'')

       FROM Tally256 t

      WHERE t.N <= LEN(@CompanyName)

        AND SUBSTRING(@CompanyName,t.N,1) LIKE '[^A-Z]'

    PRINT @CompanyName

    [/font]

    Greets

    Flo

  • Phil Factor (5/1/2009)


    Jeff,

    ...but their developers seem unable to fix things so they work like they used to.

    Ouch.

    They're doing all they can to put in a better solution ASAP. The problem was that it became impossible to improve the browser-side 'prettifier' without a lot of effort (Javascript regex solutions can only do so much) and they are planning on using a vastly better server-side solution. They're working on it urgently at this very moment.

    You can always use my Prettifier in the meantime!

    Heh... understood and sorry for the pork chop. But I sure wouldn't have released it without testing it for all the mistakes the code windows currently enjoy. Why not just put things back the way they were until you can get it hammered out because, right now, doing my own formating or using the prettifier sure does slow down the works.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Paul White (5/1/2009)


    Dear oh dear.

    I have *just* realized the point of Lynn's solution!

    It avoids the I/O on the custom tally table completely.

    Not my finest moment.

    Luckily it won't be available on the internet for all eternity...oh wait

    Flo actually used a permutation of Itzik's method. It will quite handily build a billion row table without driving the log file nuts like most every other method will.

    A word of caution though... test it on more than just one string... I've found that it's relatively slow (still fast as hell) compared to some of the other methods until you get up into some of the larger numbers.

    [font="Courier New"]

    DECLARE @BitBucket INT

    DECLARE @TestSize  INT

        SET @TestSize  = 1000000

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT REPLICATE('=',100) ---------------------------------------------------------

    DBCC FREEPROCCACHE

    DBCC DROPCLEANBUFFERS

    PRINT '===== Matt Miller''s Method ====='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    ;with cte1 (N) as (select 1 union ALL select 2),  --2

        cte2 as (select c1.N from cte1 c1, cte1 c2, cte1 c3, cte1 c4), --16

        cte3 as (select c1.N from cte2 c1, cte2 c2, cte2 c3, cte2 c4), --65536

        cte4(n) as (select null from cte3 c1,cte2 c2,cte2 c3) 

    select top (@TestSize)

    @Bitbucket = row_number() over (order by N)

    from CTE4

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT REPLICATE('=',100) ---------------------------------------------------------

    DBCC FREEPROCCACHE

    DBCC DROPCLEANBUFFERS

    PRINT '===== Itzek''s Method ====='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

     ;  WITH 

            L0   AS (SELECT 1 AS C UNION ALL SELECT 1),   --2 rows

            L1   AS (SELECT 1 AS C FROM L0 AS A, L0 AS B),--4 rows

            L2   AS (SELECT 1 AS C FROM L1 AS A, L1 AS B),--16 rows

            L3   AS (SELECT 1 AS C FROM L2 AS A, L2 AS B),--256 rows

            L4   AS (SELECT 1 AS C FROM L3 AS A, L3 AS B),--65536 rows

            L5   AS (SELECT 1 AS C FROM L4 AS A, L4 AS B),--4294967296 rows

            Nums AS (SELECT ROW_NUMBER() OVER (ORDER BY C) AS n FROM L5)

             SELECT top (@TestSize) @BitBucket = N FROM Nums

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT REPLICATE('=',100) ---------------------------------------------------------

    DBCC FREEPROCCACHE

    DBCC DROPCLEANBUFFERS

    PRINT '===== Jeff Moden''s Method' --Fastest overall...

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    DECLARE @SomeConstant INT

        SET @SomeConstant = 1

    ;  WITH cTally AS

    (-----------------------------------------------------------------------------

     SELECT TOP (@TestSize)

            ROW_NUMBER() OVER (ORDER BY @SomeConstant) AS N

       FROM Master.sys.SysColumns t1 

      CROSS JOIN Master.sys.SysColumns t2  

    )-----------------------------------------------------------------------------

    SELECT @BitBucket = N FROM cTally

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT REPLICATE('=',100) ---------------------------------------------------------

    DBCC FREEPROCCACHE

    DBCC DROPCLEANBUFFERS

    declare @root int

    select @root = sqrt(@testsize)+1

    PRINT '===== RBarryYoung''s Method'

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    ;WITH cTally AS

    (-----------------------------------------------------------------------------

    SELECT TOP (@TestSize)

            ROW_NUMBER() OVER (ORDER BY t1.ID) AS N

       FROM (Select TOP (@root) ID from Master.sys.SysColumns) t1

      CROSS JOIN (Select TOP (@root) ID from Master.sys.SysColumns) t2

    )-----------------------------------------------------------------------------

    SELECT @BitBucket = N FROM cTally

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT REPLICATE('=',100) ---------------------------------------------------------

    DBCC FREEPROCCACHE

    DBCC DROPCLEANBUFFERS

    PRINT '===== Combined Method'

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    --DECLARE @SomeConstant INT

    --    SET @SomeConstant = 1

    ;  WITH cTally AS

    (-----------------------------------------------------------------------------

     SELECT TOP (@TestSize)

            ROW_NUMBER() OVER (ORDER BY @SomeConstant) AS N

       FROM (Select TOP (@root) ID from Master.sys.SysColumns) t1

      CROSS JOIN (Select TOP (@root) ID from Master.sys.SysColumns) t2

    )-----------------------------------------------------------------------------

    SELECT @BitBucket = N FROM cTally

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF[/font]

    Here's the cleaned up results...

    [font="Courier New"]====================================================================================================

    ===== Matt Miller's Method =====

    SQL Server Execution Times:

    CPU time = 907 ms, elapsed time = 935 ms.

    ====================================================================================================

    ===== Itzek's Method =====

    SQL Server Execution Times:

    CPU time = 906 ms, elapsed time = 1014 ms.

    ====================================================================================================

    ===== Jeff Moden's Method

    SQL Server Execution Times:

    CPU time = 734 ms, elapsed time = 941 ms.

    ====================================================================================================

    ===== RBarryYoung's Method

    SQL Server Execution Times:

    CPU time = 750 ms, elapsed time = 887 ms.

    ====================================================================================================

    ===== Combined Method

    SQL Server Execution Times:

    CPU time = 766 ms, elapsed time = 861 ms.

    [/font]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Phil Factor (5/1/2009)


    I've just been running some tests on the original problem and hit a curiosity of the Replace function. (I was unsuccessfully trying to get a winning UPDATE solution)

    It is used a lot for getting the number of instances of a substring within a string. God knows, I use it myself for doing line counts. It doesn't work in some instances. I feel sure I'm doing something stupid, but what? (Surely, The result should always have no rows? I get five)

    [font="Courier New"]SELECT [name] FROM

       (

       SELECT TOP(200) o.name, m.definition

       FROM MASTER.sys.all_objects o

          JOIN MASTER.sys.all_sql_modules m ON o.OBJECT_ID = m.OBJECT_ID

       WHERE TYPE = 'P'

       ) f

    WHERE --get the count using the Replace trick

       (LEN(definition)- LEN(REPLACE(definition,CHAR(13)+CHAR(10),'')))/2

    <>     --use a tally table

       (SELECT COUNT(*) FROM (SELECT SUBSTRING(definition,number,2)AS 'ch' FROM numbers WHERE number<=LEN(definition)) f

    WHERE ch=CHAR(13)+CHAR(10))

    [/font]

    NVARCHAR environment apparently throws LEN off when search for those 2 characters... trying to figure out why...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 361 through 375 (of 522 total)

You must be logged in to reply to this topic. Login to reply