INSERT INTO with a tally table is slower than INSERT INTO within a WHILE loop

  • My colleague wrote the following function to generate a table with 10,000,000 records, and it works absolutely fine:

    DROP FUNCTION IF EXISTS [dbo].[get_generate_sequence_fn];
    GO
    CREATE FUNCTION [dbo].[get_generate_sequence_fn]
    (
        @sequence_start INT,
        @sequence_end INT,
        @incremental_value INT
    )
    RETURNS @t_genarated_values TABLE
    (
        value INT
    )
    AS
    BEGIN
        WHILE (@sequence_start <= @sequence_end)
        BEGIN
            INSERT INTO @t_genarated_values
            SELECT @sequence_start;

            SET @sequence_start += @incremental_value;
        END;

        RETURN;
    END;
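
    For reference, the function is called like this to produce the full 10,000,000-row sequence (the same call is used in the timing tests later in the thread):

    SELECT * FROM [dbo].[get_generate_sequence_fn](1, 10000000, 1);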

    I suggested changing it to use a tally table, assuming it would be significantly faster, but to my surprise the execution plan for 10M records shows it as actually slower. Can anyone explain why, please? Apologies, I cannot paste screenshots of the execution plans for the two versions.

    DROP FUNCTION IF EXISTS [dbo].[get_generate_sequence_tally_fn];
    GO
    CREATE FUNCTION [dbo].[get_generate_sequence_tally_fn]
    (
        @i_sequence_start INT,
        @i_sequence_end INT,
        @i_inc_val INT
    )
    RETURNS @t_genarated_values TABLE
    (
        value INT
    )
    AS
    BEGIN
        WITH cte_tally (n) AS
        (
            SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
            FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(n)        -- 20 rows
            CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)  -- 400 rows
            CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)  -- 8,000 rows
            CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)  -- 160,000 rows
            CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e(n)  -- 3,200,000 rows
            CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f(n)  -- 64,000,000 rows
        )
        INSERT INTO @t_genarated_values
        SELECT @i_sequence_start + @i_inc_val * (n - 1)
        FROM cte_tally
        WHERE n <= CEILING((@i_sequence_end - @i_sequence_start + 1) * 1.0 / @i_inc_val); -- * 1.0 so CEILING sees a decimal rather than an already-truncated integer division

        RETURN;
    END;
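
    For sizing reference, each VALUES list supplies 20 rows, so the six cross joins can produce up to 20^6 rows - comfortably more than the 10,000,000 needed. A minimal check of the arithmetic:

    SELECT 20 * 20 * 20 * 20 * 20 * 20 AS max_tally_rows; -- 64,000,000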


  • Out of interest, can you try this version?

    INSERT ...
    SELECT TOP ((@i_sequence_end - @i_sequence_start + 1) / @i_inc_val)
           value = (ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1) * @i_inc_val + @i_sequence_start -- (n - 1) * increment + start, so the sequence begins at @i_sequence_start for any increment
    FROM sys.all_columns ac1
    CROSS JOIN sys.all_columns ac2;
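
    As a rough sanity check (a sketch - sys.all_columns typically holds several thousand rows even in an empty database, so its self cross join supplies tens of millions of rows):

    SELECT COUNT_BIG(*) * COUNT_BIG(*) AS cross_join_capacity
    FROM sys.all_columns;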

    Note also that you have misspelled 'generated'.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

    Thank you, Phil - the execution plan is the same. And, same as with my tally table, 94% of it goes on the INSERT INTO the @t table variable.

  • BOR15K wrote:

    Thank you, Phil - the execution plan is the same. And, same as with my tally table, 94% of it goes on the INSERT INTO the @t table variable.

    You don't need to insert into a table variable:

    DROP FUNCTION [dbo].[get_generate_sequence_tally_fn];
    GO
    CREATE FUNCTION [dbo].[get_generate_sequence_tally_fn]
    (
        @i_sequence_start INT,
        @i_sequence_end INT,
        @i_inc_val INT
    )
    RETURNS TABLE
    AS
    RETURN
    WITH cte_tally (n) AS
    (
        SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(n)        -- 20 rows
        CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)  -- 400 rows
        CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)  -- 8,000 rows
        CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d(n)  -- 160,000 rows
        CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) e(n)  -- 3,200,000 rows
        CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) f(n)  -- 64,000,000 rows
    )
    SELECT TOP (CEILING((@i_sequence_end - @i_sequence_start + 1) * 1.0 / @i_inc_val)) -- * 1.0 avoids integer division before CEILING
           @i_sequence_start + @i_inc_val * (n - 1) AS value
    FROM cte_tally;
    GO
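
    Because it is an inline table-valued function, you query it just like a table:

    SELECT value
    FROM [dbo].[get_generate_sequence_tally_fn](1, 10000000, 1);
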
  • Thank you, Jonathan. Almost every step in the execution plan is now marked as "No Join Predicate", and the whole function now costs 100%, with my original tally one costing 0%.

  • BOR15K wrote:

    Thank you, Jonathan. Almost every step in the execution plan is now marked as "No Join Predicate", and the whole function now costs 100%, with my original tally one costing 0%.

    So what?  How does it perform?

    The reason your original took 0% is that execution plans for a WHILE loop are based only on the first iteration of the loop. Stop using % of batch as a performance indicator; it's based on estimates, not reality.


    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    BOR15K wrote:

    Thank you, Jonathan. Almost every step in the execution plan is now marked as "No Join Predicate", and the whole function now costs 100%, with my original tally one costing 0%.

    So what?  How does it perform?

    The reason your original took 0% is that execution plans for a WHILE loop are based only on the first iteration of the loop. Stop using % of batch as a performance indicator; it's based on estimates, not reality.

    The original WHILE loop is appalling compared to a tally with no insert into a table variable. Here is a test over 10 million rows: the WHILE loop vs. the tally with no inserts:

    SET STATISTICS IO, TIME ON;
    GO
    SELECT *
    INTO #t1
    FROM [dbo].[get_generate_sequence_fn](1, 10000000, 1);
    GO
    SELECT *
    INTO #t2
    FROM [dbo].[get_generate_sequence_tally_fn](1, 10000000, 1);
    SQL Server parse and compile time: 
    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.
    SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 0 ms.
    Table '#B21ED994'. Scan count 1, logical reads 16892, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 406485 ms, elapsed time = 848383 ms.

    (10000000 rows affected)
    SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 5 ms.

    SQL Server Execution Times:
    CPU time = 3997 ms, elapsed time = 1671 ms.

    (10000000 rows affected)

    So it is over 500 times faster using a tally table with no inserts: 1.7 seconds compared to 14 minutes.

  • Times on an i7 6-core laptop.

    According to the execution plan, the query costs (relative to the batch) are 0%, 0% and 100% respectively - so plan cost percentages are DEFINITELY NOT something to rely on.

    -- with WHILE loop - [get_generate_sequence_fn]

    Table '#BF78D196'. Scan count 1, logical reads 16892, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1215937 ms, elapsed time = 1273212 ms.

    (10000000 rows affected)

    -- with @table insert - [get_generate_sequence_tally_fn]

    Table '#BC9C64EB'. Scan count 1, logical reads 16078, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 34281 ms, elapsed time = 34503 ms.

    (10000000 rows affected)

    -- ITVF - [get_generate_sequence_tally_fn_itvf]

    SQL Server Execution Times:

    CPU time = 10671 ms, elapsed time = 8476 ms.

    (10000000 rows affected)

  • Thank you all for the valuable replies, but I think this discussion has diverged from my original question: why did my tally table function have a worse execution plan than my colleague's simple WHILE loop? Jonathan above has already advised that this is because the execution plan is based on only one iteration of the WHILE loop. Is this documented somewhere, so that I can share it with our team, please?

    And yes, my tally works MUCH faster than the loop.


  • BOR15K wrote:

    Thank you all for the valuable replies, but I think this discussion has diverged from my original question: why did my tally table function have a worse execution plan than my colleague's simple WHILE loop? Jonathan above has already advised that this is because the execution plan is based on only one iteration of the WHILE loop. Is this documented somewhere, so that I can share it with our team, please?

    And yes, my tally works MUCH faster than the loop.

    Just to make a minor correction, I was the one who talked about the single iteration. Although I've not done an extensive search for it, I don't believe there's any Microsoft documentation on that subject.

    @Grant Fritchey has written several books on the subject.  He might know where MS documented it (if it is documented) but it doesn't seem like something that MS would document because I don't believe they actually know either.

    As a bit of a sidebar, does anyone know where the documentation for "Query Cost (relative to the batch)" in execution plans is? I've done a couple of quick searches and can't find even that.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    BOR15K wrote:

    Thank you all for the valuable replies, but I think this discussion has diverged from my original question: why did my tally table function have a worse execution plan than my colleague's simple WHILE loop? Jonathan above has already advised that this is because the execution plan is based on only one iteration of the WHILE loop. Is this documented somewhere, so that I can share it with our team, please?

    And yes, my tally works MUCH faster than the loop.

    Just to make a minor correction, I was the one who talked about the single iteration. Although I've not done an extensive search for it, I don't believe there's any Microsoft documentation on that subject.

    @Grant Fritchey has written several books on the subject.  He might know where MS documented it (if it is documented) but it doesn't seem like something that MS would document because I don't believe they actually know either.

    As a bit of a sidebar, does anyone know where the documentation for "Query Cost (relative to the batch)" in execution plans is? I've done a couple of quick searches and can't find even that.


    Thank you, Jeff.


    I could only find his reply to you 🙂

    https://www.sqlservercentral.com/forums/topic/query-cost-relative-to-the-batch-vs-statistics-io-vs-extended-events
