Performance issue with tally solution

  • See what I mean? This works...

    [font="Courier New"] SELECT [Name],LenCount,LenTally,REPLACE(definition,CHAR(13)+CHAR(10),CHAR(1))

       FROM (

             SELECT TOP(200) o.name, m.definition,

                    (DATALENGTH(m.Definition)-DATALENGTH(REPLACE(m.Definition,(CHAR(13)+CHAR(10)),'')))/4 AS LenCount,

                    (

                     SELECT COUNT(*) AS DoDah 

                       FROM Tally 

                      WHERE N <= LEN(definition)

                        AND SUBSTRING(definition,N,2) = CHAR(13)+CHAR(10)

                    )AS LenTally

               FROM MASTER.sys.all_objects o

               JOIN MASTER.sys.all_sql_modules m ON o.OBJECT_ID = m.OBJECT_ID

              WHERE TYPE = 'P'

            )d

      WHERE LenCount  LenTally[/font]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Actually, it's a "white space" problem.

    [font="Courier New"]DECLARE @CrLf NVARCHAR(MAX)

        SET @CrLf = 'X    '+CHAR(13)+CHAR(10)

     SELECT LEN(@CrLf),

            LEN(REPLACE(@CrLf,CHAR(13)+CHAR(10),''))[/font]

    Any sproc that has spaces just before the last CrLf will have the problem because when the trailing CrLf goes away, those spaces become trailing spaces that LEN will not find.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, you're absolutely right. Oh dear, I should have realised that, but I stupidly assumed that the two would be the same.

    This proves the point

    [font="Courier New"]SELECT [name] FROM

       (

       SELECT TOP(200) o.name, m.definition

       FROM MASTER.sys.all_objects o

          JOIN MASTER.sys.all_sql_modules m ON o.OBJECT_ID = m.OBJECT_ID

       WHERE TYPE = 'P'

       ) f

    WHERE --get the count using the Replace trick

       (LEN(REPLACE(definition,' ','|'))- LEN(REPLACE(REPLACE(definition,' ','|'),'

    ','')))/2

    <>     --use a tally table

       (SELECT COUNT(*) FROM (SELECT SUBSTRING(definition,number,2)AS 'ch' FROM numbers WHERE number<=LEN(definition)) f

    WHERE ch='

    ')

    [/font]

    Which produces the correct result. We can then use a much neater way of counting the no. of lines.

    [font="Courier New"]SELECT [name] FROM

       (

       SELECT TOP(200) o.name, m.definition

       FROM MASTER.sys.all_objects o

          JOIN MASTER.sys.all_sql_modules m ON o.OBJECT_ID = m.OBJECT_ID

       WHERE TYPE = 'P'

       ) f

    WHERE --get the count using the Replace trick

       LEN(definition)- LEN(REPLACE(definition, '

    ','|'))

    <>     --use a tally table

       (SELECT COUNT(*) FROM (SELECT SUBSTRING(definition,number,2)AS 'ch' FROM numbers WHERE number<=LEN(definition)) f

    WHERE ch='

    ')

    [/font]

    Best wishes,
    Phil Factor

  • This is the solution to the original problem, based on the Update method. It actually uses a tally table too. Because it has to do a lot of string lifting, it is slower than the other two methods, but I thought I'd put it in just for the sake of completeness. I feel sure that, with a bit of optimisation, it could be as fast as the pure tally, but I hit problems with the UPDATE ... FROM query, whist using variables. SQL Server Bugs, I'm afraid. I backed off.

    [font="Courier New"]DECLARE @Delimiter VARCHAR(10)

    SELECT @Delimiter='

    '

    DECLARE  @insertTable TABLE

       (

       insertTable_id INT IDENTITY(1,1) PRIMARY KEY,

       [name] NVARCHAR(128),

       [start] INT DEFAULT 0,

       [end] INT DEFAULT 0,

       TheSubString VARCHAR(MAX) DEFAULT '',

       Definition VARCHAR(MAX)

       )

    INSERT INTO @insertTable (name,definition)

    SELECT name,definition FROM @source

       INNER JOIN @tally ON n<=

       (LEN(definition)

               -LEN(REPLACE(definition,@Delimiter,'|')))/(LEN(@Delimiter)-1)+1

    DECLARE @Start INT, @End INT,@LenDelimiter INT

    DECLARE @previousname VARCHAR(80),@dummy VARCHAR(MAX)

    SELECT @previousname='', @LenDelimiter=LEN(@Delimiter)

    UPDATE @insertTable

       SET

       @start=start=CASE WHEN [name]=@previousname THEN @end+@LenDelimiter ELSE 0 END,

       @end=[end]=CHARINDEX(@delimiter,definition+@delimiter,@start),

       @PreviousName=[name],

       TheSubString=SUBSTRING(definition,@start,@end-start) -- Results

    [/font]

    Best wishes,
    Phil Factor

  • Phil Factor (5/2/2009)


    but I hit problems with the UPDATE ... FROM query, whist using variables. SQL Server Bugs, I'm afraid.

    Which bugs?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • They were rather interesting. I was when doing a 'quirky update' from a join with a temporary table and a table variable. It worked fine on the simple test harness with a small amount of data but failed when using a larger table. I probably still have the code. It was very odd. I sometimes come across this with the Update with variables. Sometimes, the addition of an empty string works wonders. Don't ask why! This time, I couldn't get it to work reliably (it worked fine with up to fifty stored procedures. Then it refused to update the table at all on 51 or more stored procedures.)

    Best wishes,
    Phil Factor

  • Phil Factor (5/2/2009)


    They were rather interesting. I was when doing a 'quirky update' from a join with a temporary table and a table variable. It worked fine on the simple test harness with a small amount of data but failed when using a larger table. I probably still have the code. It was very odd. I sometimes come across this with the Update with variables. Sometimes, the addition of an empty string works wonders. Don't ask why! This time, I couldn't get it to work reliably (it worked fine with up to fifty stored procedures. Then it refused to update the table at all on 51 or more stored procedures.)

    I can see where the joins on a quirky update may cause a problem with the quirky update... but what does that have to with the code you posted?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/1/2009)


    Flo actually used a permutation of Itzik's method. It will quite handily build a billion row table without driving the log file nuts like most every other method will.

    Hey Jeff,

    Thanks for the test rig - I had fun playing around with it!

    I used to use the odd-looking UNION/UNION ALL version, but usually ended up using the ROWNUMBER over master tables one because it was easier.

    I didn't realize it was faster too...!

    These days, none of them get a look-in because a fixed Numbers/Tally table is even more convenient.

    Back to the test rig though:

    It really is a shame about the compilation time on the pure UNION ALL method, because firing all the numbers out from a single constant scan operator would be pretty much as fast as it gets, I would think. The best I could manage (still with a shocking compilation time of over a minute!) follows.

    I was disappointed that USE PLAN doesn't work on constant scans. 🙁

    I guess KEEPFIXED PLAN would be the way - though it doesn't help in the test rig because you free the proc cache!

    Anyway, here's my effort:

    [font="Courier New"]DBCC FREEPROCCACHE

    DBCC DROPCLEANBUFFERS

    PRINT '===== Pure UNION ALL Method'

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    ; WITHN0 (N) AS (SELECT CONVERT(BIT,1) UNION ALL SELECT CONVERT(BIT,1) UNION ALL SELECT CONVERT(BIT,1) UNION ALL SELECT CONVERT(BIT,1)),-- 4 rows

    N1 (N) AS (SELECT N FROM N0 UNION ALL SELECT N FROM N0),-- 8

    N2 (N) AS (SELECT N FROM N1 UNION ALL SELECT N FROM N1),-- 16

    N3 (N) AS (SELECT N FROM N2 UNION ALL SELECT N FROM N2),-- 32

    N4 (N) AS (SELECT N FROM N3 UNION ALL SELECT N FROM N3),-- 64

    N5 (N) AS (SELECT N FROM N4 UNION ALL SELECT N FROM N4),-- 128

    N6 (N) AS (SELECT N FROM N5 UNION ALL SELECT N FROM N5),-- 256

    N7 (N) AS (SELECT N FROM N6 UNION ALL SELECT N FROM N6),-- 512

    N8 (N) AS (SELECT N FROM N7 UNION ALL SELECT N FROM N7),-- 1024

    N9 (N) AS (SELECT N FROM N8 UNION ALL SELECT N FROM N8),-- 2048

    NA (N) AS (SELECT N FROM N9 UNION ALL SELECT N FROM N9),-- 4096

    NB (N) AS (SELECT N FROM NA UNION ALL SELECT N FROM NA),-- 8192

    Numbers (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM NB),

    T1 (N) AS (SELECT TOP (9999999999) N FROM Numbers ORDER BY N % 1),

    T2 (N) AS (SELECT N FROM Numbers)

    SELECTTOP (@TestSize) @Bitbucket = T1.N

    FROMT1,T1 AS T2 -- 67,108,864

    OPTION (MAXDOP 1);

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF[/font]

    ...and these are the results on my machine:

    [font="Courier New"]===== Matt Miller's Method =====

    CPU time = 453 ms, elapsed time = 456 ms.

    ===== Itzek's Method =====

    CPU time = 453 ms, elapsed time = 539 ms.

    ===== Jeff Moden's Method

    CPU time = 313 ms, elapsed time = 546 ms.

    ===== RBarryYoung's Method

    SQL Server Execution Times:

    CPU time = 344 ms, elapsed time = 455 ms.

    ===== Combined Method

    CPU time = 313 ms, elapsed time = 427 ms.

    ===== Pure UNION ALL Method

    CPU time = 157 ms, elapsed time = 178 ms.[/font]

    ...as I say, it's a shame about the compilation time.

    Paul

  • Jeff Moden (5/1/2009)


    Actually, it's a "white space" problem.

    [font="Courier New"]DECLARE @CrLf NVARCHAR(MAX)

        SET @CrLf = 'X    '+CHAR(13)+CHAR(10)

     SELECT LEN(@CrLf),

            LEN(REPLACE(@CrLf,CHAR(13)+CHAR(10),''))[/font]

    Any sproc that has spaces just before the last CrLf will have the problem because when the trailing CrLf goes away, those spaces become trailing spaces that LEN will not find.

    Jeff,

    Well spotted! I must admit I looked at it and gave up after a few minutes...

    I guess replacing LEN with DATALENGTH might fix it up too?

    Paul

  • Paul White (5/2/2009)


    Well spotted! I must admit I looked at it and gave up after a few minutes...

    I guess replacing LEN with DATALENGTH might fix it up too?

    Paul

    Yes... but then you must also account for the run being against an NVARCHAR and divide by 4 instead of just 2 in Phil's example... as I did in the working code I posted earlier to solve the problem.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/2/2009)


    Yes... but then you must also account for the run being against an NVARCHAR and divide by 4 instead of just 2 in Phil's example... as I did in the working code I posted earlier to solve the problem.

    Yes, so you did - sorry!

    This is what I get for replying in post order as I catch up on this thread 🙁

    I thought the 2 in Phil's code was for the unicode thing - but it's not is it?

    There are 2 characters!! Silly me 😉

  • Paul White (5/2/2009)


    Jeff Moden (5/2/2009)


    Yes... but then you must also account for the run being against an NVARCHAR and divide by 4 instead of just 2 in Phil's example... as I did in the working code I posted earlier to solve the problem.

    Yes, so you did - sorry!

    This is what I get for replying in post order as I catch up on this thread 🙁

    I thought the 2 in Phil's code was for the unicode thing - but it's not is it?

    There are 2 characters!! Silly me 😉

    Heh... no worries, Paul. I've done the same thing especially on threads with a bazillion posts. Just too busy to go back a catch up to try to put all of the pieces back together.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Paul White (5/2/2009)


    ...as I say, it's a shame about the compilation time.

    Heh... yeah, I agree... for a million rows, it's a shame. For more common numbers like 8000, all the other methods beat the UNION ALL method for CPU usage even after the DBCC commands are removed and the scripts are allowed to cash. At least on my box they do. On faster boxes, like yours, you would probably see zero's across the board for cpu time. After that, you'd have to consider I/O and whether it was cached in memory or not and whether or not it was for reuse by GUI code or not.

    As a side bar and for the record (I think it was in the Tally table article I wrote), I did state that some of these CTE methods would actually beat the all too convenient Tally table method. The cursor zealots would love to hear me say that at the speeds that all of these things run at (including the Tally table), that you should go for whatever gives the greatest ease of programming and readability. While there's certainly something to be said about that, I'll still maintain the position that knocking 150 milliseconds off a 450 millisecond run and maybe knocking I/O to 0 is a good thing to go after even if the code becomes a bit more complex because you just can't tell where something is going to be used in the future.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    For common numbers like 8000 it's even sillier - the version I posted was a hack-around to get past the fact that the original fails to compile at all over 16384 rows with a message stating that 'there is insufficient system memory to run this query' - like, wow! That's just an estimated execution plan too!

    I haven't met a cursor zealot for a while now - which is sad because I would love to have the debate with one. Anyone who thinks that cursor solutions are somehow more natural or easier to understand than the set-based tally has spent too long with procedural code ;c)

    BTW my laptop is not a very speed machine - Pentium 4m (one 'core'!) @2GHz! It does have 2GB RAM though which is ok.

    I have to start SQL Server with the -P hack just to simulate parallel plans though...

    Anyway, here's the memory-busting version, seeing as I referred to it:

    [font="Courier New"]; WITHN0 (N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),-- 4 rows

    N1 (N) AS (SELECT N FROM N0 UNION ALL SELECT N FROM N0),-- 8

    N2 (N) AS (SELECT N FROM N1 UNION ALL SELECT N FROM N1),-- 16

    N3 (N) AS (SELECT N FROM N2 UNION ALL SELECT N FROM N2),-- 32

    N4 (N) AS (SELECT N FROM N3 UNION ALL SELECT N FROM N3),-- 64

    N5 (N) AS (SELECT N FROM N4 UNION ALL SELECT N FROM N4),-- 128

    N6 (N) AS (SELECT N FROM N5 UNION ALL SELECT N FROM N5),-- 256

    N7 (N) AS (SELECT N FROM N6 UNION ALL SELECT N FROM N6),-- 512

    N8 (N) AS (SELECT N FROM N7 UNION ALL SELECT N FROM N7),-- 1024

    N9 (N) AS (SELECT N FROM N8 UNION ALL SELECT N FROM N8),-- 2048

    NA (N) AS (SELECT N FROM N9 UNION ALL SELECT N FROM N9),-- 4096

    NB (N) AS (SELECT N FROM NA UNION ALL SELECT N FROM NA),-- 8192

    NC (N) AS (SELECT N FROM NB UNION ALL SELECT N FROM NB),-- 16384

    Numbers (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM NC)

    SELECTTOP (@TestSize)

    @Bitbucket = N

    FROMNC

    OPTION(KEEPFIXED PLAN, MAXDOP 1);

    [/font]

    Just how much dev effort is involved in fixing the colour tags anyway?!

    Cheers,

    Paul

  • Hi gents,

    I have been following the discussion, half and from a large distance, so I ain't really in the loop...so to speak ;).

    I was wondering how this tally function that I been using for a while now does stack up against the other methods being tested here. If it is shockingly different (slower) I sure will spend some time to see why and then improve upon it from there.

    -- Tally table function (24 bit, large enaugh for general purpose use)

    --

    create function dbo.tfnTally24bit( @max-2 int ) returns table

    as

    return

    (

    with

    nQ( N ) as

    (

    select 0 union all select 0 union all select 0 union all select 0 union all -- 4

    select 0 union all select 0 union all select 0 union all select 0 union all -- 8

    select 0 union all select 0 union all select 0 union all select 0 union all -- 12

    select 0 union all select 0 union all select 0 union all select 0 -- 16

    )

    select top ( isnull( @max-2, 0 ) )

    row_number() over ( order by anchor.constant ) as n

    from

    ( select 0 as constant ) as anchor

    cross join nQ n1 -- 16 ( 4 bit)

    cross join nQ n2 -- 256 ( 8 bit)

    cross join nQ n3 -- 4096 (12 bit)

    cross join nQ n4 -- 65536 (16 bit)

    cross join nQ n5 -- 1048576 (20 bit)

    cross join nQ n6 -- 16777216 (24 bit)

    )

    ;

    go

    -- Test the inline "dbo.tfnTally24bit" function

    --

    select * from dbo.tfnTally24bit( 10 );

    go

    I saw somone convert the numbers to bits, does that have an effect on memory use when none of these constants are returned in the result?

Viewing 15 posts - 376 through 390 (of 522 total)

You must be logged in to reply to this topic. Login to reply