Are the posted questions getting worse?

  • Heh... in case anyone is wondering if I simply disappeared on this, the answer is "No". It's just that the freakin' cursor is taking me so long to write. I have to wash my mouth out every 5 minutes because I can't stand the taste. 😛

    I'll be back with the typical million row solution shortly.... hang in there.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Back in a sec... virus software update kicked in. Wouldn't make for a fair test.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The following code creates a million row multiple account checkbook, does the "Fire Hose" cursor running total thing 5 times and follows that with the "Pseudo Cursor" method 5 times. After that, the graphic I captured from that code running on Profiler on my poor old 7 year old P4 1.8GHz 1GB Ram desktop computer running 2k5 DE sp2. As usual, the details are in the embedded comments in the code.

    [font="Courier New"]--===== Create the test table and the necessary indexes

    --===== Do this in a nice safe place

        USE TempDB

    --===== Create a "check book" table with a ratio of deposits

         -- to withdrawals and constraints on the amounts for both.

         -- This time, we include transaction dates and we don't 

         -- care about the order of the transactions.  We also 

         -- include different account numbers.

         -- (All Versions)

    --==============================================================================

    --===== Conditionally drop the test table 

         -- (so we can rerun the test code as needed)

         IF OBJECT_ID('TempDB..CheckBook'IS NOT NULL

            DROP TABLE CheckBook

    --===== Declare some obviously named variables

    DECLARE @StartDate DATETIME,

            @EndDate   DATETIME--Upto and NOT including this date

            @Days      INT,      --This will be the "range"

            @Counter   INT       --Used in the While Loop to make page splits

    --===== Preset the start date, end date, and calculate the range

         -- in days. Since we don't want the end date to be

         -- included, we DIDN'T add 1 to the range.

     SELECT @StartDate '01 Jan 2000',

            @EndDate   '01 Jan 2010',

            @Days      DATEDIFF(dd,@StartDate,@EndDate)

    --===== Create and populate the table on the fly.

         -- Produces approximately 10,000 rows for each of 1,000 accounts

     SELECT TOP 1000000

            IDENTITY(INT,1,1AS CheckBookID,

            ABS(CHECKSUM(NEWID()))%1000+AS AccountNumber--1000 accounts

            CAST(--= Roughly every tenth row (avg) will be a deposit

                CASE ABS(CHECKSUM(NEWID())) % 10           -- 0-9 (10 digits)

                WHEN --Occurs 1 out of ten times on average

                THEN RAND(CHECKSUM(NEWID())) * 500 1000  --Deposits

                ELSE RAND(CHECKSUM(NEWID())) * (-100)      --Withdrawals

                END

            AS DECIMAL (9,2)) AS Amount,

            ABS(CHECKSUM(NEWID())) % @Days @StartDate    --Random DateTime 

            AS TransactionDate

       INTO dbo.CheckBook

       FROM Master.dbo.SysColumns sc1

      CROSS JOIN Master.dbo.SysColumns sc2

    --===== Create the initial deposit(s) for each account.

         -- Everybody gets at least 1 and maybe more on the

         -- first day (all first day trans are deposits)

     UPDATE CheckBook

        SET Amount RAND(CHECKSUM(NEWID())) * 500 1000

       FROM CheckBook tt

      INNER JOIN

            (--==== Find the initial date for each account

             SELECT AccountNumber

                    MIN(TransactionDateAS MinTransactionDate

               FROM CheckBook 

              GROUP BY AccountNumber

            firstdate

         ON tt.AccountNumber   firstdate.AccountNumber

        AND tt.TransactionDate firstdate.MinTransactionDate

    --===== Create the PK like many folks would do it to prevent 

         -- fragging the table on inserts. Pretty tough on SELECT's, though

      ALTER TABLE dbo.CheckBook

        ADD CONSTRAINT PK_CheckBook_CheckBookID 

    PRIMARY KEY CLUSTERED (CheckBookID)

       WITH FILLFACTOR = 100  --Actually, this is the default on most systems

    --===== Add a nonclustered index to support SELECTs and (ugh!) Cursors

         -- that have an ORDER BY

     CREATE NONCLUSTERED INDEX IX_CheckBook_AccountNumber_TransactionDate_CheckBookID

         ON dbo.CheckBook (AccountNumberTransactionDateCheckBookID)

       WITH FILLFACTOR = 90

    --===== Let's see what the distribution of rows is for each account

         -- and what the total number of rows is.

         -- This will also see if the index is up to snuff.

     SELECT CASE 

                WHEN GROUPING(cb.AccountNumber0

                THEN CAST(cb.AccountNumber AS CHAR(5))

                ELSE 'Total'

            END AS AccountNumber,

            COUNT(*) AS NumberOfRows

       FROM dbo.CheckBook cb

      GROUP BY cb.AccountNumber WITH ROLLUP

      ORDER BY cb.AccountNumber

    GO

    --==============================================================================

    GO 2

    --The "Fire Hose" Cursor Running Total Solution

    --===============================================================================

    --===== Supress the auto-display of rowcounts or the cursor will be even slower.

        SET NOCOUNT ON

    --===== Create the necessary Temp table to store the result set so we can use

         -- the result set as a table if we need to.  This will be used for the

         -- cursor run.  The Psuedo-Cursor run will build one on the fly.

         -- The condition drop is just for reruns.  It wouldn't be needed 

         -- in production but it sure helps troubleshooting.

         IF OBJECT_ID('TempDB..#cWork','U'IS NOT NULL

            DROP TABLE #cWork

    --===== Create the cursor result set table using the "zero row" trick

     SELECT CAST(CheckBookID AS INTAS CheckBookID--Strips of IDENTITY

            AccountNumber,

            Amount,

            TransactionDate,

            CAST(AS DECIMAL(9,2)) AS AccountBalance

       INTO #cWork

       FROM dbo.CheckBook

      WHERE --We just want table structure, no rows

    --===== Declare local variables for the cursor run

    DECLARE @CheckBookID     INT,

            @AccountNumber   INT,

            @Amount          DECIMAL(9,2),

            @TransactionDate DATETIME,

            @CurBalance      DECIMAL(9,2),

            @PrevBalance     DECIMAL(9,2), 

            @PrevAccount     INT

    --===== Preset the necessary variables

     SELECT @CurBalance  0,

            @PrevBalance 0,

            @PrevAccount 0

    --===== Declare and open a cursor

         --Haven't confirmed it, but rumor has it that FAST_FORWARD

         --isn't quite as fast as ORWARD_ONLY READ_ONLY 

    DECLARE curRunningTotal CURSOR LOCAL FORWARD_ONLY READ_ONLY 

        FOR SELECT CheckBookID,

                   AccountNumber,

                   Amount,

                   TransactionDate

       FROM dbo.CheckBook WITH(NOLOCK--Just to help speed on this test

      ORDER BY AccountNumberTransactionDateCheckBookID

    --===== Open the Cursor and Fetch the first row (Lordy I hate saying that!! ;-))

       OPEN curRunningTotal

      FETCH NEXT FROM curRunningTotal 

       INTO @CheckBookID,

            @AccountNumber,

            @Amount,

            @TransactionDate

    --===== Now, loop through the cursor and insert the rows into

         -- the work table including calculations for the 

         -- running total

      WHILE @@FETCH_STATUS 0

      BEGIN

            --===== Do the necessary math for balances that restart

                 -- when the AccountNumber changes.  The also preps

                 -- the @Prev* variables for the next record.

             SELECT @CurBalance 

                    CASE 

                        WHEN @AccountNumber @PrevAccount

                        THEN @PrevBalance @Amount

                        ELSE @Amount

                    END,

                    @PrevBalance @CurBalance,

                    @PrevAccount @AccountNumber

            --===== Insert those calculations as a new row in the result set

             INSERT INTO #cWork

                    (

                    CheckBookID,

                    AccountNumber,

                    Amount,

                    TransactionDate,

                    AccountBalance

                    )

             SELECT @CheckBookID     AS CheckBookID,

                    @AccountNumber   AS AccountNumber,

                    @Amount          AS Amount,

                    @TransactionDate AS TransactionDate,

                    @CurBalance      AS AccountBalance

            --===== Get the next record (I really hate this ;-))

              FETCH NEXT FROM curRunningTotal 

               INTO @CheckBookID,

                    @AccountNumber,

                    @Amount,

                    @TransactionDate

       END  --Loop back for the next record.

    --======== Clean up the mess from the cursor

         CLOSE curRunningTotal

    DEALLOCATE curRunningTotal

    --===== Add a clustered index to the #cWork table to support SELECT's

     CREATE CLUSTERED INDEX IX_#cWork_AccountNumber_TransactionDate_CheckBookID

         ON #cWork (AccountNumberTransactionDateCheckBookID)

       WITH FILLFACTOR = 100 

    --===== Select the first 2 accounts just to wake us up that it actually finished

         -- and so it's easier to see that it actually worked when the account 

         -- number changed.

     SELECT *

       FROM #cWork

      WHERE AccountNumber <= 2

      ORDER BY AccountNumberTransactionDate

    GO 5

    --==============================================================================

    GO 2

    --The "Pseudo-Cursor" Running Total Solution ("Quirky" Update as Phil calls it)

    --==============================================================================

    --===== Turn the auto-display of rowcounts back on just to give the cursor a 

         -- chance. ;-)

        SET NOCOUNT OFF

    --===== Create the necessary Temp table to store the result set so we can use

         -- the result set as a table if we need to.  The Psuedo-Cursor run will 

         -- build and populate it on the fly.

         -- The condition drop is just for reruns.  It wouldn't be needed 

         -- in production but it sure helps troubleshooting.

         IF OBJECT_ID('TempDB..#pcWork','U'IS NOT NULL

            DROP TABLE #pcWork

    --===== Create the cursor result set table using the "zero row" trick

     SELECT CAST(CheckBookID AS INTAS CheckBookID--Strips off IDENTITY

            AccountNumber,

            Amount,

            TransactionDate,

            CAST(AS DECIMAL(9,2)) AS AccountBalance --Creates new column

       INTO #pcWork

       FROM dbo.CheckBook

    --===== Add a clustered index to the #psWork table to support SELECT's.

         -- This is also the key to the Pseudo-Cursor or "quirky" update

         -- method for doing running totals.

     CREATE CLUSTERED INDEX IX_#pcWork_AccountNumber_TransactionDate_CheckBookID

         ON #pcWork (AccountNumberTransactionDateCheckBookID)

       WITH FILLFACTOR = 100 

    --===== Declare local variables for the cursor run

    DECLARE @PrevBalance     DECIMAL(9,2), 

            @PrevAccount     INT

    --===== Preset the necessary variables

     SELECT @PrevBalance 0,

            @PrevAccount 0

    --===== Let's rock.  Do the running balance using a single

         -- update as a Pseudo-Cursor.

     UPDATE #pcWork

        SET @PrevBalance AccountBalance CASE

                                                WHEN AccountNumber @PrevAccount

                                                THEN @PrevBalance Amount

                                                ELSE Amount

                                            END,

            @PrevAccount AccountNumber

       FROM #pcWork WITH(INDEX(0),TABLOCKX--Warm fuzzy options

     OPTION (MAXDOP 1--Just to be absolutely safe

    --===== Select the first 2 accounts to do the same as the cursor

         -- and so it's easier to see that it actually worked when the account 

         -- number changed.

     SELECT *

       FROM #pcWork

      WHERE AccountNumber <= 2

      ORDER BY AccountNumberTransactionDate

    GO 5[/font]

    Here's the performance stats from the profiler run I setup to look at the single spid this all ran on...

    ... and, here's the code that proves they both came up with the same running balances.

    [font="Courier New"] SELECT c.*,pc.*

       FROM #cWork c

      INNER JOIN #pcWork pc

         ON c.CheckBookID     pc.CheckBookID

        AND c.AccountBalance <> pc.AccountBalance

      ORDER BY pc.AccountNumberpc.TransactionDate[/font]

    Like Adam said, I do use these for speed. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hey, Jeff is back! 😀 Great stuff Jeff.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Lynn Pettis (5/28/2009)


    Okay! Any one in Germany near Frankfurt? Maybe has a teen age (around 18 year old) son or daughter (preferably)?

    And yes I am serious. I have a teenage daughter who paid for her own trip to Germany and is currently not having a very good time like she had hoped.

    Lynn, if you can get her to London - flights from Frankfurt to London City used to be quite cheap - I'll send my 18-year old daughter down from Manchester. My flat in Leyton is empty at the mo, they could share and explore London together.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hey Jeff,

    Really neat demonstration. Love it. Lots of cool optimizations too.

    The ultimate demo is where the original table already happens to have a suitable clustered index (it does happen!) and an existing column for the running total needs to be populated (though adding a NULLable column is fast too). This results in a plan with one clustered index scan, a top operator (for halloween protection), a compute scalar (for the maths), and a clustered index update. It doesn't get much better than that. On my (2GHz single-core Pentium M) laptop, that scenario completes in just 4,348 logical reads and 2,391ms (with a warm cache).

    BTW: I use INDEX(1) instead of (INDEX(0)) for the in-place update to ensure I get a linked-list ordered scan (to avoid a sort) rather than the IAM scan that happens with TABLOCK or at READ UNCOMMITTED. For sure, that's not an issue for your script - I think you even optimized for the IAM scan knowing that the table was pretty much guaranteed to be in physical order. Nice one.

    Quite shocking to see a cursor authored by Mr RBAR though, even if it was for demonstration purposes. 😉

    Paul

  • Paul White (6/3/2009)


    One thing to mention is that the article is the first of a series

    Well, actually it's the second in a series.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Jeff Moden (6/3/2009)


    The following code creates a million row multiple account checkbook, does the "Fire Hose" cursor running total thing 5 times and follows that with the "Pseudo Cursor" method 5 times. After that, the graphic I captured from that code running on Profiler on my poor old 7 year old P4 1.8GHz 1GB Ram desktop computer running 2k5 DE sp2. As usual, the details are in the embedded comments in the code.

    Jeff, I have a (simple) question. Is this the million-row test you were whipping up for Adam, or your approach for Itzik's article --- or were you able to handle both of these in one?

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Paul White (6/4/2009)


    BTW: I use INDEX(1) instead of (INDEX(0)) for the in-place update to ensure I get a linked-list ordered scan (to avoid a sort) rather than the IAM scan that happens with TABLOCK or at READ UNCOMMITTED. For sure, that's not an issue for your script - I think you even optimized for the IAM scan knowing that the table was pretty much guaranteed to be in physical order. Nice one.

    Jeff's technique depends on not getting an IAM scan, not having the data fetched in any order than that of the clustering key (index order scan) and not having the optimiser use any order-mangling operations. If any of those happen, the 'quirky' update can give incorrect values.

    That's why the restrictions are no partitioning, no parallelism.

    I asked about the IAM scan a while back and apparently it will not happen on updates. Sorry, no public blog I can refer you to, was a conversation with someone on the optimiser team.

    Way to tell that it's index order, not allocation order - check the order property of the clustered index scan in the execution plan. Ordered = No is an allocation order scan

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (6/4/2009)


    Jeff's technique depends on not getting an IAM scan, not having the data fetched in any order than that of the clustering key (index order scan) and not having the optimiser use any order-mangling operations. If any of those happen, the 'quirky' update can give incorrect values.

    That's why the restrictions are no partitioning, no parallelism.

    I asked about the IAM scan a while back and apparently it will not happen on updates. Sorry, no public blog I can refer you to, was a conversation with someone on the optimiser team.

    Way to tell that it's index order, not allocation order - check the order property of the clustered index scan in the execution plan. Ordered = No is an allocation order scan

    Interesting - knew all that except the no IAM scan in an update plan. Seems odd because I get that sort in my UPDATE plan with an INDEX(0) hint, but no sort with INDEX(1).

    Perhaps someone forgot to tell the QO about that behaviour.

    I thought TABLOCK/NOLOCK = IAM scan? I wonder why Jeff's hints are as they are then?

    BTW - I don't trust the Ordered = No thing, since I read http://www.sqlmag.com/Article/ArticleID/92887/sql_server_92887.html

    There's an explicit example there of ordered:false doing a logical order scan.

    edit: pesky word 'not' crept in there

  • Paul White (6/4/2009)


    Interesting - knew all that except the no IAM scan in an update plan.

    Think it was that. I know it was a discussion on 'ordering' in updates related to exactly this query and why it seems to work. (in case you haven't figured it out, I don't like this query form and I do have my reasons).

    Can't find the conversation, can't completely recall details (was about 8 or so months ago) and even if I could find the conversation, I wouldn't be able to post it here.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Jeff, after you install the hand rails, you'll also need one of these:

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Paul White (6/4/2009)


    Way to tell that it's index order, not allocation order - check the order property of the clustered index scan in the execution plan. Ordered = No is an allocation order scan

    Interesting - knew all that except the no IAM scan in an update plan. Seems odd because I get that sort in my UPDATE plan with an INDEX(0) hint, but no sort with INDEX(1).

    Perhaps someone forgot to tell the QO about that behaviour.

    I thought TABLOCK/NOLOCK = IAM scan? I wonder why Jeff's hints are as they are then?

    BTW - I don't trust the Ordered = No thing, since I read http://www.sqlmag.com/Article/ArticleID/92887/sql_server_92887.html

    There's an explicit example there of ordered:false doing a logical order scan.

    edit: pesky word 'not' crept in there

    Here's the reason why I use Index(0) instead of Index(1). From BOL:

    If a clustered index exists, INDEX(0) forces a clustered index scan and INDEX(1) forces a clustered index scan or seek.

    It's overkill, but it makes people feel better about using the method. Just don't try it with any non-clustered index because they can go "merry-go-round" on you... heh... well, almost never. I'm still doing some experiments on that.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Grant Fritchey (6/4/2009)


    Jeff, after you install the hand rails, you'll also need one of these:

    BWAA-HAAA!!! Thanks for the help. Looks like I'm gonna need the hand rails after all, huh? 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, has your avatar something in common with this?

    http://www.youtube.com/watch?v=vF4iWIE77Ts

    It's incredible I didn't notice it before!!

    -- Gianluca Sartori

Viewing 15 posts - 5,311 through 5,325 (of 66,738 total)

You must be logged in to reply to this topic. Login to reply