Solving the Running Total and Ordinal Rank Problems (Rewritten)

  • Here's the non-Quirky Update code to do what you want. For those that think a CTE would be better, please read the comments in the code to find out why you might want to reconsider...

    --=====================================================================================================================

    -- Setup the demonstration. Nothing in this section is a part of the solution. It's just to create test data.

    --=====================================================================================================================

    --===== Do this demonstration in a nice safe place that everyone has.

    USE tempdb;

    --===== Conditionally drop all the test tables to make reruns easier

    IF OBJECT_ID('tempdb.dbo.tmp_conf_server' ,'U') IS NOT NULL DROP TABLE tempdb.dbo.tmp_conf_server;

    --===== Create a simulation of a real table in tempdb

    CREATE TABLE dbo.tmp_conf_server

    (

    RecordedDateTime DATETIME NOT NULL,

    ServerName VARCHAR(260) NOT NULL,

    LastRestarted DATETIME NOT NULL,

    TotalBatchRequests BIGINT NOT NULL,

    TotalDeadlocks BIGINT NOT NULL,

    TotalCompilations BIGINT NOT NULL,

    TotalReCompilations BIGINT NOT NULL,

    CONSTRAINT PK_cls_RecordedDateTime_Servername__tmp_conf_server

    PRIMARY KEY CLUSTERED (RecordedDateTime ASC, ServerName ASC)

    );

    --===== Populate the simulated real table with test data

    INSERT INTO tempdb.dbo.tmp_conf_server

    (RecordedDateTime, ServerName, LastRestarted, TotalBatchRequests, TotalDeadlocks, TotalCompilations, TotalReCompilations)

    SELECT '2011-04-05 16:46:46.630','server1','2011-02-25 13:29:00.000',500,12,3050,5555 UNION ALL

    SELECT '2011-04-05 17:49:22.350','server1','2011-02-25 13:29:00.000',750,12,4000,6666 UNION ALL

    SELECT '2011-04-06 17:34:22.473','server1','2011-02-25 13:29:00.000',810,12,4075,7777 UNION ALL

    SELECT '2011-04-05 16:46:47.630','server2','2011-02-27 11:29:00.000',888,5 ,30 ,3333 UNION ALL

    SELECT '2011-04-05 17:49:23.350','server2','2011-02-27 11:29:00.000',891,6 ,40 ,5900 UNION ALL

    SELECT '2011-04-06 17:34:23.473','server2','2011-02-27 11:29:00.000',950,8 ,7000,5950;

    --=====================================================================================================================

    -- Solve the problem of getting a "delta" between the current and previous rows for the "INC" columns

    -- in a simplified manner and without a "Quirky Update".

    --=====================================================================================================================

    --===== Conditionally drop all the test tables to make reruns easier

    IF OBJECT_ID('tempdb..#tmp_conf_server_Inc','U') IS NOT NULL DROP TABLE #tmp_conf_server_Inc;

    --===== Create and populate the table we'll use to calculate the increments with.

    -- Notice how ISNULL is used to make a NOT NULL column on the fly.

    SELECT RowNum = ISNULL(ROW_NUMBER() OVER (ORDER BY ServerName, RecordedDateTime),0),

    ServerName,

    RecordedDateTime,

    LastRestarted,

    TotalBatchRequests,

    TotalDeadlocks,

    TotalCompilations,

    TotalReCompilations,

    IncSeconds = CAST(0 AS BIGINT),

    IncBatchRequests = CAST(0 AS BIGINT),

    IncDeadlocks = CAST(0 AS BIGINT),

    IncCompilations = CAST(0 AS BIGINT),

    IncReCompilations = CAST(0 AS BIGINT)

    INTO #tmp_conf_server_Inc

    FROM dbo.tmp_conf_server;

    --===== Add a clustered index (as a PKK in this case) NOT to do a "Quirky Update" but to simply make the code run faster.

    -- As a sidebar, NEVER name a constraint on a #Temp table or it will destroy the ability to do concurrent runs.

    ALTER TABLE #tmp_conf_server_Inc

    ADD PRIMARY KEY CLUSTERED (RowNum);

    --===== You could certainly do this without a Temp Table using a CTE, but it would take twice as long because it would

    -- have to execute the CTE twice.

    SELECT hi.ServerName,

    hi.RecordedDateTime,

    hi.LastRestarted,

    hi.TotalBatchRequests,

    hi.TotalDeadlocks,

    hi.TotalCompilations,

    hi.TotalReCompilations,

    IncSeconds = CASE WHEN hi.ServerName = lo.ServerName THEN DATEDIFF(ss, lo.RecordedDateTime , hi.RecordedDateTime) ELSE NULL END,

    IncBatchRequests = CASE WHEN hi.ServerName = lo.ServerName THEN hi.TotalBatchRequests - lo.TotalBatchRequests ELSE NULL END,

    IncDeadlocks = CASE WHEN hi.ServerName = lo.ServerName THEN hi.TotalDeadlocks - lo.TotalDeadlocks ELSE NULL END,

    IncCompilations = CASE WHEN hi.ServerName = lo.ServerName THEN hi.TotalCompilations - lo.TotalCompilations ELSE NULL END,

    IncReCompilations = CASE WHEN hi.ServerName = lo.ServerName THEN hi.TotalReCompilations - lo.TotalReCompilations ELSE NULL END

    FROM #tmp_conf_server_Inc lo

    RIGHT OUTER JOIN #tmp_conf_server_Inc hi ON lo.RowNum+1 = hi.RowNum;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/8/2011)


    Also, be advised. You didn't add the required clustered index to the #Temp table and that may be why your 2 part code didn't work the way you expected. Still and like I said before, a "Quirky Update" isn't need for the simple chore of diffing two adjacent rows. I'll have some code for you for that soon.

    Nadrek,

    In case you're wondering, and if I'm not mistaken, the clustered index fields are in the wrong order.

  • They in the incorrect order only on the source table. That's why he copies the rows to another table where he can apply a different clustered index.

    Still, you just don't need a QU for this exercise.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • No, I'm not trolling; I'm advertising your article! The full context was:

    -- Quirky update!!! BEWARE!!! Do not use this for critical things!

    -- BEFORE USING, read http://www.sqlservercentral.com/articles/T-SQL/68467/

    Jeff Moden (4/8/2011)


    Also, be advised. You didn't add the required clustered index to the #Temp table and that may be why your 2 part code didn't work the way you expected. Still and like I said before, a "Quirky Update" isn't need for the simple chore of diffing two adjacent rows. I'll have some code for you for that soon.

    What is the problem with creating the clustered index as the primary key in the CREATE TABLE #tmp_conf_server_Inc statement itself, which was:

    CONSTRAINT [PK_cls_RecordedDateTime_Servername__#tmp_conf_server_Inc] PRIMARY KEY CLUSTERED

    (

    [Anchor],

    [ServerName] ASC,

    [RecordedDateTime] ASC

    )WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90, DATA_COMPRESSION = PAGE) ON [PRIMARY]

    Was there another problem with my two-part set statement variant?

    I see the Rownum based self-join; I'll take a look at how that performs with several hundred thousand rows when I get a chance.

  • Nadrek (4/11/2011)


    No, I'm not trolling; I'm advertising your article! The full context was:

    -- Quirky update!!! BEWARE!!! Do not use this for critical things!

    -- BEFORE USING, read http://www.sqlservercentral.com/articles/T-SQL/68467/

    "Do not use this for critical things" is an "advertisement"? Not where I come from. 😉

    And, yes, I use it for "critical" things.

    Anyway, sorry about the confusion on the clustered index. I got a bit cross-eyed on that because of the names of the tables. I'll take another look.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (4/12/2011)


    "Do not use this for critical things" is an "advertisement"? Not where I come from. 😉

    And, yes, I use it for "critical" things.

    Anyway, sorry about the confusion on the clustered index. I got a bit cross-eyed on that because of the names of the tables. I'll take another look.

    Well, the link to the article is an advertisement :). The "do not use this for critical things" is much easier to say than "you have to follow all the rules; if you fail even one rule, you may, or may not, receive invalid results"; and I tend towards limited use of undocumented features for critical work; I'll use them more often in noncritical work.

  • Thanks to all the contributors to this article and its comments. To help some purists understand how Jeff's article is adding value: I work in a management consulting firm where we use SQL as a more powerful version of Excel - very far from production, with custom-coded analyses we use for just a few months in a specific server environment, with the analyses carefully sanity checked by humans before implementing anything, and we just have to be right on average and not 100% of the time. A decade ago, one of us discovered quirky update (probably from one of the poorly written articles back then), and it has enabled us to run sophisticated analysis on multi-billion row tables to drive hundreds of millions of dollars of measurable value for our clients. Without quirky update, we would often not be able to deliver our answers quickly enough to drive this value. (Although I'm excited to try Paul's SQLCLR solution now, especially to see if it works or can be adapted to work for partitioned tables.)

    Over the years, we discovered many of these rules (most importantly, the row-increment error checking that I think Paul and Tom posted here), but it has been very helpful to have them all in one place! For the last year we have been referring our new analysts to this article as a helpful checklist to accompany the careful sanity checking and sample testing they always do when writing new code, and we have been so grateful to have it.

    For the forthcoming rewrite, it might be helpful to note (in an appendix?) that if you set the value of the same variable multiple times, it is not predictable in which order these statements will be processed. I know that should be the default assumption, and it has been pointed out in some of the earlier posts in this conversation. Still, it's something we've found worth including in our internal training documents because it often trips up newcomers to the technique and costs them some debugging time to learn it the hard way.

  • Nadrek (4/12/2011)


    Jeff Moden (4/12/2011)


    "Do not use this for critical things" is an "advertisement"? Not where I come from. 😉

    And, yes, I use it for "critical" things.

    Anyway, sorry about the confusion on the clustered index. I got a bit cross-eyed on that because of the names of the tables. I'll take another look.

    Well, the link to the article is an advertisement :). The "do not use this for critical things" is much easier to say than "you have to follow all the rules; if you fail even one rule, you may, or may not, receive invalid results"; and I tend towards limited use of undocumented features for critical work; I'll use them more often in noncritical work.

    Sorry... I missed this... Maybe not in your country or language but that's not what's implied by such language in my country and the two statements have drastically different meanings. I actually take offense to the wording or would prefer that you not use it again and also change existing wording.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ethan.murray (5/11/2011)


    Thanks to all the contributors to this article and its comments. To help some purists understand how Jeff's article is adding value: I work in a management consulting firm where we use SQL as a more powerful version of Excel - very far from production, with custom-coded analyses we use for just a few months in a specific server environment, with the analyses carefully sanity checked by humans before implementing anything, and we just have to be right on average and not 100% of the time. A decade ago, one of us discovered quirky update (probably from one of the poorly written articles back then), and it has enabled us to run sophisticated analysis on multi-billion row tables to drive hundreds of millions of dollars of measurable value for our clients. Without quirky update, we would often not be able to deliver our answers quickly enough to drive this value. (Although I'm excited to try Paul's SQLCLR solution now, especially to see if it works or can be adapted to work for partitioned tables.)

    Over the years, we discovered many of these rules (most importantly, the row-increment error checking that I think Paul and Tom posted here), but it has been very helpful to have them all in one place! For the last year we have been referring our new analysts to this article as a helpful checklist to accompany the careful sanity checking and sample testing they always do when writing new code, and we have been so grateful to have it.

    For the forthcoming rewrite, it might be helpful to note (in an appendix?) that if you set the value of the same variable multiple times, it is not predictable in which order these statements will be processed. I know that should be the default assumption, and it has been pointed out in some of the earlier posts in this conversation. Still, it's something we've found worth including in our internal training documents because it often trips up newcomers to the technique and costs them some debugging time to learn it the hard way.

    Ethan... thank you very much for taking the time to write that up. I appreciate it very much.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/11/2011)


    Sorry... I missed this... Maybe not in your country or language but that's not what's implied by such language in my country and the two statements have drastically different meanings. I actually take offense to the wording or would prefer that you not use it again and also change existing wording.

    My apologies; I did not mean any offense. I've edited my code and post with the more precise version.

    That said, while I do not need to use the quirky update, did anyone ever notice what mistake I made on the 2 part version, so I can avoid making it again?

  • Nadrek (5/12/2011)


    Jeff Moden (5/11/2011)


    Sorry... I missed this... Maybe not in your country or language but that's not what's implied by such language in my country and the two statements have drastically different meanings. I actually take offense to the wording or would prefer that you not use it again and also change existing wording.

    My apologies; I did not mean any offense. I've edited my code and post with the more precise version.

    That said, while I do not need to use the quirky update, did anyone ever notice what mistake I made on the 2 part version, so I can avoid making it again?

    Thank you for your consideration.

    I'll also go back and try to figure out your question. Thanks, Nadrek.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • FYI:

    I just compared this Quirky Update code to the newest enhancements in the next version of SQL Server (code named "Denali") CTP3 and found that the Quirky Update performs far greater than the new enhancements in Denali CTP3 currently do. (The Quirky Update was not modified to perform the safety check.)

    Denali CTP3 code:

    -- DENALI CTP3 Running Totals with enhanced over clause.

    ;

    WITH cte AS

    (

    SELECT TransactionDetailID,

    [Date],

    AccountID,

    Amount,

    AccountRunningTotal = SUM(Amount) OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID),

    AccountRunningCount = SUM(1) OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID),

    NCID

    FROM dbo.TransactionDetail

    )

    UPDATE td

    SET AccountRunningTotal = cte.AccountRunningTotal,

    AccountRunningCount = cte.AccountRunningCount

    FROM dbo.TransactionDetail td

    JOIN cte

    ON cte.TransactionDetailID = td.TransactionDetailID

    GO

    --===== Verify the the running total worked on the

    -- TransactionDetail table

    USE TEMPDB --Takes 10 seconds on my machine

    EXEC dbo.Verify

    GO

    Results (comparing Quirky Update to new features):

    (Not sure why the Quirky Update is showing zero writes - possible bug in Profiler with CTP3?)

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • It's not going to matter... even though I'm seriously disappointed in the performance (as you've measured), the new SUM() is faster than a While Loop, easier to write, does not require a table update, and it's "officially" supported.

    I normally don't go near the bleeding edge but I guess I'm going to have to load up Denalli CTP3 and do some testing on my own.

    Thanks for the early feedback, Wayne. I really appreciate the time you put into this one.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Wayne, you can avoid the join:

    UPDATE target

    SET AccountRunningTotal = art,

    AccountRunningCount = arc

    FROM

    (

    SELECT

    td.AccountRunningTotal,

    td.AccountRunningCount,

    art = SUM(td.Amount) OVER (PARTITION BY td.AccountID ORDER BY td.Date, td.TransactionDetailID),

    arc = COUNT_BIG(*) OVER (PARTITION BY td.AccountID ORDER BY td.Date, td.TransactionDetailID)

    FROM dbo.TransactionDetail AS td

    ) AS target

  • That should run quite a bit faster.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 241 through 255 (of 307 total)

You must be logged in to reply to this topic. Login to reply