Solving the “Running Totals” Problem in SQL Server 2005 with Partitioned Tables

  • David Jackson (1/27/2009)


    Lynn Pettis (1/27/2009)


    Smendle (1/27/2009)


    That can happen on multi-processor systems that SQL is allowed to use.

    I am sure this is in response to the following:

    I received a private message with a question regarding my article that I can't answer, so I thought I'd put it here with the hopes that some of the gurus (Jeff, Gail, Grant, Gus, et al) out there may be able to help out.

    Why is the CPU time greater than the elapsed time

    CPU time = 5063 ms, elapsed time = 3309 ms.

    Thinking about it, that actually makes sense and isn't something I'd have come up with on my own, thanks. Do any of the other gurus out there also agree?

    I concur. What happens if you add MAXDOP = 1 to that particular query?

    FWIW, in a previous life MAXDOP was used as execution plans differed wildly on (single CPU, pre-hyper thread) Test kit than when they ran on the multi-CPU Live kit!

    Nice article btw 😉

    Dave Jackson

    First, to remember my manners, thank you for the complement on the article.

    I'd have to go back into my article to figure out which one it was, and then try it again at home with MAXDOP set to 1. The other thing I'll need is to find the time to do it. Maybe I'll get a few minutes some time this week, but right now I'm a bit busy with another article as well as other life activities that are clamoring for my time.

  • Uh... Wow... Over-achieve much?

    I feel a bit stupid to be commenting on this. I don't have much experience with partitioning beyond reading up on the theory and a few discussions. If I can make some spare time, I'd like to see what the execution plans looked like between your final solution and Jeff's original, purely academic.

    Nice job.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (1/27/2009)


    Uh... Wow... Over-achieve much?

    I feel a bit stupid to be commenting on this. I don't have much experience with partitioning beyond reading up on the theory and a few discussions. If I can make some spare time, I'd like to see what the execution plans looked like between your final solution and Jeff's original, purely academic.

    Nice job.

    Thank you, this makes me feel really good. I'll try to get the execution plans for you and post them here. It may a few days, as I do have several things going on right now.

  • Lynn Pettis (1/27/2009)


    Thank you, this makes me feel really good. I'll try to get the execution plans for and post them here. It may a few days, as I do have several things going on right now.

    No rush. If I stumble across a spare hour or two I'll do it myself. I'm just curious.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Actually, Grant, I'm curious also, and with your experience with working with execution plans, your feedback on that side would be quite interesting.

  • I don't get this. Both of the solutions create a non-partitioned instance of the partitioned table. One form is a temp table, and the other form is a materialized view (by the clustered index). This seems to me that it defeats the purpose of partitioning the table in the first place.

  • YeshuaAgapao (1/27/2009)


    I don't get this. Both of the solutions create a non-partitioned instance of the partitioned table. One form is a temp table, and the other form is a materialized view (by the clustered index). This seems to me that it defeats the purpose of partitioning the table in the first place.

    The purpose was to find a method for updating a partitioned table with running balances. The highly performant method discussed in Jeff Moden's article doesn't work on partitioned tables. So, how do you accomplish this task if you need to do it? The article shows you two methods for accomplishing this task.

    Behind all this, is a principle, how do you accomplish any task involving partitioned tables that fails, but works fine when done with a non-partitioned (or standard) table. The article wasn't discussing the pro's and con's of storing running totals in the database vs computing them on the client-side, or the pro's and con's of partitioned tables.

    I hope this explanation helps you with understanding the idea behind the article.

  • I guess I should first say thank you for the wonderful acknowledgements. Thanks, Lynn.

    It's also a real pleasure to read an article with a logical progression and a whole lot of "Alice's Restaurant" code examples to play with especially when they're that readable. Very nicely done, Mr. Pettis. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • And I must thank you in return, Jeff. If it hadn't been for you, and some of the other gurus on SSC, I never would have written this article. Your article was the basis of it, and your tutelage the catalyst.

    I also appreciate your having taken the time to read my initial draft of the article. Your comments and suggestions, I feel, helped the article as well.

    So, I have to give you a well done as well.

  • Lynn Pettis (1/27/2009)


    Smendle (1/27/2009)


    That can happen on multi-processor systems that SQL is allowed to use.

    I am sure this is in response to the following:

    I received a private message with a question regarding my article that I can't answer, so I thought I'd put it here with the hopes that some of the gurus (Jeff, Gail, Grant, Gus, et al) out there may be able to help out.

    Why is the CPU time greater than the elapsed time

    CPU time = 5063 ms, elapsed time = 3309 ms.

    Thinking about it, that actually makes sense and isn't something I'd have come up with on my own, thanks. Do any of the other gurus out there also agree?

    The answer to this question is parallelism. The cpu time is greater because the query is spread across multiple processors, which allows you to do a lot more cpu processing in a shorter period of time. This is how CPU time becomes greater than elapsed time. If you set max dop to 1, you will not see the same behavior.

  • Forgot to comment on the article :hehe:.

    I thought the article was very informative, easy to read/understand, and offered great alternative methods to solving the "running total" enigma.

    Nicely done.

  • Adam Haines (1/27/2009)


    Forgot to comment on the article :hehe:.

    I thought the article was very informative, easy to read/understand, and offered great alternative methods to solving the "running total" enigma.

    Nicely done.

    Thank you, I appreciate your kind words regarding the article.

  • Makes sense now. I would still suggest that if the running totals requirement was the dominant form of querying on the table, I would rather just un-partition the whole table rather than create a non-partitioned duplicate of it (temporarily with the temp table or worse, permanently with the non-paritioned indexed view). Otherwise I would really try not to fufill any kind of real-time reporting requirement on it, only batch pre-generation of the aggregates.

  • YeshuaAgapao (1/28/2009)


    Makes sense now. I would still suggest that if the running totals requirement was the dominant form of querying on the table, I would rather just un-partition the whole table rather than create a non-partitioned duplicate of it (temporarily with the temp table or worse, permanently with the non-paritioned indexed view). Otherwise I would really try not to fulfill any kind of realtime reporting requirement on it, only batch pre-generation of the aggregates.

    I can't agree with un-partitioning the table. There may be other reasons that the table is partitioned. One reason, perhaps it isn't one million rows of data over ten years but 1 billion rows of data over ten years. With that, the majority of the queries against the table are limited to the current year and maybe the previous year with less than five percent of all queries access data older than that, but the data for all ten years needs to be available based on corporate policies.

    Please remember, the actual purpose of this article wasn't just to show you how to compute running totals on partitioned tables. The purpose was to demonstrate a methodology that could possibly be used to address problems with updates to partitioned tables. Can it be used in all cases, maybe not, but it is a tool that is available.

    Two other items. If you were to use the interim table method, you actually would only create a table with the necessary columns needed to complete the update. If the source table has 25 columns and is 2500 bytes wide, but you only need 6 columns and 75 bytes, that is all you would use. Same idea with the indexed view. You would only build the view using the columns you needed to perform the update.

  • Yeah if you could restrict the rows (and columns if savings is alot) it would be more reasonable. None of your code had WHERE clauses so I assumed that you wanted to crunch the whole table. If there was a WHERE clause, does Jeff's version break even if the qualifying rows don't span partitions?

Viewing 15 posts - 16 through 30 (of 40 total)

You must be logged in to reply to this topic. Login to reply