Solving the "Running Total" & "Ordinal Rank" Problems in SS 2k/2k5

  • It's about trade-offs (the profit vs. time vs. cost triangle) and risk...if it does the job at hand...

    Start designing for every eventuality and it'll never get done.

    Sure there are a lot of bads against it.

    Sure its not best practices and should NEVER be used...except maybe in 1 or 2 cases where nothing else works, and put loads of warnings in writing to all the appropriate people to C.Y.A.

    Joel spells it out nicely in http://www.joelonsoftware.com/design/1stDraft/01.html ...

    "Design, for my purposes, is about making tradeoffs.

    Let's design a trashcan for a city street corner, shall we?

    Let me give you some design constraints.

    It has to be pretty light, because the dustboys, er, sanitation engineers come by and they have to pick it up to dump the trash in the garbage truck.

    Oh, and it has to be heavy, or it will blow away in the wind or get knocked over. (True story: I once got in an accident because a trash can blew in front of our car. Nobody was hurt, not even the trashcan.)

    It has to be really big. People throw away a lot of trash throughout the day and at a busy intersection if you don't make it big enough, it overflows and garbage goes everywhere. When that happens, one of the little six-pack plastic ringy-dingies will get in the ocean, and a cute little birdy will get ensnared in it, and choke to death. YOU DON'T WANT TO KILL BIRDIES, DO YOU?

    Oh, also, it needs to be pretty small, because otherwise it's going to take up room on the sidewalk, forcing the pedestrians to squeeze past each other, which, possibly, when the Effete Yuppie Listening to His iPod gets distracted by a really funny joke on the Ricky Gervais podcast and accidentally brushes against the Strangely Haunted Vietnam-Era Veteran, can result in an altercation of historic proportions.

    Ok, light, heavy, big, and small. What else. It should be closed on the top, so rubbish doesn't fly away in the wind. It should be open on the top, so it's easy to throw things away.

    It should be really, really, really cheap.

    Notice a trend? When you're designing something, you often have a lot of conflicting constraints."


    Regards
    Andy Davies

  • Darrin Blunt (1/31/2008)


    Since you installed SQL 2K5, have you looked into the "OVER" clause (BOL reference: ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/ddcef3a6-0341-43e0-ae73-630484b7b398.htm) and the Ranking Functions (BOL reference: ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/e7f917ba-bf4a-4fe0-b342-a91bcf88a71b.htm)?

    I am curious how they perform against your test. I have used the Ranking Functions, but only on result sets of a few thousand rows and it has performed well for me.

    Darrin

    Sure, I considered using SUM() OVER to do a running total... but because, unlike the Oracle version, the SUM() OVER only allows for a PARTITION BY and not an ORDER BY, I'm not sure how you'd actually use it for a running total. Here's some test code... someone show me how, please... produce a running total for the SAL column. And, NO! I'm not trying to be a smart guy! I don't know how to do it in SQL Server with SUM() OVER()...

    CREATE TABLE #Emp (ENo INT PRIMARY KEY, EName VARCHAR(35), DeptNo INT , Sal INT)

    INSERT INTO #Emp

    SELECT 1, 'A', 10, 1000 UNION ALL

    SELECT 2, 'B', 10, 1500 UNION ALL

    SELECT 3, 'C', 20, 2000 UNION ALL

    SELECT 4, 'D', 20, 3500 UNION ALL

    SELECT 5, 'E', 20, 1500

    ... and, maybe I'm missing something, but I don't think it can be done in SQL Server using SUM() OVER() without using CROSS JOINs and all.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/31/2008)


    However, the engine has no obligation to scan the table in index order, as there are no language constructs which impose an ordering on the execution order of a scan.

    Sure it does, Frank... it's a clustered index and scanning the index inherently means it's also scanning the table (believe someone else also said that on this thread).

    But there's no absolute, 100% guarentee that it will scan the index in order. The vast majority of the time, sure.

    On a non-partitioned table, with one proc, using a isolation higher than read uncommited, and with no other scans running, the table will be read from page 1 of the clustered index to page n, using the next page pointer on each page to find the next page. Hence, rows will come out in order.

    My guess is, with the current way the engine behaves, the tablelockX and forced index hints will 'make' SQL behave the desired way. I'd probably add in a maxdop 1 hint, just to be sure that SQL can't use multi procs and possibly mess up the order of the returned rows (multiple processors each scanning ranges of the table)

    I want to try this out later using a partitioned table on 2k5 (partitioning on something other than the clustering key). See how it behaves.

    I hear you about parallelism. I've tried to force a query parallel to test stuff before, and no matter how large a table I used, SQL went single processor. Is most irritating sometimes.

    p.s. The term 'merry-go-round' isn't mine. I first heard it from a member of the storage engine team a couple years back. It's in BoL and Msdn and refers to a kind of physical disk scan that is available in enterprise edition.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Jeff Moden (1/31/2008)

    And, that should be the general theme for all those that say the methods posted in the article may not work... the article went to great lengths to demonstrate where things can go wrong and how to make it right through the use of demonstrable code. If you know something won't work, please post the data generation and demonstrable method code that proves it won't work. That way, we all learn something and I don't have to pretend I'm from Missouri 😉

    Hi Jeff,

    You might have seen me use the word "beliefs". I chose this word for a reason:

    I have no "knowledge" about this. If I had, I would have laid my lines of code on you, but I do not have. But this doesn't mean I should not share my concerns.

    A while ago I actually started a discussion regarding ORDER BY in views (which I think is related to our current discussion), and this thread lead me to my current belief that ORDER is only guaranteed when returning results.

    Yet I am not convinced, even if you show me a thousand times that your code works. What I really would like to see is some documentation from Microsoft on this topic. Not that clustered indexes are ordered, but whether or not an ORDER can be guaranteed in a DML statement, even more: the order is preserved.

    Simplified example:

    SELECT *

    FROM (SELECT TOP 100 id FROM syscolumns ORDER BY id DESC) SubQuery

    WHERE id = 1

    I understand that the ORDER is guaranteed during the index scan and the "TOP" operation, but who tells me that the ORDER is still guaranteed during or after the "Filter" (WHERE) operation?

    To solve this, we would need to know how the optimizer works.

    And no, I am not offended by you picking on me;), only through discussion we are able to open our minds for new ideas.

    Actually, I will probably think about this topic a few days and maybe even change my mind.

    Best Regards,

    Chris Büttner

  • Jeff Moden (1/31/2008)


    Sure it does, Frank... it's a clustered index and scanning the index inherently means it's also scanning the table (believe someone else also said that on this thread). And, yeah, the WITH INDEX hint forces the optimizer to use that clustered index... That's also documented quite well in Books Online. It's not a kludge and it will always work...

    I agree with Jeff about this.

    You can see it by these two execution plans:

    set showplan_text on

    select * from dbo.Orders

    |--Clustered Index Scan(OBJECT: ([Northwind].[dbo].[Orders].[PK_Orders]))

    select * from dbo.Orders order by orderid

    |--Clustered Index Scan(OBJECT: ([Northwind].[dbo].[Orders].[PK_Orders]), ORDERED FORWARD)

    As you can see the engin doesn't (need to) perform a sort.

    The same is true when you are requesting columns covered by an index:

    select customerid from Orders order by customerid

    |--Index Scan(OBJECT: ([Northwind].[dbo].[Orders].[CustomerID]), ORDERED FORWARD)

    Here the engine uses a non clustered index, but again no sort is performed.

    If we put a condition on the indexed columns, we are implicity forcing an ordering (the ORDERED FORWARD clause in the plan): the engine performs a seek to find the starting point in the index that is then scanned following the key order.

    select customerid from Orders where customerid>N'a'

    |--Index Seek(OBJECT: ([Northwind].[dbo].[Orders].[CustomerID]), SEEK: ([Orders].[CustomerID] > 'a') ORDERED FORWARD)


    Salvor

  • I'm very sorry to have to tell you that it fails with partitioned objects on SQL2005.

    I've tested it on an x64 EE SP2 CU2 4-socket-dualcore 16Gb

    I have added the zipped sollution (attachement).

    Even OPTION (maxdop 1) does not correct it !

    This does not devalue the article !

    We have to keep on testing and being creative in finding solutions to problems !

    I like the way this article did set many of us on to reflecting about the way our DBMS engine works and many joined into testing and

    finding foundation(s) for the proposed solutions 😎

    REMARK:

    For those who downloaded my solution before 20080201 1300 GMT

    I've forgotten to remove the drop-database statements in script 00 !

    :blush:

    I've modified it and re-attached the zip

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • ALZDBA (2/1/2008)


    I'm very sorry to have to tell you that it fails with partitioned objects on SQL2005.

    Thanks for the code, Johan... I'll check it out! And thanks for taking up the spirit of the article! 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • As always when "big brothers" are watching ... shivers in the spine :Whistling:

    I've also tested at 32-bit Dev Edtn SP2 CU5 on 2-way XP sp2.

    Same issue. :crying:

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Christian Buettner (2/1/2008)


    Yet I am not convinced, even if you show me a thousand times that your code works. What I really would like to see is some documentation from Microsoft on this topic.

    Heh... actually, I'd like to see some better documentation from Microsoft on the subjects, myself.

    Thanks for the feedback, Christian.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I posted about this topic on the SQL Server MVP private forum. I will follow-up if anyone there comes up with anything. I specifically asked Iztik Ben-Gan if any of the TSQL enhancements he has been pushing VERY hard for would address this problem. I plan on asking the Microsoft SQL Server engine guys in April at the MVP summit as well.

    I am not surprised about the partitioned table 'failure' proof recently posted. Which we could figure out a way to do the read-along scan test on a huge table in enterprise edition too. Perhaps one of you guys with a big box could test that too?

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • ..I wonder what happens when the partitioning function is based on the "left-edge" of the Clustered index? It's an interesting example.

    It's definitely a caveat to throw onto the technique. I almost always end up running this stuff on temp objects (can't say I've ever had the joy to build a partitioned "temporary table")

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Carla Wilson (1/31/2008)


    Jeff,

    Thanks for the great examples on generating running totals. Question: You mentioned that you have to include the "anchor" (or dummy) in the update statement in order for it to work correctly, but you didn't explain why. I can see its use in the group total, group rank, but could you explain why it is needed in the update statement for the overall running total? (aside from "just because it doesn't work without it")

    Thanks for the question, Carla...

    I've not actually been able to get the code to fail in 2k or 2k5 by removing the "Anchor" for non-grouped running aggregates. Matt Miller said that he saw it fail on his 2k5 box and the anchor fixed it. Because of some extreme testing we did "together" on several other issues, I pretty much trust Matt, so I included it in the article as a "gotta have"... it causes no harm and prevents a potential problem.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • TheSQLGuru (2/1/2008)


    I posted about this topic on the SQL Server MVP private forum. I will follow-up if anyone there comes up with anything. I specifically asked Iztik Ben-Gan if any of the TSQL enhancements he has been pushing VERY hard for would address this problem. I plan on asking the Microsoft SQL Server engine guys in April at the MVP summit as well.

    I am not surprised about the partitioned table 'failure' proof recently posted. Which we could figure out a way to do the read-along scan test on a huge table in enterprise edition too. Perhaps one of you guys with a big box could test that too?

    Outstanding... thanks for taking the time to do all that, Kevin.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • TheSQLGuru (2/1/2008)


    I posted about this topic on the SQL Server MVP private forum. I will follow-up if anyone there comes up with anything. I specifically asked Iztik Ben-Gan if any of the TSQL enhancements he has been pushing VERY hard for would address this problem. I plan on asking the Microsoft SQL Server engine guys in April at the MVP summit as well.

    I am not surprised about the partitioned table 'failure' proof recently posted. Which we could figure out a way to do the read-along scan test on a huge table in enterprise edition too. Perhaps one of you guys with a big box could test that too?

    Looking forward the MVP comments/remarks/suggestions ...

    I've off course also tested the script in a non-partitioned object

    at our sql2005 EE. Worked like a charm.

    However, I didn't x-fold the number of rows.

    Maybe.... , just maybe...., :Whistling: .... we would need a sneeky little devil smyle 😛

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • The "Anchor" components tend to ensure that the @variables don't start trying to persist, or update too soon or late.

    The form I usually use is

    ...

    @variable=field=updatecalculation,

    @dummyvar=@variable,

    ...

    The purpose of the @dummyvar is to ensure that @variable is reevaluated on each row (and reevaluated WHEN you want it to be). By putting that extra assignment, the optimizer is being told "I want that current derived value NOW". @dummyvar is not used anywhere else in any other calculation. It's really just an extra layer of protection.

    It tends to be more important when the "running function" is something more complicated than a simple total or grouped total or running total.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 15 posts - 46 through 60 (of 250 total)

You must be logged in to reply to this topic. Login to reply