Solving the Running Total and Ordinal Rank Problems (Rewritten)

  • "Buying a DBMS like SQL Server (or Oracle or PostgreSQL or ...) to use only the standard features is like buying a scientific calculator and just use plus and minus" ;-).

    Greets

    Flo

    That's a good one!

  • I have to agree with Lynn and Flo on this one. As much as it might be nice to code generically so the code will perform on all DBMS's, it's not feasible. There are many enhancements unique to each that work better than the generic "Standards" code.

    Why buy a product and not use the features it includes? Once all products have all the same features, with the exact same standards, then we can revisit. It's not just SQL that has it's own feature sets, Oracle has plenty of them too.

    If I were to buy a Porsche with a turbo package (which when turned on, drops the body closer to the ground for better performance and lower center of gravity), but never use it - I just wasted 50K. If I am not using the feature - I am wasting money.

    And really, don't we all have to fix crud code anyway - no matter what RDBMS?

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • CirquedeSQLeil (11/14/2009)


    I have to agree with Lynn and Flo on this one. As much as it might be nice to code generically so the code will perform on all DBMS's, it's not feasible. There are many enhancements unique to each that work better than the generic "Standards" code.

    Why buy a product and not use the features it includes? Once all products have all the same features, with the exact same standards, then we can revisit. It's not just SQL that has it's own feature sets, Oracle has plenty of them too.

    If I were to buy a Porsche with a turbo package (which when turned on, drops the body closer to the ground for better performance and lower center of gravity), but never use it - I just wasted 50K. If I am not using the feature - I am wasting money.

    And really, don't we all have to fix crud code anyway - no matter what RDBMS?

    Well said.



    Alvin Ramard
    Memphis PASS Chapter[/url]

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • Heh... write a stored procedure that takes just one silly little parameter and you suddenly have code that isn't portable between SQL Server and Oracle. True code portability is a myth.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • +1 for Florian's calculator analogy.


    N 56°04'39.16"
    E 12°55'05.25"

  • As I said, "a wise man told me". First person who told me this analogy was Jeff 😉

  • A schema-level CREATE ASSERTION would be extremely useful. I'd probably place that higher up the list than any of the new running-total-type ideas presented (including LAG/LEAD). Failing that, extending table-level constraints to work with aggregates, and reference foreign-key-related tables would be nice.

    I'm a T-SQL developer, not a lowest-common-denominator SQL developer. 😛

    Temporary tables may return data in an order other than in which they were inserted (this works the same with #temporary tables):

    DECLARE @T TABLE (row_id BIGINT NULL, padding NCHAR(3000) NULL);

    INSERT @T (row_id, padding)

    SELECT TOP (10)

    row_id = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),

    padding = REPLICATE(N'X', 3000)

    FROM master.sys.allocation_units A1

    OPTION (MAXDOP 1);

    DELETE @T

    WHERE row_id <= 5;

    INSERT @T (row_id, padding)

    SELECT TOP (5)

    row_id = 10 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),

    padding = NULL

    FROM master.sys.allocation_units A1

    OPTION (MAXDOP 1);

    -- Rows do *not* come back in insertion order

    SELECT *

    FROM @T;

  • Peso-284236 (11/16/2009)


    +1 for Florian's calculator analogy.

    Ouch. That's a dreadful pun, even if unintentional!

  • At least it is "lowest common" 😀


    N 56°04'39.16"
    E 12°55'05.25"

  • Regarding my temp table without ORDER BY observation:

    It was just that. An obversation.

    I didn't say that I don't use ORDER BY.

    I didn't recommend not using ORDER BY.

    I was hoping to see SOME discussion, but in spite of genuine effort, I obvously didn't word it carefully enough.

    The scientific calculator analogy was out of the park.

    I love this forum!

    Tom Garth
    Vertical Solutions[/url]

    "There are three kinds of men. The one that learns by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves." -- Will Rogers
  • Tom Garth (11/16/2009)


    I was hoping to see SOME discussion, but in spite of genuine effort, I obvously didn't word it carefully enough.

    Well I tried 😀

  • Paul White (11/14/2009)


    sqlservercentralitgeekry (11/14/2009)


    Hugo's method depends on unique, consecutive integers, and it will fail in the event that an implementation uses a non-unique order by clause. ROW_NUMBER() solves this problem with a minimum of fuss.

    The order in which numbers are assigned to rows by ROW_NUMBER is not deterministic if there are ties, so it's hard to see how that helps at all.

    In the event of a tie, RANK will produce a non-unique, non-contiguous series, which breaks the running total values from the point of the duplication forward. While it's true that ROW_NUMBER will arbitrarily break a tie (using the physical order of rows in the table I believe), the running total calculations will remain intact across the entire series, including the rows on which the ordering criteria are non-unique. Granted, the non-unique rows may have incorrect running total values, but ROW_NUMBER preserves the integrity of the series as a whole.

    One application I can think of immediately is in the algorithms used by [edit: SOME] banks to post transactions. Nightly transactions are posted by Date (without time), then by Amount DESC to maximize overdraft fees. RANK will error in the event of matching amounts, ROW_NUMBER will not. For that application, an arbitrary tie-break is acceptable, as long as an accurate running total value is preserved.

  • sqlservercentralitgeekry (11/16/2009)


    Granted, the non-unique rows may have incorrect running total values, but ROW_NUMBER preserves the integrity of the series as a whole.


    For that application, an arbitrary tie-break is acceptable, as long as an accurate running total value is preserved.

    Those two statements seem contradictory...?

    The order in which row numbers are assigned (in the event of a tie) is arbitrary and non-deterministic, and since a table (set) has no concept of inherent order, 'physical order of rows in the table' is meaningless. Parallelism or a hash operator earlier in the plan are examples where the idea falls apart even if we imagine that tables do have some inherent order.

    The original statement was "Hugo's method depends on unique, consecutive integers, and it will fail in the event that an implementation uses a non-unique order by clause. ROW_NUMBER() solves this problem with a minimum of fuss.". Hugo's implementation does use a deterministic order by - anything else would be daft. Why introduce this question at all?

    Any robust implementation using the ranking functions should normally ensure a deterministic order, and there's really no excuse for not doing that where required. In the hypothetical banking example, one might, for example, break the tie on transaction_id.

    Paul

  • Paul White (11/16/2009)


    sqlservercentralitgeekry (11/16/2009)


    Granted, the non-unique rows may have incorrect running total values, but ROW_NUMBER preserves the integrity of the series as a whole.


    For that application, an arbitrary tie-break is acceptable, as long as an accurate running total value is preserved.

    Those two statements seem contradictory...?

    The order in which row numbers are assigned (in the event of a tie) is arbitrary and non-deterministic, and since a table (set) has no concept of inherent order, 'physical order of rows in the table' is meaningless. Parallelism or a hash operator earlier in the plan are examples where the idea falls apart even if we imagine that tables do have some inherent order.

    The original statement was "Hugo's method depends on unique, consecutive integers, and it will fail in the event that an implementation uses a non-unique order by clause. ROW_NUMBER() solves this problem with a minimum of fuss.". Hugo's implementation does use a deterministic order by - anything else would be daft. Why introduce this question at all?

    Any robust implementation using the ranking functions should normally ensure a deterministic order, and there's really no excuse for not doing that where required. In the hypothetical banking example, one might, for example, break the tie on transaction_id.

    Paul

    Not that I care to get in the middle of a good brawl, but where is the presumption that a running total MUST be deterministic come from? You can have a running total based on a partial order if the data doesn't lend itself to having a perfectly unique order. Ultimately - other than the grouping levels, any additional order by is arbitrary, just based on what might best fit the data being reported. I'd hate to be the one telling the business unit that the LOGICAL (but not perfect) order they demand isn't the one they're going to get, "just because".

    As long as the running totals break according to the levels you determine, and the detail CAN be ordered in such a way that the running total can make mathematical sense - where's the problem?

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (#4) (11/16/2009)[

    Not that I care to get in the middle of a good brawl, but where is the presumption that a running total MUST be deterministic come from?

    I am not sure that a running total MUST be deterministic, but I have simply never seen otherwise. I worked quite a lot with cases like inventory (how much stuff do we have), or insurance (how much money counts towards the deductible), etc. In every case I have ever worked with, running totals were always deterministic.

    Can you give us a real life example when we care about running totals, and they can be non-deterministic?

Viewing 15 posts - 76 through 90 (of 307 total)

You must be logged in to reply to this topic. Login to reply