Generating SubTotals using GROUPING

  • Eric L Hackett (7/13/2010)


    How would one go about grouping by dates, i.e. month, year?

    So a query from 12/30/2009 thru 1/2/2010 would have something similar to this.

    Meter Day Month Year Barrels

    ```````````````````````````````````````

    Meter1 12/30/09 Dec 2009 5

    Meter1 12/31/09 Dec 2009 10

    Meter1 DecTotal 2009 15

    Meter1 2009Total 15

    Meter2 1/1/10 Jan 2010 8

    Meter2 1/2/10 Jan 2010 12

    Meter2 JanTotal 2010 20

    Meter2 2010Total 20

    You should probably start a regular forum post but the following article should do it for you...

    http://www.sqlservercentral.com/articles/Crosstab/65048/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Nice job, Sunil. I especially liked the fact that you compared what some folks go through with multiple queries (in one form or another) and the simplicity of WITH ROLLUP.

    As a side bar, you shouldn't use ORDER BY with either ROLLUP or CUBE... it's one of the very few places in SQL Server where the order is pretty much guaranteed to be correct and in the order expected.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (7/16/2010)


    Nice job, Sunil. I especially liked the fact that you compared what some folks go through with multiple queries (in one form or another) and the simplicity of WITH ROLLUP.

    As a side bar, you shouldn't use ORDER BY with either ROLLUP or CUBE... it's one of the very few places in SQL Server where the order is pretty much guaranteed to be correct and in the order expected.

    I have to contradict. There is only one way to guarantee order in SQL Server, and that is by specifying ORDER BY. Everything else depends.

    And even if it were not necessary, I'd still include it. It won't hurt performance (except maybe a few microseconds parse time), and it hardens my code against future changes to the engine.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (7/17/2010)


    Jeff Moden (7/16/2010)


    Nice job, Sunil. I especially liked the fact that you compared what some folks go through with multiple queries (in one form or another) and the simplicity of WITH ROLLUP.

    As a side bar, you shouldn't use ORDER BY with either ROLLUP or CUBE... it's one of the very few places in SQL Server where the order is pretty much guaranteed to be correct and in the order expected.

    I have to contradict. There is only one way to guarantee order in SQL Server, and that is by specifying ORDER BY. Everything else depends.

    And even if it were not necessary, I'd still include it. It won't hurt performance (except maybe a few microseconds parse time), and it hardens my code against future changes to the engine.

    Not a problem... show me just one example where it fails on its own as a final output.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (7/18/2010)


    Hugo Kornelis (7/17/2010)


    Jeff Moden (7/16/2010)


    Nice job, Sunil. I especially liked the fact that you compared what some folks go through with multiple queries (in one form or another) and the simplicity of WITH ROLLUP.

    As a side bar, you shouldn't use ORDER BY with either ROLLUP or CUBE... it's one of the very few places in SQL Server where the order is pretty much guaranteed to be correct and in the order expected.

    I have to contradict. There is only one way to guarantee order in SQL Server, and that is by specifying ORDER BY. Everything else depends.

    And even if it were not necessary, I'd still include it. It won't hurt performance (except maybe a few microseconds parse time), and it hardens my code against future changes to the engine.

    Not a problem... show me just one example where it fails on its own as a final output.

    I'm on holiday, so I won't even try to find an example. Note that it doesn't matter - read what I wrote: "it hardens my code against future changes to the engine".

    Can you show me an example in SQL Server 6.5 where GROUP BY without ORDER BY will not produce ordered results? And yet, many people who relied on this behaviour werefound they had shot themselves in the foot when upgrading to SQL Server 7.0.

    Can you show me an example in SQL Server 2005 where a SELECT without ORDER BY on a VIEW with TOP 100 PERCENT and ORDER BY will not produce ordered results? And yet, once again, many people found that relying on this behaviour caused a bullet-shaped hole in their foot.

    But let's reverse the challenge. Can you show me an example where a ROLLUP with ORDER BY performs worse than one without ORDER BY? And if you can't, why would you insist on removing the ORDER BY? What is the gain?


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo,

    The gain is that there is nothing you can naturally order by that will give you the correct original order. 😉 And stop playing the "gotcha back" tricks... can you make the natural sort fail or not?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (7/19/2010)


    Hugo,

    The gain is that there is nothing you can naturally order by that will give you the correct original order. 😉

    Huh?

    USE AdventureWorks2008;

    SELECT TerritoryID, GROUPING(TerritoryID),

    SalesPersonID, GROUPING(SalesPersonID),

    SUM(TotalDue) AS SumOfTotalDue

    FROM Sales.SalesOrderHeader

    GROUP BY ROLLUP(TerritoryID, SalesPersonID)

    ORDER BY GROUPING(TerritoryID), TerritoryID,

    GROUPING(SalesPersonID), SalesPersonID;

    And stop playing the "gotcha back" tricks... can you make the natural sort fail or not?

    Whether or not I can is completely irrelevant. I advise people to include the ORDER BY, because it guarantees correct results. You advise people to omit the ORDER BY clause; in my opinion that places the burden on you to prove that you will never need it. Not even after installing a service pack or upgrading to the next version of SQL Server. Can you guarantee that?

    Heck, can you even guarantee that this will always work on the current version of SQL Server? I'm on holiday, with only a simple laptop computer, so my testing capacity is limited. But imagine a SQL Server instance on a computer with 16 cores, with Sales.SalesOrderHeader partitioned by TerritoryID and spread over 16 seperate spindles, and with the amount of data in that table bumped to several billion rows. I would expect (and, in fact, even HOPE) that each core gets to process the data for a single partition. Without the ORDER BY, the data will be returned as soon as it's ready. Why would SQL Server wait for core #1 to finish if core #5 already has some data ready to be returned?

    Again - if you advise people to rely on undocumented behaviour, the burden is on you to prove that it will work in all cases. And that it will continue to work in the future. If you can't, and you still keep giving the same advise, you are repeating the mistake many people made in the days of SQL Server 6.5 when they told people to omit the ORDER BY after a GROUP BY.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Guys, this topic was about the article on the usage of grouping() but you've moved away to wether or not to add a sort clause now. I feel a little guilty for that because I first made a note on the sorting. You've both made your points and since neither can give proof there's no need to keep poluting the topic any more. I personally totally see Hugo's point, so I'll keep on adding the order by clause. Even though I totally respect -if not live by- Jeff's advices normally. Since the sorting is undocumented behavior (unless someone else has a link?) I think it's up to each and everyone to decide for themselves whether or not their solution should rely on it and I don't see any reason not to add the order by clause. If you want to continue this debate, can you please start a separate topic for it?



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

  • Gosh, I hope I don't get my shoes wet by stepping near this, but.....

    The original script in the article absolutely needs an ORDER BY to properly interleave the two halves of the UNION ALL statement (the detail rows and the aggregated rows).

    Also, BOL is quite clear in saying that GROUP BY results are not guaranteed to be in any particular order and explicitly prescribes an ORDER BY clause if the order is important. With no direction from MS either way on queries using WITH ROLLUP or ROLLUP(), I'd have to agree that this advice would best be followed in those cases as well.

    -----

    edit:

    Sorry, R.P. I saw your post only after hitting send on mine.

  • Hugo Kornelis (7/19/2010)


    Jeff Moden (7/19/2010)


    Hugo,

    The gain is that there is nothing you can naturally order by that will give you the correct original order. 😉

    Huh?

    USE AdventureWorks2008;

    SELECT TerritoryID, GROUPING(TerritoryID),

    SalesPersonID, GROUPING(SalesPersonID),

    SUM(TotalDue) AS SumOfTotalDue

    FROM Sales.SalesOrderHeader

    GROUP BY ROLLUP(TerritoryID, SalesPersonID)

    ORDER BY GROUPING(TerritoryID), TerritoryID,

    GROUPING(SalesPersonID), SalesPersonID;

    And stop playing the "gotcha back" tricks... can you make the natural sort fail or not?

    Whether or not I can is completely irrelevant. I advise people to include the ORDER BY, because it guarantees correct results. You advise people to omit the ORDER BY clause; in my opinion that places the burden on you to prove that you will never need it. Not even after installing a service pack or upgrading to the next version of SQL Server. Can you guarantee that?

    Heck, can you even guarantee that this will always work on the current version of SQL Server? I'm on holiday, with only a simple laptop computer, so my testing capacity is limited. But imagine a SQL Server instance on a computer with 16 cores, with Sales.SalesOrderHeader partitioned by TerritoryID and spread over 16 seperate spindles, and with the amount of data in that table bumped to several billion rows. I would expect (and, in fact, even HOPE) that each core gets to process the data for a single partition. Without the ORDER BY, the data will be returned as soon as it's ready. Why would SQL Server wait for core #1 to finish if core #5 already has some data ready to be returned?

    Again - if you advise people to rely on undocumented behaviour, the burden is on you to prove that it will work in all cases. And that it will continue to work in the future. If you can't, and you still keep giving the same advise, you are repeating the mistake many people made in the days of SQL Server 6.5 when they told people to omit the ORDER BY after a GROUP BY.

    I humbly stand corrected. That sort works. Well done, Hugo. Still, the order of run return is, in fact, documented in Books Online...

    ROLLUP

    Specifies that in addition to the usual rows provided by GROUP BY, summary rows are introduced into the result set. [font="Arial Black"]Groups are summarized in a hierarchical order[/font], from the lowest level in the group to the highest. [font="Arial Black"]The group hierarchy is determined by the order in which the grouping columns are specified. [/font]Changing the order of the grouping columns can affect the number of rows produced in the result set.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • R.P.Rozema (7/19/2010)


    Guys, this topic was about the article on the usage of grouping() but you've moved away to wether or not to add a sort clause now. I feel a little guilty for that because I first made a note on the sorting. You've both made your points and since neither can give proof there's no need to keep poluting the topic any more. I personally totally see Hugo's point, so I'll keep on adding the order by clause. Even though I totally respect -if not live by- Jeff's advices normally. Since the sorting is undocumented behavior (unless someone else has a link?) I think it's up to each and everyone to decide for themselves whether or not their solution should rely on it and I don't see any reason not to add the order by clause. If you want to continue this debate, can you please start a separate topic for it?

    The sorting is not undocumented... and that was my point. 😉 And, I am truly humbled by your comment. Thank you.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • My final comment in this topic, and then I'll respect R.P.'s request.

    Jeff Moden (7/19/2010)


    Still, the order of run return is, in fact, documented in Books Online...

    ROLLUP

    Specifies that in addition to the usual rows provided by GROUP BY, summary rows are introduced into the result set. [font="Arial Black"]Groups are summarized in a hierarchical order[/font], from the lowest level in the group to the highest. [font="Arial Black"]The group hierarchy is determined by the order in which the grouping columns are specified. [/font]Changing the order of the grouping columns can affect the number of rows produced in the result set.

    This quote describes which possible summary rows are and are not introduced in the result set, not the order in which results are returned.

    (I'm also tempted to point out the use of the term "result set" rather than "recordset" in this quote, but the terminology in BOL as a whole is so often wrong that I'd probably better not go there)

    Jeff, or anyone else - if you want to continue this debate, then please start a new topic and send me a PM with the link, or include the link in this topic. Without a link, I'll probably never find the new topic.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • The result set I get is not the same as the result set posted in the article. Can you repost the code to populate the tables? I had a DBA run it here at my company and he got the same results that I got.

    First 10 rows:

    'Product1', 'Scheme1', 1524.15, 124.32, '2010-07-20 16:39:32'

    'Product1', 'Scheme2', 1524.15, 124.32, '2010-07-20 16:39:32'

    'Product1', 'Scheme3', 1524.15, 124.32, '2010-07-20 16:39:32'

    'Product1', 'Scheme4', 1524.15, 124.32, '2010-07-20 16:39:32'

    'Product1', 'Scheme5', 1524.15, 124.32, '2010-07-20 16:39:32'

    'Product1', 'Scheme1', 9144.90, 745.92, '2010-07-20 16:39:32'

    'Product1', 'Scheme2', 4572.45, 372.96, '2010-07-20 16:39:32'

    'Product1', 'Scheme3', 3048.30, 248.64, '2010-07-20 16:39:32'

    'Product1', 'Scheme4', 3048.30, 248.64, '2010-07-20 16:39:32'

    'Product1', 'Scheme5', 3048.30, 248.64, '2010-07-20 16:39:32'

  • witasj,

    I see the same contents in @Grouptest as you do.

    It looks as though the loaded table shown in the article may not be from the final version of the code to build it, but the results with aggregated subtotals do seem to match up to what's shown.

  • Here one question comes in the mind that why only one row is indicated by 1? The Scheme_Total and Product_Total is also aggregated but it is not indicated by 1. I have not yet received any satisfactory answer for this. I will post the answer here once I get it.

    I tried to answer this question from the article last week, and realized today that I was wrong in my explanation. I've corrected that post above.

Viewing 15 posts - 16 through 30 (of 35 total)

You must be logged in to reply to this topic. Login to reply