Running totals on different columns

  • I am experimenting with the SUM OVER aggregate function. I have a small test table containing transactions for holders:

    DECLARE @Transactions TABLE (EffectiveDate DATETIME, Holder VARCHAR(10), Quantity INT)

    INSERT INTO @Transactions (EffectiveDate, Holder, Quantity) VALUES

    ( '2001-01-01', 'John', 10 ), ( '2001-01-01', 'Peter', 20 ), ( '2001-01-01', 'John', 30 ),

    ( '2001-01-02', 'John', 20 ), ( '2001-01-02', 'Peter', 20 )

    and I want to get the cumulative total of a given holder as well as the cumulative total of all holders in order to eventually calculate a percentage. My current statement is:

    SELECT

    EffectiveDate, Holder, Quantity,

    SUM(Quantity) OVER (PARTITION BY Holder ORDER BY EffectiveDate, Holder RANGE UNBOUNDED PRECEDING) AS HQuantity,

    SUM(Quantity) OVER (ORDER BY EffectiveDate, Holder RANGE UNBOUNDED PRECEDING) AS TQuantity

    FROM (

    SELECT EffectiveDate, Holder, SUM(Quantity) AS Quantity

    FROM @Transactions

    GROUP BY EffectiveDate, Holder

    ) X

    ORDER BY EffectiveDate, Holder

    and gives the result:

    2001-01-01 John 40 40 40

    2001-01-01 Peter 20 20 60

    2001-01-02 John 20 60 80

    2001-01-02 Peter 20 40 100

    where I would expect

    2001-01-01 John 40 40 60

    2001-01-01 Peter 20 20 60

    2001-01-02 John 20 60 100

    2001-01-02 Peter 20 40 100

    Is what I want to achieve possible?

  • Have you tried using partition by in your second one instead of order by? That is:

    SUM(Quantity) OVER (PARTITION BY EffectiveDate ORDER BY Holder RANGE UNBOUNDED PRECEDING) AS TQuantity

    you may want to cast your EffectiveDate to a "DATE" datatype as well as you don't appear to care about the time aspect of the column, but that is just me being picky :).

    NOTE - I did not test that change as we are a SQL 2008 R2 shop and those windowing functions do not work in 2008 R2 Standard. But based on my understanding of windowing functions, I think it should work the way you expect it to. MIGHT need to cast EffectiveDate to a Date type. ie:

    SUM(Quantity) OVER (PARTITION BY CAST(EffectiveDate AS DATE) ORDER BY Holder RANGE UNBOUNDED PRECEDING) AS TQuantity

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Writing

    SUM(Quantity) OVER (ORDER BY EffectiveDate RANGE UNBOUNDED PRECEDING) AS TQuantity

    does the trick. I will have to learn more about this feature as I currently have the bad feeling of programming randomly!

  • Right. My bad.

    My version of it would have given you wrong results as it would have started at 0 whenever the date changed.

    Good job figuring it out though. And thanks for posting the solution.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • You beat me to it. Note that ROWS often performs better than range and can be used to calculate your HQuantity. Also note that the subquery is not required, you could write your query like this:

    DECLARE @Transactions TABLE (EffectiveDate DATETIME, Holder VARCHAR(10), Quantity INT)

    INSERT INTO @Transactions (EffectiveDate, Holder, Quantity) VALUES

    ( '2001-01-01', 'John', 10 ), ( '2001-01-01', 'Peter', 20 ), ( '2001-01-01', 'John', 30 ),

    ( '2001-01-02', 'John', 20 ), ( '2001-01-02', 'Peter', 20 );

    SELECT

    EffectiveDate,

    Holder,

    SUM(Quantity),

    SUM(SUM(Quantity)) OVER (PARTITION BY Holder ORDER BY EffectiveDate ROWS UNBOUNDED PRECEDING) AS HQuantity,

    SUM(SUM(Quantity)) OVER (ORDER BY EffectiveDate RANGE UNBOUNDED PRECEDING)

    FROM @Transactions

    GROUP BY EffectiveDate, Holder

    ORDER BY EffectiveDate, Holder; -- not required, including for presentation

    edit: simplified my solution.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • ROWS is (almost ?) ALWAYS offers MUCH better performance than RANGE! Try your best to use ROWS when possible.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (9/12/2016)


    ROWS is (almost ?) ALWAYS offers MUCH better performance than RANGE! Try your best to use ROWS when possible.

    ROWS won't work for the overall total here, because the OP wants to include all holders' transactions that fall on the same date. It may also not work for the holder's total if a holder can have multiple transactions on the same date. Since the sample doesn't contain this case, it's not clear how the OP wants to treat it.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • The sample actually includes this case. John has 2 transactions as of 2001-01-01.

  • drew.allen (9/12/2016)


    TheSQLGuru (9/12/2016)


    ROWS is (almost ?) ALWAYS offers MUCH better performance than RANGE! Try your best to use ROWS when possible.

    ROWS won't work for the overall total here, because the OP wants to include all holders' transactions that fall on the same date. It may also not work for the holder's total if a holder can have multiple transactions on the same date. Since the sample doesn't contain this case, it's not clear how the OP wants to treat it.

    Drew

    Yes. Hence the "when possible". I could have been more clear that the wrong answer could be received if you use ROWS when inappropriate.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (9/12/2016)


    ROWS is (almost ?) ALWAYS offers MUCH better performance than RANGE! Try your best to use ROWS when possible.

    I don't know about ALWAYS. I said ROWS "often" (and could have said "almost always") outperforms RANGE because there are exceptions. ROWS uses the in-memory window spool where RANGE writes to the tempdb. In theory ROWS would always outperform RANGE but consider these two queries:

    DECLARE @Transactions TABLE (EffectiveDate DATETIME, Holder VARCHAR(10), Quantity INT)

    INSERT INTO @Transactions (EffectiveDate, Holder, Quantity) VALUES

    ( '2001-01-01', 'John', 10 ), ( '2001-01-01', 'Peter', 20 ), ( '2001-01-01', 'John', 30 ),

    ( '2001-01-02', 'John', 20 ), ( '2001-01-02', 'Peter', 20 );

    SELECT

    EffectiveDate,

    Holder,

    SUM(SUM(Quantity)) OVER (ORDER BY EffectiveDate ROWS UNBOUNDED PRECEDING)

    FROM @Transactions

    GROUP BY EffectiveDate, Holder;

    SELECT

    EffectiveDate,

    Holder,

    SUM(SUM(Quantity)) OVER (ORDER BY EffectiveDate RANGE UNBOUNDED PRECEDING)

    FROM @Transactions

    GROUP BY EffectiveDate, Holder;

    Ignoring the fact that the first query (using ROWS) returns the wrong result, there's a couple things to notice here:

    1. both queries utilize the in-memory window spool so, in this case they are equal.

    2. When you add an ORDER BY clause the RANGE version performs much better as it does not require an additional sort for the presentation sort whereas the ROWS version does.

    In this case, RANGE is actually better.

    Now consider what happens when a correctly designed POC index is present and I don't specify ROWS or RANGE.

    DECLARE @Transactions

    TABLE

    (

    sk int identity,

    EffectiveDate DATETIME,

    Holder VARCHAR(10),

    Quantity INT,

    PRIMARY KEY(EffectiveDate, Holder, sk) -- POC index for demo purposes

    );

    INSERT INTO @Transactions (EffectiveDate, Holder, Quantity) VALUES

    ( '2001-01-01', 'John', 10 ), ( '2001-01-01', 'Peter', 20 ), ( '2001-01-01', 'John', 30 ),

    ( '2001-01-02', 'John', 20 ), ( '2001-01-02', 'Peter', 20 );

    SELECT

    EffectiveDate,

    Holder,

    SUM(SUM(Quantity)) OVER (ORDER BY EffectiveDate)

    FROM @Transactions

    GROUP BY EffectiveDate, Holder

    ORDER BY EffectiveDate, Holder; -- not required, including for presentation

    Here I'm getting RANGE behavior (which is required for the correct result set) with ROWS performance (the optimizer is still using a Window spool instead of a TABLE spool).

    Again, I believe that ROWS performs almost always performs better than RANGE, but not always.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Hence the "almost" ... 🙂

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply