September 12, 2016 at 9:40 am
I am experimenting with the SUM OVER aggregate function. I have a small test table containing transactions for holders:
DECLARE @Transactions TABLE (EffectiveDate DATETIME, Holder VARCHAR(10), Quantity INT)
INSERT INTO @Transactions (EffectiveDate, Holder, Quantity) VALUES
( '2001-01-01', 'John', 10 ), ( '2001-01-01', 'Peter', 20 ), ( '2001-01-01', 'John', 30 ),
( '2001-01-02', 'John', 20 ), ( '2001-01-02', 'Peter', 20 )
and I want to get the cumulative total of a given holder as well as the cumulative total of all holders in order to eventually calculate a percentage. My current statement is:
SELECT
EffectiveDate, Holder, Quantity,
SUM(Quantity) OVER (PARTITION BY Holder ORDER BY EffectiveDate, Holder RANGE UNBOUNDED PRECEDING) AS HQuantity,
SUM(Quantity) OVER (ORDER BY EffectiveDate, Holder RANGE UNBOUNDED PRECEDING) AS TQuantity
FROM (
SELECT EffectiveDate, Holder, SUM(Quantity) AS Quantity
FROM @Transactions
GROUP BY EffectiveDate, Holder
) X
ORDER BY EffectiveDate, Holder
and gives the result:
2001-01-01 John 40 40 40
2001-01-01 Peter 20 20 60
2001-01-02 John 20 60 80
2001-01-02 Peter 20 40 100
where I would expect
2001-01-01 John 40 40 60
2001-01-01 Peter 20 20 60
2001-01-02 John 20 60 100
2001-01-02 Peter 20 40 100
Is what I want to achieve possible?
September 12, 2016 at 9:47 am
Have you tried using partition by in your second one instead of order by? That is:
SUM(Quantity) OVER (PARTITION BY EffectiveDate ORDER BY Holder RANGE UNBOUNDED PRECEDING) AS TQuantity
you may want to cast your EffectiveDate to a "DATE" datatype as well as you don't appear to care about the time aspect of the column, but that is just me being picky :).
NOTE - I did not test that change as we are a SQL 2008 R2 shop and those windowing functions do not work in 2008 R2 Standard. But based on my understanding of windowing functions, I think it should work the way you expect it to. MIGHT need to cast EffectiveDate to a Date type. ie:
SUM(Quantity) OVER (PARTITION BY CAST(EffectiveDate AS DATE) ORDER BY Holder RANGE UNBOUNDED PRECEDING) AS TQuantity
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
September 12, 2016 at 9:56 am
Writing
SUM(Quantity) OVER (ORDER BY EffectiveDate RANGE UNBOUNDED PRECEDING) AS TQuantity
does the trick. I will have to learn more about this feature as I currently have the bad feeling of programming randomly!
September 12, 2016 at 10:00 am
Right. My bad.
My version of it would have given you wrong results as it would have started at 0 whenever the date changed.
Good job figuring it out though. And thanks for posting the solution.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
September 12, 2016 at 10:32 am
You beat me to it. Note that ROWS often performs better than range and can be used to calculate your HQuantity. Also note that the subquery is not required, you could write your query like this:
DECLARE @Transactions TABLE (EffectiveDate DATETIME, Holder VARCHAR(10), Quantity INT)
INSERT INTO @Transactions (EffectiveDate, Holder, Quantity) VALUES
( '2001-01-01', 'John', 10 ), ( '2001-01-01', 'Peter', 20 ), ( '2001-01-01', 'John', 30 ),
( '2001-01-02', 'John', 20 ), ( '2001-01-02', 'Peter', 20 );
SELECT
EffectiveDate,
Holder,
SUM(Quantity),
SUM(SUM(Quantity)) OVER (PARTITION BY Holder ORDER BY EffectiveDate ROWS UNBOUNDED PRECEDING) AS HQuantity,
SUM(SUM(Quantity)) OVER (ORDER BY EffectiveDate RANGE UNBOUNDED PRECEDING)
FROM @Transactions
GROUP BY EffectiveDate, Holder
ORDER BY EffectiveDate, Holder; -- not required, including for presentation
edit: simplified my solution.
-- Itzik Ben-Gan 2001
September 12, 2016 at 11:25 am
ROWS is (almost ?) ALWAYS offers MUCH better performance than RANGE! Try your best to use ROWS when possible.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
September 12, 2016 at 2:25 pm
TheSQLGuru (9/12/2016)
ROWS is (almost ?) ALWAYS offers MUCH better performance than RANGE! Try your best to use ROWS when possible.
ROWS won't work for the overall total here, because the OP wants to include all holders' transactions that fall on the same date. It may also not work for the holder's total if a holder can have multiple transactions on the same date. Since the sample doesn't contain this case, it's not clear how the OP wants to treat it.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
September 12, 2016 at 2:33 pm
The sample actually includes this case. John has 2 transactions as of 2001-01-01.
September 12, 2016 at 3:00 pm
drew.allen (9/12/2016)
TheSQLGuru (9/12/2016)
ROWS is (almost ?) ALWAYS offers MUCH better performance than RANGE! Try your best to use ROWS when possible.ROWS won't work for the overall total here, because the OP wants to include all holders' transactions that fall on the same date. It may also not work for the holder's total if a holder can have multiple transactions on the same date. Since the sample doesn't contain this case, it's not clear how the OP wants to treat it.
Drew
Yes. Hence the "when possible". I could have been more clear that the wrong answer could be received if you use ROWS when inappropriate.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
September 12, 2016 at 3:14 pm
TheSQLGuru (9/12/2016)
ROWS is (almost ?) ALWAYS offers MUCH better performance than RANGE! Try your best to use ROWS when possible.
I don't know about ALWAYS. I said ROWS "often" (and could have said "almost always") outperforms RANGE because there are exceptions. ROWS uses the in-memory window spool where RANGE writes to the tempdb. In theory ROWS would always outperform RANGE but consider these two queries:
DECLARE @Transactions TABLE (EffectiveDate DATETIME, Holder VARCHAR(10), Quantity INT)
INSERT INTO @Transactions (EffectiveDate, Holder, Quantity) VALUES
( '2001-01-01', 'John', 10 ), ( '2001-01-01', 'Peter', 20 ), ( '2001-01-01', 'John', 30 ),
( '2001-01-02', 'John', 20 ), ( '2001-01-02', 'Peter', 20 );
SELECT
EffectiveDate,
Holder,
SUM(SUM(Quantity)) OVER (ORDER BY EffectiveDate ROWS UNBOUNDED PRECEDING)
FROM @Transactions
GROUP BY EffectiveDate, Holder;
SELECT
EffectiveDate,
Holder,
SUM(SUM(Quantity)) OVER (ORDER BY EffectiveDate RANGE UNBOUNDED PRECEDING)
FROM @Transactions
GROUP BY EffectiveDate, Holder;
Ignoring the fact that the first query (using ROWS) returns the wrong result, there's a couple things to notice here:
1. both queries utilize the in-memory window spool so, in this case they are equal.
2. When you add an ORDER BY clause the RANGE version performs much better as it does not require an additional sort for the presentation sort whereas the ROWS version does.
In this case, RANGE is actually better.
Now consider what happens when a correctly designed POC index is present and I don't specify ROWS or RANGE.
DECLARE @Transactions
TABLE
(
sk int identity,
EffectiveDate DATETIME,
Holder VARCHAR(10),
Quantity INT,
PRIMARY KEY(EffectiveDate, Holder, sk) -- POC index for demo purposes
);
INSERT INTO @Transactions (EffectiveDate, Holder, Quantity) VALUES
( '2001-01-01', 'John', 10 ), ( '2001-01-01', 'Peter', 20 ), ( '2001-01-01', 'John', 30 ),
( '2001-01-02', 'John', 20 ), ( '2001-01-02', 'Peter', 20 );
SELECT
EffectiveDate,
Holder,
SUM(SUM(Quantity)) OVER (ORDER BY EffectiveDate)
FROM @Transactions
GROUP BY EffectiveDate, Holder
ORDER BY EffectiveDate, Holder; -- not required, including for presentation
Here I'm getting RANGE behavior (which is required for the correct result set) with ROWS performance (the optimizer is still using a Window spool instead of a TABLE spool).
Again, I believe that ROWS performs almost always performs better than RANGE, but not always.
-- Itzik Ben-Gan 2001
September 12, 2016 at 3:25 pm
Hence the "almost" ... 🙂
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply