October 25, 2016 at 6:50 pm
Hello,
I'm having trouble understanding why adding window ordering in a partition will create a 'running total' (when using the sum aggregate, of course), where as just doing a SUM with a PARTITION BY doesn't. I just don't see how telling SQL to order by something, it means 'add all of the values from the specified row until the current row'. For example:
Select
custid,orderid,val,SUM(val) OVER(PARTITION BY custid ORDER BY orderdate, orderid) AS runningtotal,
SUM(val) OVER(PARTITION BY custid) AS custtotal
from sales.OrderValues AS V
order by custid
The column 'runningtotal' has an order by (and of course the default RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), where as the other doesn't. That's the only difference... what am I missing?
Thanks!!
October 25, 2016 at 9:43 pm
It's by design and definition. Referring to the note at the following URL...
[font="Arial Black"]OVER Clause (Transact-SQL)[/font]
NOTE:
If ORDER BY is not specified entire partition is used for a window frame. This applies only to functions that do not require ORDER BY clause. If ROWS/RANGE is not specified but ORDER BY is specified, RANGE UNBOUNDED PRECEDING AND CURRENT ROW is used as default for window frame. This applies only to functions that have can accept optional ROWS/RANGE specification. For example, ranking functions cannot accept ROWS/RANGE, therefore this window frame is not applied even though ORDER BY is present and ROWS/RANGE is not.
In other words, if you specify an ORDER BY for a function that CAN use a ROWS/RANGE specification, then the ORDER BY implicitly invokes a RANGE specification.
As a bit of a sidebar, it would have been nice if they had it default to a ROWS specification because it's faster than the RANGE specification.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 26, 2016 at 7:10 am
+1 on making ROWS default... reading about it last night actually had me wondering why they chose RANGE as default if ROWS is faster. (could it be because RANGE includes ties and maybe they don't want people to miss info w/o realizing it? :unsure: )
So, I guess the answer to my question, then, is 'because that's how they made it'. Fair enough! haha
October 26, 2016 at 8:27 am
scarr030 (10/26/2016)
So, I guess the answer to my question, then, is 'because that's how they made it'. Fair enough! haha
Heh... unfortunately, that's the answer to a great many mysteries in SQL Server. It's why (for example) that FORMAT is 44 times slower than CONVERT, shrinking a database fragments the hell out of it instead of packing it Peter Norton style, REORGANIZEing a database does things right but blows out the log file, REBUILDing a Clustered Index doesn't repack blobs, the new string splitting function misses elements and provides no index position of the elements, etc, etc, etc. 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
October 26, 2016 at 8:52 am
Jeff Moden (10/25/2016)
As a bit of a sidebar, it would have been nice if they had it default to a ROWS specification because it's faster than the RANGE specification.
The reason that they used RANGE instead of ROWS is that RANGE is deterministic whereas ROWS is not when there are ties in the order by clause.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
October 26, 2016 at 9:23 am
October 26, 2016 at 9:36 am
drew.allen (10/26/2016)
Jeff Moden (10/25/2016)
As a bit of a sidebar, it would have been nice if they had it default to a ROWS specification because it's faster than the RANGE specification.The reason that they used RANGE instead of ROWS is that RANGE is deterministic whereas ROWS is not when there are ties in the order by clause.
Drew
Curious how you might know why MS decided that. Is there a bit of MS documentation that states that's why they chose RANGE for the default?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 26, 2016 at 1:43 pm
Jeff Moden (10/26/2016)
drew.allen (10/26/2016)
Jeff Moden (10/25/2016)
As a bit of a sidebar, it would have been nice if they had it default to a ROWS specification because it's faster than the RANGE specification.The reason that they used RANGE instead of ROWS is that RANGE is deterministic whereas ROWS is not when there are ties in the order by clause.
Drew
Curious how you might know why MS decided that. Is there a bit of MS documentation that states that's why they chose RANGE for the default?
I honestly don't remember where I learned this. I believe it was on a request that someone submitted to change the default and someone from MS responded with the reason why they chose RANGE in the first place. It certainly makes a great deal of sense.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply