April 25, 2015 at 12:52 pm
Comments posted to this topic are about the item Using a Window Aggregate in an Aggregate Query
Aunt Kathi Data Platform MVP
Author of Expert T-SQL Window Functions
Simple-Talk Editor
April 26, 2015 at 5:31 am
This was removed by the editor as SPAM
April 26, 2015 at 6:35 am
Stewart "Arturius" Campbell (4/26/2015)
Never thought of doing it this way
This is a very efficient method when aggregating with different grouping / level of details, comes close to the divide and conquer method in performance.
😎
April 26, 2015 at 11:53 am
If we assume, that we have the latest SQL Server Version (2014), then an over by is required according to BOL (I can't try it yet)
OVER ( [ partition_by_clause ] order_by_clause)
order_by_clause determines the logical order in which the operation is performed. order_by_clause is required.
Or BOL is incorrect?
April 27, 2015 at 1:13 am
palotaiarpad (4/26/2015)
If we assume, that we have the latest SQL Server Version (2014), then an over by is required according to BOL (I can't try it yet)OVER ( [ partition_by_clause ] order_by_clause)
order_by_clause determines the logical order in which the operation is performed. order_by_clause is required.
Or BOL is incorrect?
Link?
ORDER BY is not required, unless you use a frame extent.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
April 27, 2015 at 1:13 am
Very nice question btw.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
April 27, 2015 at 1:24 am
Link is already in my first post, but please: https://msdn.microsoft.com/en-us/library/ms187810.aspx
April 27, 2015 at 1:32 am
palotaiarpad (4/27/2015)
Link is already in my first post, but please: https://msdn.microsoft.com/en-us/library/ms187810.aspx
BOL is incorrect. At the bottom, there are already 2 remarks stating the same.
If you take a look at the OVER clause page where BOL links to, you can see it is explicitly stated that ORDER BY is not required.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
April 27, 2015 at 1:49 am
Koen Verbeeck (4/27/2015)
Very nice question btw.
Looks like BOL has an error:
😎
OVER ( [ partition_by_clause ] order_by_clause)
partition_by_clause divides the result set produced by the FROM clause into partitions to which the function is applied. If not specified, the function treats all rows of the query result set as a single group. order_by_clause determines the logical order in which the operation is performed. order_by_clause is required. For more information, see OVER Clause (Transact-SQL).
Another thing to keep in mind is the somewhat strange OVER defaults, recommend always filling those in rather than relying on the supplied defaults. Koen has a good blog post on this[/url]. Here is a quick sample:
SELECT TOP(10)
so.object_id
,SUM(so.object_id) OVER() AS EMPTY_OVER
,SUM(so.object_id) OVER
(
PARTITION BY so.object_id
) AS PART_BY_SELF
,SUM(so.object_id) OVER
(
ORDER BY so.object_id
) AS ORD_BY_SELF
FROM sys.objects so
WHERE so.object_id > 0
AND so.object_id < 100;
Results with TOP(10)
object_id EMPTY_OVER PART_BY_SELF ORD_BY_SELF
----------- ----------- ------------ -----------
3 3669 3 3
5 3669 5 8
6 3669 6 14
7 3669 7 21
8 3669 8 29
9 3669 9 38
17 3669 17 55
18 3669 18 73
19 3669 19 92
20 3669 20 112
Now the same code without the TOP(10)
SELECT
so.object_id
,SUM(so.object_id) OVER() AS EMPTY_OVER
,SUM(so.object_id) OVER
(
PARTITION BY so.object_id
) AS PART_BY_SELF
,SUM(so.object_id) OVER
(
ORDER BY so.object_id
) AS ORD_BY_SELF
FROM sys.objects so
WHERE so.object_id > 0
AND so.object_id < 100;
Results without TOP(10)
object_id EMPTY_OVER PART_BY_SELF ORD_BY_SELF
----------- ----------- ------------ -----------
3 3669 3 3
5 3669 5 8
6 3669 6 14
7 3669 7 21
8 3669 8 29
9 3669 9 38
17 3669 17 55
18 3669 18 73
19 3669 19 92
20 3669 20 112
21 3669 21 133
22 3669 22 155
23 3669 23 178
24 3669 24 202
25 3669 25 227
27 3669 27 254
28 3669 28 282
29 3669 29 311
34 3669 34 345
41 3669 41 386
42 3669 42 428
43 3669 43 471
44 3669 44 515
45 3669 45 560
46 3669 46 606
47 3669 47 653
48 3669 48 701
49 3669 49 750
50 3669 50 800
51 3669 51 851
54 3669 54 905
55 3669 55 960
56 3669 56 1016
57 3669 57 1073
58 3669 58 1131
59 3669 59 1190
60 3669 60 1250
62 3669 62 1312
63 3669 63 1375
64 3669 64 1439
65 3669 65 1504
67 3669 67 1571
68 3669 68 1639
69 3669 69 1708
71 3669 71 1779
72 3669 72 1851
73 3669 73 1924
74 3669 74 1998
75 3669 75 2073
78 3669 78 2151
79 3669 79 2230
80 3669 80 2310
82 3669 82 2392
84 3669 84 2476
85 3669 85 2561
86 3669 86 2647
87 3669 87 2734
89 3669 89 2823
90 3669 90 2913
91 3669 91 3004
92 3669 92 3096
93 3669 93 3189
94 3669 94 3283
95 3669 95 3378
96 3669 96 3474
97 3669 97 3571
98 3669 98 3669
April 27, 2015 at 2:55 am
palotaiarpad (4/26/2015)
Hmm. According to BOL, if OVER used with a SUM, then an order by clause is required...
As others already said, BOL is wrong.
OVER() with normal aggregate functions (like SUM) works in two possible ways:
1. Without ORDER BY - either with PARTITION BY, or just as OVER (). In this case, the PARTITION BY defines groups (no PARTITION BY means that the entire result set is considered a single group), and the aggregate is applied to all rows in the group.
2. With ORDER BY. In this case, a ROWS or RANGE specification should also be added, allthough the system does supply a default for the latter. With ORDER BY, the results are still grouped based on the PARTITION BY clause, but the ORDER BY + ROWS or RANGE specification limits the function of the aggregate to a subset of the group. This is very useful for, for instance, running totals.
When using ORDER BY, please do not rely on the default ROWS or RANGE specification. The default may appear to give you running totals, but it uses RANGE instead of ROWS; depending on whether the ORDER BY specification has duplicates or not, this either means that you sometimes get unexpected results, or it means that though you get the same results, you are thrashing performance.
April 27, 2015 at 3:02 am
Eirikur Eiriksson (4/27/2015)
Another thing to keep in mind is the somewhat strange OVER defaults, recommend always filling those in rather than relying on the supplied defaults. Koen has a good blog post on this[/url]. Here is a quick sample:
I agree that the default RANGE specification instead of ROWS specification is bad, but the demo code and results you posted appear not related to that. It appears that you expect a different behaviour by adding or removing a TOP clause. Can you please expand on that? (For the record, the results were exactly as I would expect them to be).
Also note that your query with TOP has no ORDER BY specification for the outer query. That means that SQL Server could have returned any subset of ten rows from the full results, in any desired order. The fact that the rows returned happen to match the ORDER BY specification in the query is an artefact of how the optimizer decides to execute the query, but that is not guaranteed.
April 27, 2015 at 5:19 am
Hugo Kornelis (4/27/2015)
Eirikur Eiriksson (4/27/2015)
Another thing to keep in mind is the somewhat strange OVER defaults, recommend always filling those in rather than relying on the supplied defaults. Koen has a good blog post on this[/url]. Here is a quick sample:I agree that the default RANGE specification instead of ROWS specification is bad, but the demo code and results you posted appear not related to that. It appears that you expect a different behaviour by adding or removing a TOP clause. Can you please expand on that? (For the record, the results were exactly as I would expect them to be).
The code is an simple illustration that the defaults for the empty OVER are different from other defaults implied when different options are used. This affects not only for RANGE/ROWS but also for the frame and the scope.
😎
Also note that your query with TOP has no ORDER BY specification for the outer query. That means that SQL Server could have returned any subset of ten rows from the full results, in any desired order. The fact that the rows returned happen to match the ORDER BY specification in the query is an artefact of how the optimizer decides to execute the query, but that is not guaranteed.
Even with the ORDER BY in the query, the results are the same.
April 27, 2015 at 5:39 am
palotaiarpad (4/26/2015)
If we assume, that we have the latest SQL Server Version (2014), then an over by is required according to BOL (I can't try it yet)
Others have shown BOL to be incorrect, so I'll just address the version question. If you remove the FORMAT function, you should be able to run the queries in SQL 2008.
Nice question. Thanks.
Viewing 15 posts - 1 through 15 (of 26 total)
You must be logged in to reply to this topic. Login to reply