December 30, 2020 at 10:03 pm
When creating a simple query to compute a running sum/total for each row using a window function [SUM() OVER(...)], I noticed that I forgot to include a clause to limit the frame from the first row to the current row.
Nonetheless, the query returned the correctly calculated running total for each row, and I need to figure out:
Was this just a fluke? Or is this always true? Is this behavior/outcome dependable? (I'll revise production code regardless, unless it's emphatically clear this is intended behavior; I would like to know, though.)
Here's a complete script to run. It returns 2 computed columns based on a windowed SUM() -- 1 column with and 1 without the "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW" clause.
DROP TABLE IF EXISTS dbo.running_total;
CREATE TABLE running_total
(
MeasurementDateTime DATETIME,
MeasurementPoint VARCHAR(12),
MeasuredValue DECIMAL(3,1)
);
INSERT INTO running_total
(MeasurementDateTime, MeasurementPoint, MeasuredValue)
VALUES
('2020-12-01', 'A1', 2.0),
('2020-12-02', 'A1', 3.0),
('2020-12-03', 'A1', 4.3),
('2020-11-01', 'A1', 2.0),
('2020-11-02', 'A1', 3.0),
('2020-11-03', 'A1', 4.3),
('2020-12-01', 'B1', 12.0),
('2020-12-02', 'B1', 13.0),
('2020-12-03', 'B1', 14.3)
;
SELECT
MeasurementPoint, MeasurementDateTime, MeasuredValue,
--Running sum, WITHOUT using the ROWS BETWEEN UNBOUNDED PRECEDING clause
SUM(MeasuredValue) OVER(PARTITION BY MeasurementPoint ORDER BY MeasurementDateTime) As RunSum_Without_Unbounded,
--... and running sum, WITH that clause
SUM(MeasuredValue) OVER(PARTITION BY MeasurementPoint ORDER BY MeasurementDateTime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) As RunSum_WITH_Unbounded
FROM
running_total
ORDER BY
MeasurementPoint, MeasurementDateTime;
When I run this (SQL Server 2019 Developer Edition), I get back identical results in the two computed columns.
Thanks!
Rich
December 30, 2020 at 10:21 pm
From BOL:
ROWS/RANGE that limits the rows within the partition by specifying start and end points within the partition. It requires ORDER BY argument and the default value is from the start of partition to the current element if the ORDER BY argument is specified.
You can review the documentation here: https://docs.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql?view=sql-server-ver15
This is the default behavior since you specified an ORDER BY.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
December 30, 2020 at 10:24 pm
Jeffrey thank you and a sheepish apology for my not checking BOL first!
🙂
That wording is unusually clear and definitive, isn't it?
Happy New Year!
Rich
January 4, 2021 at 10:01 am
This was removed by the editor as SPAM
January 4, 2021 at 4:11 pm
There may be a difference. If you do not specify ROWS / RANGE BETWEEN ... the default is RANGE BETWEEN. With your definition, it is possible to have duplicates for the columns MeasurementDateTime, MeasurementPoint and it shows the difference between ROWS and RANGE. Rows are the series currently being processed. RANGE is all rows with the same value in the columns MeasurementDateTime, MeasurementPoint, as the row currently being processed. The example shows the difference. If you only want one row in the result for each value of MeasurementDateTime, MeasurementPoint, the last statement must be used. RANGE is an unfortunate default - but it is chosen by MS!
DROP TABLE IF EXISTS dbo.running_total;
CREATE TABLE dbo.running_total
(
MeasurementDateTimeDATETIME,
MeasurementPointVARCHAR(12),
MeasuredValueDECIMAL(3,1)
);
INSERT INTO dbo.running_total
(MeasurementDateTime, MeasurementPoint, MeasuredValue) VALUES
('2020-12-01', 'A1', 2.0),
('2020-12-02', 'A1', 3.0),
('2020-12-03', 'A1', 4.3),
('2020-11-01', 'A1', 2.0),
('2020-11-02', 'A1', 3.0),
('2020-11-03', 'A1', 4.3),
('2020-12-01', 'B1', 12.0),
('2020-12-02', 'B1', 13.0),
('2020-12-03', 'B1', 14.3);
SELECT
MeasurementPoint, MeasurementDateTime, MeasuredValue,
--Running sum, WITHOUT using the ROWS BETWEEN UNBOUNDED PRECEDING clause
SUM(MeasuredValue) OVER(PARTITION BY MeasurementPoint ORDER BY MeasurementDateTime) AS RunSum_Without_Unbounded,
--... and running sum, WITH ROWS BETWEEN
SUM(MeasuredValue) OVER(PARTITION BY MeasurementPoint ORDER BY MeasurementDateTime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunSum_WITH_RowsUnbounded,
--... and running sum, WITH RANGE BETWEEN
SUM(MeasuredValue) OVER(PARTITION BY MeasurementPoint ORDER BY MeasurementDateTime RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunSum_WITH_RangeUnbounded
FROM
dbo.running_total
ORDER BY
MeasurementPoint, MeasurementDateTime;
GO
-- allowed in relation to definitions
INSERT INTO dbo.running_total
(MeasurementDateTime, MeasurementPoint, MeasuredValue) VALUES
('2020-11-01', 'A1', 9.0),
('2020-11-01', 'A1', 19.0),
('2020-11-01', 'A1', 29.0);
GO
SELECT
MeasurementPoint, MeasurementDateTime, MeasuredValue,
--Running sum, WITHOUT using the ROWS BETWEEN UNBOUNDED PRECEDING clause
SUM(MeasuredValue) OVER(PARTITION BY MeasurementPoint ORDER BY MeasurementDateTime) AS RunSum_Without_Unbounded,
--... and running sum, WITH ROWS BETWEEN
SUM(MeasuredValue) OVER(PARTITION BY MeasurementPoint ORDER BY MeasurementDateTime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunSum_WITH_RowsUnbounded,
--... and running sum, WITH RANGE BETWEEN
SUM(MeasuredValue) OVER(PARTITION BY MeasurementPoint ORDER BY MeasurementDateTime RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunSum_WITH_RangeUnbounded
FROM
dbo.running_total
ORDER BY
MeasurementPoint, MeasurementDateTime;
GO
SELECT
MeasurementPoint, MeasurementDateTime, MeasuredValue,
--Running sum, WITHOUT using the ROWS BETWEEN UNBOUNDED PRECEDING clause
SUM(MeasuredValue) OVER(PARTITION BY MeasurementPoint ORDER BY MeasurementDateTime) AS RunSum_Without_Unbounded,
--... and running sum, WITH ROWS BETWEEN
SUM(MeasuredValue) OVER(PARTITION BY MeasurementPoint ORDER BY MeasurementDateTime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunSum_WITH_RowsUnbounded,
--... and running sum, WITH RANGE BETWEEN
SUM(MeasuredValue) OVER(PARTITION BY MeasurementPoint ORDER BY MeasurementDateTime RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunSum_WITH_RangeUnbounded
FROM
(SELECT MeasurementDateTime, MeasurementPoint, SUM(MeasuredValue) AS MeasuredValue
FROM dbo.running_total
GROUP BY MeasurementDateTime, MeasurementPoint) AS running_total
ORDER BY
MeasurementPoint, MeasurementDateTime;
January 4, 2021 at 7:10 pm
That's a great explanation and amplification of the query, thank you. I made a "mistake" in my simplified example, as my original project table does in fact have a PK on MeasurementDateTime and MeasurementPoint -- no multiple rows per point/datetime are allowed. I didn't include a PK here.
But I'm glad I didn't include the PK in my code posted above, b/c your answer underscores a difference between ROWS and RANGE that I hadn't fully considered or appreciated. It won't matter for my project (b/c of the PK), but your examples might prevent me from making an error in the future that I might otherwise have missed.
Thank you,
Rich
February 24, 2021 at 4:48 am
This was removed by the editor as SPAM
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply