One of the things that I’m the most excited about in “Denali” CTP3 is the enhancements to the OVER clause. One of the things that I’m particular excited about is the addition of the ROWS/RANGE clause support. And one of the reasons that I’m excited about this is that this provides a supported, documented (in BOL) way to perform running totals.
My current favorite method of performing running totals is using the so-called “Quirky Update” / “Pseudo-Cursor” method. However, there is plenty of controversy surrounding the use of this method, primarily centered around the need for following a list of rules – rules that are not documented in BOL. (Please see this article for how to perform running totals using the Quirky Update method, and for the rules necessary to achieve the proper results.)
Now that running totals are supported in “Denali” CTP3, I decided to test out the various methods of doing so. Using the environment setup and test code from the above article, I added in the various methods of doing running totals available in “Denali” CTP3. (Code reprinted with permission of Jeff Moden.)
/*
/*************************************************************************************
Create the test table with a non-clustered Primary Key and a separate clustered index
This code has been tested in SQL Server 2000 and 2005.
*************************************************************************************/
--===== Do this testing in a nice, "safe" place that everyone has
USE TempDB
GO
--===== If the test table already exists, drop it in case we need to rerun.
-- The 3 part naming is overkill, but prevents accidents on real tables.
IF OBJECT_ID('TempDB.dbo.TransactionDetail') IS NOT NULL
DROP TABLE TempDB.dbo.TransactionDetail
GO
--===== Create the test table (TransactionDetail) with a NON clustered PK
CREATE TABLE dbo.TransactionDetail
(
TransactionDetailID INT IDENTITY(1,1), --Temporal "tie-breaker"
Date DATETIME,
AccountID INT,
Amount MONEY,
AccountRunningTotal MONEY, --Running total across each account
AccountRunningCount INT, --Like "Rank" across each account
NCID INT, --For "proof" later in the article
CONSTRAINT PK_TransactionDetail_TransactionDetailID
PRIMARY KEY NONCLUSTERED (TransactionDetailID)
WITH FILLFACTOR = 100
)
GO
--===== Add a clustered index that will easily cause page splits because
-- of the randomized data being inserted. This index also represents
-- the expected sort order of most of the code examples.
CREATE CLUSTERED INDEX IXC_Transaction_AccountID_Date_TransactionDetailID
ON dbo.TransactionDetail (AccountID, Date, TransactionDetailID)
--===== Add a non-clustered index on the NCID column sorted in
-- descending order. This is for some "proofs" later on
-- in the article.
CREATE NONCLUSTERED INDEX IX_Transaction_NCID
ON dbo.TransactionDetail (NCID DESC)
GO
/*************************************************************************************
Populate the table using a rather slow method but one that's sure to cause lots of
Page splits and that will fragment the table with over 99% fragmentation.
*************************************************************************************/
--===== Preset the environment for appearance and speed
SET NOCOUNT ON
--===== Populate the table in "segments" to force page splits.
-- Normally this would NOT have a While loop in it.
-- Because the While loop is there and page splits are happening,
-- this takes a whopping 00:02:50 to create on my box.
WHILE (ISNULL(IDENT_CURRENT('TransactionDetail'),0)) < 1000000
BEGIN
INSERT INTO dbo.TransactionDetail
(Date, AccountID, Amount)
SELECT TOP 10000
--10 years worth of dates with times from 1/1/2000 to 12/31/2009
CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME) AS Date,
--100 different account numbers
ABS(CHECKSUM(NEWID()))%100+1,
--Dollar amounts from -99.99 to + 99.99
CAST(CHECKSUM(NEWID())%10000 /100.0 AS MONEY)
FROM Master.dbo.SysColumns sc1
CROSS JOIN Master.dbo.SysColumns sc2
END
--===== Update the NCID column to be the reverse of the TransactionDetailID column
UPDATE dbo.TransactionDetail
SET NCID = 1000000 - TransactionDetailID + 1
GO
USE TEMPDB
GO
CREATE PROCEDURE dbo.Verify AS
/*************************************************************************************
Code to verify that the account running total calculation worked correctly.
Please read the comments to see how it works.
*************************************************************************************/
--===== Conditionally drop the verification table to make
-- it easy to rerun the verification code
IF OBJECT_ID('TempDB..#Verification') IS NOT NULL
DROP TABLE dbo.#Verification
--===== Define a variable to remember the number of rows
-- copied to the verification table
DECLARE @MyCount INT
--===== Copy the data from the test table into the
-- verification table in the correct order.
-- Remember the correct order with an IDENTITY.
SELECT IDENTITY(INT,1,1) AS RowNum,
AccountID,
Amount,
AccountRunningTotal
INTO #Verification
FROM dbo.TransactionDetail
ORDER BY AccountID, Date, TransactionDetailID
--===== Remember the number of rows we just copied
SELECT @MyCount = @@ROWCOUNT
--===== Check the running total calculations
SELECT CASE
WHEN COUNT(hi.RowNum) + 1 = @MyCount
THEN 'Account Running Total Calculations are correct'
ELSE 'There are some errors in the Account Running Totals'
END
FROM #Verification lo
INNER JOIN
#Verification hi
ON lo.RowNum + 1 = hi.RowNum
WHERE (-- Compare lines with the same AccountID
hi.AccountID = lo.AccountID
AND hi.AccountRunningTotal = lo.AccountRunningTotal + hi.Amount)
OR
(-- First line of account has running total same as amount
hi.AccountID <> lo.AccountID
AND hi.AccountRunningTotal = hi.Amount)
GO
/**************************************************************************************
This stored procedure will clear the calculated columns in the test table without
disturbing the randomized data in the table so that we can repeat tests and use
different methods without changing the test data.
**************************************************************************************/
USE TEMPDB
GO
CREATE PROCEDURE dbo.ResetTestTable AS
UPDATE dbo.TransactionDetail
SET AccountRunningTotal = NULL,
AccountRunningCount = NULL
GO
*/
/*************************************************************************************
Pseduo-cursor update using the "Quirky Update" to calculate both Running Totals and
a Running Count that start over for each AccountID.
Takes 24 seconds with the INDEX(0) hint and 6 seconds without it on my box.
*************************************************************************************/
--===== Supress the auto-display of rowcounts for speed an appearance
SET NOCOUNT ON
-- Quirky Update running totals
--===== Declare the working variables
DECLARE @PrevAccountID INT
DECLARE @AccountRunningTotal MONEY
DECLARE @AccountRunningCount INT
--===== Update the running total and running count for this row using the "Quirky
-- Update" and a "Pseudo-cursor". The order of the UPDATE is controlled by the
-- order of the clustered index.
UPDATE dbo.TransactionDetail
SET @AccountRunningTotal = AccountRunningTotal = CASE
WHEN AccountID = @PrevAccountID
THEN @AccountRunningTotal + Amount
ELSE Amount
END,
@AccountRunningCount = AccountRunningCount = CASE
WHEN AccountID = @PrevAccountID
THEN @AccountRunningCount + 1
ELSE 1
END,
@PrevAccountID = AccountID
FROM dbo.TransactionDetail WITH (TABLOCKX)
OPTION (MAXDOP 1)
GO
EXEC dbo.Verify
GO
EXEC dbo.ResetTestTable
GO
-- DENALI CTP3 Running Totals using DEFAULT RANGE option
;
WITH cte AS
(
SELECT AccountRunningTotal,
AccountRunningCount,
ART = SUM(Amount) OVER (PARTITION BY AccountID
ORDER BY Date, TransactionDetailID),
ARC = SUM(1) OVER (PARTITION BY AccountID
ORDER BY Date, TransactionDetailID)
FROM dbo.TransactionDetail
)
UPDATE cte
SET AccountRunningTotal = ART,
AccountRunningCount = ARC
GO
EXEC dbo.Verify
GO
EXEC dbo.ResetTestTable
GO
-- DENALI CTP3 Running Totals subquery using DEFAULT RANGE option
-- this is the same as the cte method
;
UPDATE target
SET AccountRunningTotal = ART,
AccountRunningCount = ARC
FROM (SELECT AccountRunningTotal,
AccountRunningCount,
ART = SUM(Amount) OVER (PARTITION BY AccountID
ORDER BY Date, TransactionDetailID),
ARC = SUM(1) OVER (PARTITION BY AccountID
ORDER BY Date, TransactionDetailID)
FROM dbo.TransactionDetail
) target
GO
EXEC dbo.Verify
GO
EXEC dbo.ResetTestTable
GO
-- DENALI CTP3 Running Totals with self-join using DEFAULT RANGE option.
;
WITH cte AS
(
SELECT TransactionDetailID,
[Date],
AccountID,
Amount,
AccountRunningTotal = SUM(Amount) OVER (PARTITION BY AccountID
ORDER BY Date, TransactionDetailID),
AccountRunningCount = SUM(1) OVER (PARTITION BY AccountID
ORDER BY Date, TransactionDetailID),
NCID
FROM dbo.TransactionDetail
)
UPDATE td
SET AccountRunningTotal = cte.AccountRunningTotal,
AccountRunningCount = cte.AccountRunningCount
FROM dbo.TransactionDetail td
JOIN cte
ON cte.TransactionDetailID = td.TransactionDetailID
GO
EXEC dbo.Verify
GO
EXEC dbo.ResetTestTable
GO
-- DENALI CTP3 Running Totals using ROWS UNBOUNDED PRECEDING
;
WITH cte AS
(
SELECT AccountRunningTotal,
AccountRunningCount,
ART = SUM(Amount) OVER (PARTITION BY AccountID
ORDER BY Date, TransactionDetailID
ROWS UNBOUNDED PRECEDING),
ARC = SUM(1) OVER (PARTITION BY AccountID
ORDER BY Date, TransactionDetailID
ROWS UNBOUNDED PRECEDING)
FROM dbo.TransactionDetail
)
UPDATE cte
SET AccountRunningTotal = ART,
AccountRunningCount = ARC
GO
EXEC dbo.Verify
GO
EXEC dbo.ResetTestTable
GO
-- DENALI CTP3 Running Totals using RANGE UNBOUNDED PRECEDING
;
WITH cte AS
(
SELECT AccountRunningTotal,
AccountRunningCount,
ART = SUM(Amount) OVER (PARTITION BY AccountID
ORDER BY Date, TransactionDetailID
RANGE UNBOUNDED PRECEDING),
ARC = SUM(1) OVER (PARTITION BY AccountID
ORDER BY Date, TransactionDetailID
RANGE UNBOUNDED PRECEDING)
FROM dbo.TransactionDetail
)
UPDATE cte
SET AccountRunningTotal = ART,
AccountRunningCount = ARC
GO
EXEC dbo.Verify
GO
EXEC dbo.ResetTestTable
GO
-- DENALI CTP3 Running Totals using ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
;
WITH cte AS
(
SELECT AccountRunningTotal,
AccountRunningCount,
ART = SUM(Amount) OVER (PARTITION BY AccountID
ORDER BY Date, TransactionDetailID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
ARC = SUM(1) OVER (PARTITION BY AccountID
ORDER BY Date, TransactionDetailID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM dbo.TransactionDetail
)
UPDATE cte
SET AccountRunningTotal = ART,
AccountRunningCount = ARC
GO
EXEC dbo.Verify
GO
EXEC dbo.ResetTestTable
GO
-- DENALI CTP3 Running Totals using RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
;
WITH cte AS
(
SELECT AccountRunningTotal,
AccountRunningCount,
ART = SUM(Amount) OVER (PARTITION BY AccountID
ORDER BY Date, TransactionDetailID
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
ARC = SUM(1) OVER (PARTITION BY AccountID
ORDER BY Date, TransactionDetailID
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM dbo.TransactionDetail
)
UPDATE cte
SET AccountRunningTotal = ART,
AccountRunningCount = ARC
GO
EXEC dbo.Verify
GO
EXEC dbo.ResetTestTable
GO
-- DENALI CTP3 Running Totals using ROWS BETWEEN 2147483647 PRECEDING AND CURRENT ROW
;
WITH cte AS
(
SELECT AccountRunningTotal,
AccountRunningCount,
ART = SUM(Amount) OVER (PARTITION BY AccountID
ORDER BY Date, TransactionDetailID
ROWS BETWEEN 2147483647 PRECEDING AND CURRENT ROW),
ARC = SUM(1) OVER (PARTITION BY AccountID
ORDER BY Date, TransactionDetailID
ROWS BETWEEN 2147483647 PRECEDING AND CURRENT ROW)
FROM dbo.TransactionDetail
)
UPDATE cte
SET AccountRunningTotal = ART,
AccountRunningCount = ARC
GO
EXEC dbo.Verify
GO
And the results, as shown from Profiler, are:
(Click image to view the full Profiler screen.)
When performing the running totals with the new OVER clause enhancements in “Denali”, it has 3 million to 18 million reads, vs. 8 thousand for the Quirky Update. This difference is reflected in the time each takes: the OVER clause enhancements are taking 22 – 75 seconds to perform the running totals, while the Quirky Update performs it in 3.5 seconds.
Let’s hope that all of this is due to it being in a beta state, and that it will improve when it reaches RTM. When it does reach the RTM status, I’ll test again to decide which method I’ll use. My impression right now is that it looks like I’ll be keeping the Quirky Update in my toolbag for a long time when performing running totals.