I ran across an article that showed calculating a running total with a twist. In this case, the total is the sum of the previous 5 rows. I decided this could be done more efficiently by converting the code to a more modern, window function approach. This article covers the code change.
Note: this assumes you are on a version of SQL Server that is 2012 or later. For this article, I am writing and testing this code on SQL Server 2019.
The Scenario
As the article shows, we want to totals of the 5 previous rows, based on the TransactionDate.
As you can see in the image above, if there aren't 5 rows, we return 0. If there are at least 4 preceding rows, then we sum up the values in Balance for those rows, add the current row, and then output the sum. For rows 1-5 (SSMS row in the result set), we have 100, 101, 102, 103, 104. If we add those, we have 510.
The next set of rows is: 101, 102, 103, 104, 105. If we add these, that's 515. We can continue on, and you can see that we are making a calculation of rising windows. That alone ought to clue you in to the way to handle this problem.
The Original Solution
The original solution used a cross join with a left join to produce the 0s for 4 rows and then the running totals. This is the code from the article.
SELECT Acc.ID,CONVERT(varchar(50),TransactionDate,101) AS TransactionDate , Balance, isnull(RunningTotal,'') AS RunningTotal FROM Accounts Acc LEFT OUTER JOIN (SELECT ID,sum(Balance) AS RunningTotal FROM (SELECT A.ID AS ID,B.ID AS BID, B.Balance FROM Accounts A cross JOIN Accounts B WHERE B.ID BETWEEN A.ID-4 AND A.ID AND A.ID>4 )T GROUP BY ID ) Bal ON Acc.ID=Bal.ID
This worked, but it can work better.
Creating a Test
I like to start by creating a test to get the results into a repeatable method for testing an improvement. I like to call stored procedures, as my refactoring of the proc is easy and I don't change my test. So I'll start by creating this test in tsqlt. If you've never used tsqlt, this article might help.
First, let's get our query into a procedure.
CREATE PROCEDURE dbo.GetLastFiveAccountSum AS SELECT Acc.ID,CONVERT(varchar(50),TransactionDate,101) AS TransactionDate , Balance, isnull(RunningTotal,'') AS RunningTotal FROM Accounts Acc LEFT OUTER JOIN (SELECT ID,sum(Balance) AS RunningTotal FROM (SELECT A.ID AS ID,B.ID AS BID, B.Balance FROM Accounts A cross JOIN Accounts B WHERE B.ID BETWEEN A.ID-4 AND A.ID AND A.ID>4 )T GROUP BY ID ) Bal ON Acc.ID=Bal.ID GO
I'll create a test class for the tests and then add my test. In this case, I'm just calling my proc and looking for the results from my query.
EXEC tsqlt.NewTestClass @ClassName = N'windowtests' -- nvarchar(max) GO CREATE OR ALTER PROCEDURE [windowtests].[test last 5 row running total for accounts] AS BEGIN -------------------------------------------- ----- Assemble -------------------------------------------- EXEC tsqlt.FakeTable @TableName = N'Accounts' -- nvarchar(max) , @SchemaName = N'dbo' -- nvarchar(max) insert into Accounts(ID, TransactionDate,Balance) VALUES (1, '1/1/2000',100), (2, '1/2/2000',101), (3, '1/3/2000',102), (4, '1/4/2000',103), (5, '1/5/2000',104), (6, '1/6/2000',105), (7, '1/7/2000',106), (8, '1/8/2000',107), (9, '1/9/2000',108), (10, '1/10/2000',109), (11, '1/11/2000',200), (12, '1/12/2000',201), (13, '1/13/2000',202), (14, '1/14/2000',203), (15, '1/15/2000',204), (16, '1/16/2000',205), (17, '1/17/2000',206), (18, '1/18/2000',207), (19, '1/19/2000',208), (20, '1/20/2000',209) CREATE TABLE #expected (ID INT, TransactionDateDATE, Balance INT, Runningtotal INT) INSERT #expected (ID, TransactionDate, Balance, Runningtotal) VALUES (1, '01/01/2000', 100, 0), (2,'2000-01-02 00:00:00.000',101,0), (3,'2000-01-03 00:00:00.000',102,0) , (4,'2000-01-04 00:00:00.000',103,0) , (5,'2000-01-05 00:00:00.000',104,510) , (6,'2000-01-06 00:00:00.000',105,515) , (7,'2000-01-07 00:00:00.000',106,520) , (8,'2000-01-08 00:00:00.000',107,525) , (9,'2000-01-09 00:00:00.000',108,530) , (10,'2000-01-10 00:00:00.000',109,535) , (11,'2000-01-11 00:00:00.000',200,630) , (12,'2000-01-12 00:00:00.000',201,725) , (13,'2000-01-13 00:00:00.000',202,820) , (14,'2000-01-14 00:00:00.000',203,915) , (15,'2000-01-15 00:00:00.000',204,1010) , (16,'2000-01-16 00:00:00.000',205,1015) , (17,'2000-01-17 00:00:00.000',206,1020) , (18,'2000-01-18 00:00:00.000',207,1025) , (19,'2000-01-19 00:00:00.000',208,1030) , (20,'2000-01-20 00:00:00.000',209,1035) SELECT * INTO #actual FROM #expected WHERE 1 = 0 -------------------------------------------- ----- Act -------------------------------------------- INSERT #actual EXEC dbo.GetLastFiveAccountSum -------------------------------------------- ----- Assert -------------------------------------------- EXEC tsqlt.AssertEqualsTable @Expected = N'#expected' -- nvarchar(max) , @Actual = N'#actual' -- nvarchar(max) , @Message = N'It works!' -- nvarchar(max) , @FailMsg = N'You Failed. Try again' -- nvarchar(max) END GO EXEC tSQLt.run '[windowtests].[test last 5 row running total for accounts]'
If I run the test (bottom of the code above), it works.
Now that we have a baseline of our existing code, let's refactor things.
Using Window Functions
We have a few articles on window functions here at SQL Server Central, though we can always use more, especially for specific situations.
In this case, we want a running total, which is easy with window functions. Let's try to structure a query using SUM and an OVER clause that orders data by the transaction date. With that in mind, I'd have this code:
CREATE OR ALTER PROCEDURE dbo.GetLastFiveAccountSum AS SELECT id , TransactionDate , Balance , SUM (Balance) OVER (ORDER BY TransactionDate) AS runningotal FROM dbo.accounts; GO EXEC tSQLt.run '[windowtests].[test last 5 row running total for accounts]'
Note that I've changed the code in my proc, and then I call my test. When I do this, things fail. I get these results (truncated in the image). Note that the first column shows me where the result comes from. There is only one =, which is the one row that matches. This is the 5th row. That's the only correct one.
What I've done is build a constant running total of all rows.
Let's try again. We will now look at limiting our window, and we'll do that with the ROWS clause. In this case, we'll limit the results to the current row and the preceding 4 rows. That gives us this code:
CREATE OR ALTER PROCEDURE dbo.GetLastFiveAccountSum AS SELECT id , TransactionDate , Balance , SUM (Balance) OVER (ORDER BY TransactionDate ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) AS runningotal FROM dbo.accounts; GO EXEC tSQLt.run '[windowtests].[test last 5 row running total for accounts]'
Now the results look better, but still not great.
Now I see that there are a number of correct rows. In fact, all are correct, except for the first 4 rows. I want zeros here, when there aren't 4 preceding rows.
Fortunately, I have another window function that can help: LAG(). The LAG() function looks at the rows that come before the current row. I can specify how far I want to look back. In this case, I'll always look back 4 rows and if there isn't a 4th row to look at, we return a NULL. Then I'll use a CASE to check for a NULL, and if so, return a zero. When I change my code to this, it works.
SELECT id , TransactionDate , Balance , CASE WHEN LAG(TransactionDate, 4, null) OVER (ORDER BY TransactionDate) IS NOT NULL THEN SUM (Balance) OVER (ORDER BY TransactionDate ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) ELSE 0 END AS runningotal FROM dbo.accounts
Here are the test results.
Hurrah. Now, I need to save this code in a VCS and have someone review it, approve it, and ship it.
Performance
You might be wondering about performance. I won't do an extensive performance test, but I'll add one on my blog. However, a quick test shows this:
CPU time | Logical Reads | |
Old Code | 0ms | 37 |
New Code | 0ms | 1 |
While the data set is too low to show CPU time, there is a discrepancy at larger data sizes. The logical reads are much different and this difference will grow as the data size grows. You should always test your code at larger sizes, and I have a blog that shows a better comparison.
Summary
This article looked at an old-style SQL cross join to calculate a limited running total and then refactored that code to use a window function. A tsqlt test was written and used to verify that the code still performed as expected.
Limited performance evaluation was given, but the window function is clearly more performant than the old SQL. If you decide to use this type of code, please load test at scale to verify your code works appropriately. This article can help you understand how to test and evaluate your code.