Often I see running totals that are written in SQL using a variety of techniques. Many pieces of code were written in pre-2012 techniques, prior to window functions being introduced.
After SQL Server 2012, we had better ways to write a total. In this case, let’s see how much better. This is based on an article showing how you might convert code from the first query to the second. This is a performance analysis of the two techniques are different scales..
Pre SQL Server 2012
The old way:
SELECT Acc.ID,CONVERT(varchar(50),TransactionDate,101) AS TransactionDate , Balance, isnull(RunningTotal,'') AS RunningTotal FROM Accounts Acc LEFT OUTER JOIN (SELECT ID,sum(Balance) AS RunningTotal FROM (SELECT A.ID AS ID,B.ID AS BID, B.Balance FROM Accounts A cross JOIN Accounts B WHERE B.ID BETWEEN A.ID-4 AND A.ID AND A.ID>4 )T GROUP BY ID ) Bal ON Acc.ID=Bal.ID
What were the statistics on this? After running a few times, with STATISTICS IO ON, I get this:
Table ‘Accounts’. Scan count 37, logical reads 37, physical reads 0
Not bad. I’ve truncated out the other values as they were all 0.
Window Functions
Here is the same query written with a Window function.
SELECT
id
, TransactionDate
, Balance
, CASE WHEN LAG(TransactionDate, 4, null) OVER (ORDER BY TransactionDate) IS NOT NULL
THEN SUM (Balance) OVER (ORDER BY TransactionDate ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)
ELSE 0
END AS runningotal
FROM dbo.accounts
The statistics?
Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0
Table ‘Accounts’. Scan count 1, logical reads 1, physical reads 0
The window function definitely does less work. A lot less. But how does this scale?
Performance Testing
There are numerous ways to create some test data for this. Since I have Redgate SQL Data Generator, I decided to use that. It’s simple and easy, and I added 100,000 rows first.
My results of the first query:
Lots of reads and scans. Let’s compare this to the window function.
Hmmm, both took essentially zero time less than a second. That might lead some developers to think either method is quick enough.
Let’s add 1mm more rows.
Now compare. The first takes about 15s with these results.
The window function? 3 sec, with these stats.
The comparison looks like this. First, let’s look at SSMS time
Old, Cross Join | Window Function | |
20 rows | 0 sec | 0 sec |
100,020 rows | 0 sec | 0 sec |
1,100,020 rows | 15 sec | 3 sec |
If we look at CPU time, then we see this:
Old, Cross Join | Window Function | |
20 rows | 0 ms | 0 ms |
100,020 rows | 748 ms | 250 ms |
1,100,020 rows | 5845 ms | 2919 ms |
If we look at the logical reads in total, we see this
Old, Cross Join | Window Function | |
20 rows | 37 | 1 |
100,020 rows | 403,998 | 359 |
1,100,020 rows | 6,450,895 | 3945 |
Clearly the window function is better and the better grows as the size of data grows.
Summary
This post looks at two queries and compares the performance across a few queries. These aren’t the only ones, and you might choose other types of queries, but these are both examples of how you might approach a problem using old tech and new tech.
The window function is not slightly more efficient, but extremely efficient compared to the older style method of using a cross join. As the data scales up, the difference is pronounced. While 1mm rows might not be a great test here, and you may prefer to test at 10mm or 100mm rows to get an idea of load, the fact is the Window function is much quicker and uses less resources.
If you are using older style code to perform T-SQL calculations, make some time to refactor that code (and test it) to use modern window functions.
Setup Code
Here’s the initial setup code:
CREATE TABLE Accounts ( ID int IDENTITY(1,1), TransactionDate datetime, Balance float ) go insert into Accounts(TransactionDate,Balance) values ('1/1/2000',100) insert into Accounts(TransactionDate,Balance) values ('1/2/2000',101) insert into Accounts(TransactionDate,Balance) values ('1/3/2000',102) insert into Accounts(TransactionDate,Balance) values ('1/4/2000',103) insert into Accounts(TransactionDate,Balance) values ('1/5/2000',104) insert into Accounts(TransactionDate,Balance) values ('1/6/2000',105) insert into Accounts(TransactionDate,Balance) values ('1/7/2000',106) insert into Accounts(TransactionDate,Balance) values ('1/8/2000',107) insert into Accounts(TransactionDate,Balance) values ('1/9/2000',108) insert into Accounts(TransactionDate,Balance) values ('1/10/2000',109) insert into Accounts(TransactionDate,Balance) values ('1/11/2000',200) insert into Accounts(TransactionDate,Balance) values ('1/12/2000',201) insert into Accounts(TransactionDate,Balance) values ('1/13/2000',202) insert into Accounts(TransactionDate,Balance) values ('1/14/2000',203) insert into Accounts(TransactionDate,Balance) values ('1/15/2000',204) insert into Accounts(TransactionDate,Balance) values ('1/16/2000',205) insert into Accounts(TransactionDate,Balance) values ('1/17/2000',206) insert into Accounts(TransactionDate,Balance) values ('1/18/2000',207) insert into Accounts(TransactionDate,Balance) values ('1/19/2000',208) insert into Accounts(TransactionDate,Balance) values ('1/20/2000',209) go