The Problem: A legacy table contains amounts and a char column indicating whether the amount is a credit or a debit. We want an aggregate (over a given range) of the amounts. We need to add the credits and subtract the debits.
The Solution(s): Two possible approaches to the problem suggest themselves. One is to sum all of the credits and all of the debits and then subtract the total debit from the total credit. The second approach is to include a case expression inside the aggregate function to add or subtract the given row’s amount based on the credit/debit column.
The Code: Implement a cut down version of the table with some contrived sample data:
create table tbl_ledger
(
    id int identity(1,1),
    credit_debit char(1),
    amount money,
    [description] varchar(128)
)
insert tbl_ledger(credit_debit,amount,[description])
values ('C',10,'Payment'),
       ('D',10,'Refund'),
       ('C',10,'Payment'),
       ('C',10,'Payment'),
       ('D',10,'Refund'),
       ('C',10,'Payment'),
       ('C',10,'Payment'),
       ('D',10,'Refund'),
       ('C',10,'Payment'),
       ('C',10,'Payment')
This table has 10 rows: 7 are $10 credits and 3 are $10 debits, so we would expect the sum to return $40 ($70 credit - $30 debit).
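A quick way to confirm that breakdown before comparing the two options is to group by the credit/debit flag. This is only a sanity check against the sample data above; the column aliases are my own.

```sql
-- Sanity check on the sample data: one row per credit/debit flag.
-- With the 10 sample rows this should show 7 credits totalling 70
-- and 3 debits totalling 30.
select credit_debit,
       count(*)    as [row count],
       sum(amount) as [total]
from tbl_ledger
group by credit_debit
```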
Implement option 1, subtract the aggregates. This could be done a number of ways, including using variables to hold interim results or common table expressions. I’ve chosen a sub query as I think it reads well; the query plans for sub queries and CTEs are similar, while using variables muddies the waters a bit:
select sum(amount) -
       (
           select sum(amount) as [debit]
           from tbl_ledger
           where credit_debit = 'D'
       ) as [credit]
from tbl_ledger
where credit_debit = 'C'
As hoped this returns 40.
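For comparison, the common table expression variant mentioned above might look something like the sketch below. The CTE names and the isnull wrappers are my additions, not part of the original test: without them the result is NULL whenever the range contains no credits or no debits, which is also true of the sub query version.

```sql
-- Option 1 written with CTEs instead of a sub query (a sketch).
-- Each CTE is a scalar aggregate, so it always returns exactly one row
-- and the cross join therefore yields a single row.
with credits as
(
    select sum(amount) as total
    from tbl_ledger
    where credit_debit = 'C'
),
debits as
(
    select sum(amount) as total
    from tbl_ledger
    where credit_debit = 'D'
)
-- isnull is an addition: it treats "no rows of that type" as zero
select isnull(c.total, 0) - isnull(d.total, 0) as [total cash]
from credits as c
cross join debits as d;
```

Against the 10 sample rows this returns 40, the same as the sub query.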
Implement option 2 using a case expression inside the sum function:
select sum(case
               when credit_debit = 'C' then amount
               when credit_debit = 'D' then -1 * amount
           end) as [total cash]
from tbl_ledger
This again returns a correct result.
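One side benefit of the conditional aggregate is that the same single pass can return the credit total, the debit total and the net figure together. The query below is a sketch rather than part of the original test; the column aliases and the `else 0` branches are my own additions. The net column deliberately has no else branch, so a row with an unexpected flag contributes NULL, which sum ignores.

```sql
-- Conditional aggregation: three totals from one scan of the table.
select sum(case when credit_debit = 'C' then amount else 0 end) as [total credit],
       sum(case when credit_debit = 'D' then amount else 0 end) as [total debit],
       sum(case credit_debit
               when 'C' then amount
               when 'D' then -amount
           end)                                                 as [total cash]
from tbl_ledger
```

With the 10 sample rows the three columns come back as 70, 30 and 40.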
The Analysis: To me option 2 is a more elegantly coded solution, but how does it stack up performance-wise? To get some measurable results, rerun the insert multiple times:
insert tbl_ledger(credit_debit,amount,[description])
values ('C',10,'Payment'),
       ('D',10,'Refund'),
       ('C',10,'Payment'),
       ('C',10,'Payment'),
       ('D',10,'Refund'),
       ('C',10,'Payment'),
       ('C',10,'Payment'),
       ('D',10,'Refund'),
       ('C',10,'Payment'),
       ('C',10,'Payment')
go 100000
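Note that `go 100000` is a feature of the client tools (SSMS and sqlcmd), not T-SQL itself, and it sends 100,000 separate batches, so it can take a while. A set-based alternative is sketched below. It assumes the table currently holds only the 10 sample rows with ids 1 to 10, and that cross joining sys.all_objects with itself yields at least 100,000 rows, which is normally the case but is an assumption on my part.

```sql
-- Set-based alternative to "go 100000" (a sketch, not the original method).
-- Copies the 10 sample rows 100,000 times in a single statement.
insert tbl_ledger (credit_debit, amount, [description])
select l.credit_debit, l.amount, l.[description]
from tbl_ledger as l
cross join
(
    -- row generator: any source with at least 100,000 rows will do
    select top (100000) 1 as n
    from sys.all_objects as a
    cross join sys.all_objects as b
) as numbers
where l.id <= 10   -- assumes ids 1-10 are the original sample rows
```

Either way the table should end up with 1,000,010 rows (the original 10 plus 1,000,000 copies), which `select count(*) from tbl_ledger` will confirm.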
And run the two queries side by side with statistics on:
set statistics io on
set statistics time on
select sum(case
               when credit_debit = 'C' then amount
               when credit_debit = 'D' then -1 * amount
           end) as [total cash]
from tbl_ledger
select sum(amount) -
       (
           select sum(amount) as [debit]
           from tbl_ledger
           where credit_debit = 'D'
       ) as [credit]
from tbl_ledger
where credit_debit = 'C'
set statistics io off
set statistics time off
I got the following:
(1 row(s) affected)
Table 'tbl_ledger'. Scan count 1, logical reads 4167, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 281 ms, elapsed time = 295 ms.
(1 row(s) affected)
Table 'tbl_ledger'. Scan count 10, logical reads 8334, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 483 ms, elapsed time = 130 ms.
With the execution plans:
The sub query outperforms the conditional aggregate on elapsed time (130 ms versus 295 ms) primarily because the query can be parallelised, although it uses more total CPU to do so (483 ms versus 281 ms). The conditional aggregate does a single table scan versus two table scans for the sub query (4167 logical reads versus 8334). When I ran the tests against cold caches, however, the read-ahead mechanism pulled all data pages into cache for both queries, so I didn’t notice a significant change in the comparative result.
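The text does not say how the caches were cleared. One common way to get a cold buffer cache on a test instance is sketched below. This is my assumption about the method, it needs sysadmin rights, and it affects the whole instance, so it should not be run on a production server.

```sql
-- Cold cache setup for a test server only (assumed method, not from the original test).
checkpoint               -- write dirty pages to disk so they become clean
dbcc dropcleanbuffers    -- remove clean pages from the buffer pool
```

Running the two queries immediately afterwards should show non-zero physical and read-ahead reads in the statistics io output.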
Running the second query with a maxdop hint brings the results closer:
(1 row(s) affected)
Table 'tbl_ledger'. Scan count 1, logical reads 4167, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 328 ms, elapsed time = 347 ms.
(1 row(s) affected)
Table 'tbl_ledger'. Scan count 2, logical reads 8334, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 249 ms, elapsed time = 296 ms.
And drops the parallel operators:
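The hinted query itself is not shown in the text. A minimal sketch, assuming the hint was maxdop 1 (which would match the scan count dropping from 10 to 2), would be:

```sql
-- Sub query version restricted to a single scheduler.
-- The maxdop value of 1 is an assumption; the original hint value is not stated.
select sum(amount) -
       (
           select sum(amount) as [debit]
           from tbl_ledger
           where credit_debit = 'D'
       ) as [credit]
from tbl_ledger
where credit_debit = 'C'
option (maxdop 1)
```

The option clause applies to the whole statement, so both the outer query and the sub query run serially.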
The Conclusion: After analysis it seems that the sub query outperforms the conditional aggregate option, even if we suppress parallelism. That said, these tests were on a cut-down table, and a wider table with more rows might bring IO more into play, especially if indexes are added to the mix. Adding an index on the credit_debit column with amount as an included column speeds up the sub query even more, replacing the two table scans with index seeks.
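The index described above could be created along these lines. The index name is hypothetical, and I have assumed a nonclustered index since the text does not say.

```sql
-- Covering index for the sub query version: seek on credit_debit,
-- with amount carried at the leaf level so the base table is not touched.
-- The index name is made up for illustration.
create nonclustered index ix_tbl_ledger_credit_debit
on tbl_ledger (credit_debit)
include (amount)
```

Because amount is included, each branch of the sub query can be answered from the index alone. The conditional aggregate may also scan this narrower index instead of the table, so it is worth rerunning both queries after creating it.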