February 26, 2010 at 6:54 pm
nathan 7372 (2/26/2010)
Currently there is no running balance column in the table. If I need to add one let me know. I also have no indexes at this point. I probably should add some but I'm not entirely sure what would be a useful index at this point.
Thanks for the quick feedback. Not to worry... let's peel one potato at a time. First, we'll get you the answer you need and then we'll work on your table. I'm almost done.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 26, 2010 at 7:22 pm
Ok... here we go. This method is commonly referred to as the "Quirky Update" method and it's appropriately named because if you don't follow all the rules to a "T", it'll get "quirky" on you and it'll come up with the wrong answers. If you do follow the rules, it's rock solid accurate and nasty fast.
There are some rules that won't be apparent in the code below. To wit, I'd really like you to take the time (it's 22 pages long in MS Word) to read the article at the following URL. Don't let the warnings scare you if you follow the rules. Sorry to sound brutal about this but I value your data as much as I do my own so if you think you're smarter than the rules and decide not to follow the rules, then don't use the "Quirky Update" method. Instead, use a well written FORWARD ONLY, READ ONLY, STATIC cursor which (not including a CLR) is the third fastest method (the second fastest is a bit more complicated) for doing running totals of this nature.
http://www.sqlservercentral.com/articles/T-SQL/68467/
Now, if you do decide to follow the rules, prepare to be absolutely amazed. The following code includes the test data you currently have in your original post as well as the code to do your running balances for all accounts in a Temp table. If it seems like it didn't actually run, that's the amazing part... shear blinding speed. 😉 Your 222k rows should take about 2 seconds to run and most of that is setting up the Temp Table and the required clustered index. The run time doesn't include the display time.
I should also mention that I shifted from FLOAT to 17 place decimal numbers. You'll find I didn't come up with the same total for the one account as you did because I'm running just a bit more accurate because less rounding is occuring (I think). I also split out the monthly interest so you can see how much it actually is. The "Balance" column is what you were referring to as the "Interest(Month)".
Here's the code:
--=====================================================================================================================
-- This is the original test data including the second account you added this evening
--=====================================================================================================================
IF OBJECT_ID('TempDB..#InterestRates','U') IS NOT NULL
DROP TABLE #InterestRates
;
CREATE TABLE #InterestRates
(
[InterestRate] DECIMAL(28,17) NULL,
[Month] [int] NULL,
[year] [int] NULL
)
;
INSERT INTO #InterestRates
(InterestRate,[Month],[Year])
SELECT '0.105','8','2007' UNION ALL
SELECT '0.105','9','2007' UNION ALL
SELECT '0.105','10','2007' UNION ALL
SELECT '0.105','11','2007' UNION ALL
SELECT '0.105','12','2007' UNION ALL
SELECT '0.105','1','2008' UNION ALL
SELECT '0.105','2','2008' UNION ALL
SELECT '0.105','3','2008' UNION ALL
SELECT '0.105','4','2008' UNION ALL
SELECT '0.105','5','2008' UNION ALL
SELECT '0.105','6','2008' UNION ALL
SELECT '0.105','7','2008' UNION ALL
SELECT '0.105','8','2008' UNION ALL
SELECT '0.105','9','2008' UNION ALL
SELECT '0.105','10','2008' UNION ALL
SELECT '0.105','11','2008' UNION ALL
SELECT '0.105','12','2008' UNION ALL
SELECT '0.105','1','2009' UNION ALL
SELECT '0.105','2','2009' UNION ALL
SELECT '0.105','3','2009' UNION ALL
SELECT '0.105','4','2009' UNION ALL
SELECT '0.105','5','2009' UNION ALL
SELECT '0.105','6','2009' UNION ALL
SELECT '0.0067','7','2009' UNION ALL
SELECT '0.0067','8','2009' UNION ALL
SELECT '0.0067','9','2009' UNION ALL
SELECT '0.0055','10','2009' UNION ALL
SELECT '0.0055','11','2009' UNION ALL
SELECT '0.0055','12','2009' UNION ALL
SELECT '0.0055','1','2010' UNION ALL
SELECT '0.0055','2','2010'
;
IF OBJECT_ID('TempDB..#Refunds','U') IS NOT NULL
DROP TABLE #Refunds
;
CREATE TABLE #Refunds
(
[Acct_no] varchar(20) NULL,
[Amount] DECIMAL(28,17) NULL,
[Month] [int] NULL,
[year] [int] NULL
)
;
INSERT INTO #Refunds
(Acct_no,Amount, [Month],[year])
SELECT '1271003600','333.107456586203','1','2008' UNION ALL
SELECT '1271003600','87.6816131178288','1','2009' UNION ALL
SELECT '1271003600','103.602002310821','2','2008' UNION ALL
SELECT '1271003600','81.9722910242125','2','2009' UNION ALL
SELECT '1271003600','72.3401647559977','3','2008' UNION ALL
SELECT '1271003600','86.1625895742832','3','2009' UNION ALL
SELECT '1271003600','103.361218989159','4','2008' UNION ALL
SELECT '1271003600','84.6065155612006','4','2009' UNION ALL
SELECT '1271003600','92.2834655375568','5','2008' UNION ALL
SELECT '1271003600','84.6688365382596','5','2009' UNION ALL
SELECT '1271003600','89.0687598341003','6','2008' UNION ALL
SELECT '1271003600','79.1241550533365','6','2009' UNION ALL
SELECT '1271003600','123.131877952709','7','2008' UNION ALL
SELECT '1271003600','89.3938505146093','7','2009' UNION ALL
SELECT '1271003600','87.2510378856992','8','2007' UNION ALL
SELECT '1271003600','94.0687846832911','8','2008' UNION ALL
SELECT '1271003600','340.381691069552','8','2009' UNION ALL
SELECT '1271003600','80.3632310400272','9','2007' UNION ALL
SELECT '1271003600','121.96840107118','9','2008' UNION ALL
SELECT '1271003600','87.1037670414245','10','2007' UNION ALL
SELECT '1271003600','94.9861134399298','10','2008' UNION ALL
SELECT '1271003600','73.8008470324637','11','2007' UNION ALL
SELECT '1271003600','78.537373182859','11','2008' UNION ALL
SELECT '1271003600','90.7988754111144','12','2007' UNION ALL
SELECT '1271003600','88.4954811818565','12','2008' UNION ALL
SELECT '1271005700 ','651.764','1','2008' UNION ALL
SELECT '1271005700 ','650.766','1','2009' UNION ALL
SELECT '1271005700 ','897.994','2','2008' UNION ALL
SELECT '1271005700 ','688.486','2','2009' UNION ALL
SELECT '1271005700 ','721.466','3','2008' UNION ALL
SELECT '1271005700 ','718.654','3','2009' UNION ALL
SELECT '1271005700 ','913.072','4','2008' UNION ALL
SELECT '1271005700 ','863.224','4','2009' UNION ALL
SELECT '1271005700 ','735.081','5','2008' UNION ALL
SELECT '1271005700 ','647.368','5','2009' UNION ALL
SELECT '1271005700 ','607.721','6','2008' UNION ALL
SELECT '1271005700 ','580.359','6','2009' UNION ALL
SELECT '1271005700 ','718.969','7','2008' UNION ALL
SELECT '1271005700 ','693.378','7','2009' UNION ALL
SELECT '1271005700 ','734.005','8','2007' UNION ALL
SELECT '1271005700 ','591.788','8','2008' UNION ALL
SELECT '1271005700 ','679.773','8','2009' UNION ALL
SELECT '1271005700 ','560.296','9','2007' UNION ALL
SELECT '1271005700 ','685.047','9','2008' UNION ALL
SELECT '1271005700 ','682.834','10','2007' UNION ALL
SELECT '1271005700 ','608.706','10','2008' UNION ALL
SELECT '1271005700 ','586.172','11','2007' UNION ALL
SELECT '1271005700 ','650.191','11','2008' UNION ALL
SELECT '1271005700 ','858.672','12','2007' UNION ALL
SELECT '1271005700 ','711.708','12','2008'
;
--=====================================================================================================================
-- Solution to the problem starts here
--=====================================================================================================================
--===== Conditionally drop the work table to make reruns for any troubleshooting easier.
IF OBJECT_ID('TempDB..#Work','U') IS NOT NULL
DROP TABLE #Work
;
--===== Declare local variables to support the upcoming running total update
DECLARE @PrevAcct VARCHAR(20),
@PrevBal DECIMAL(28,17),
@IntAmt DECIMAL(28,17)
;
--===== This set of cascading CTE's preconditions the data for processing and later display
WITH
cteAccount AS
( --=== Find all distinct account numbers so we can apply to all dates for rates without dupes
SELECT DISTINCT Acct_No
FROM #Refunds
),
cteRate AS
( --=== Apply the account number to all date rates without dupes
SELECT acct.Acct_No,
rate.[Year],
rate.[Month],
rate.InterestRate/12 AS MonthRate
FROM #InterestRates rate
CROSS JOIN cteAccount acct
)
SELECT ISNULL(rate.Acct_No,'') AS Acct_No,
ISNULL(DATEADD(mm,rate.[Month],DATEADD(yy,rate.[YEAR]-1900,0)),0) AS MonthDate,
rate.MonthRate,
ISNULL(amt.Amount,0) AS Amount,
CAST(NULL AS DECIMAL(28,17)) AS InterestAmount,
CAST(NULL AS DECIMAL(28,17)) AS Balance
INTO #Work
FROM cteRate rate
LEFT JOIN #Refunds amt
ON rate.Acct_No = amt.Acct_No
AND rate.[Year] = amt.[Year]
AND rate.[Month] = amt.[Month]
;
--===== Add the required clustered index
ALTER TABLE #Work
ADD PRIMARY KEY CLUSTERED (Acct_No, MonthDate) WITH FILLFACTOR = 100
;
--===== Do the update to calculate the interest amounts and the running total balance.
-- This will do a million rows in about 3 to 7 seconds
UPDATE #Work
SET @IntAmt = InterestAmount = CASE
WHEN @PrevAcct = Acct_No
THEN (@PrevBal + Amount) * Monthrate
ELSE (Amount) * MonthRate
END,
@PrevBal = Balance = CASE
WHEN @PrevAcct = Acct_No
THEN @PrevBal + Amount + @IntAmt
ELSE Amount + @IntAmt
END,
@PrevAcct = Acct_No
FROM #Work WITH(TABLOCKX) --Not required on a temp table, but good habit for this type of update.
OPTION (MAXDOP 1) --Absolutely essential whether you use a temp table or not.
;
... and here's a bit on how you might display things... we can round the amounts to 2 decimal places, if you'd like.
SELECT Acct_No, RIGHT(CONVERT(CHAR(11),MonthDate,113),8) AS [Month], Amount, InterestAmount, Balance
FROM #Work
ORDER BY Acct_No, MonthDate
;
--Jeff Moden
Change is inevitable... Change for the better is not.
February 27, 2010 at 2:33 am
For completeness, I'm going to add a new method to the mix (before anyone else jumps in!).
This is a method called 'set-based iteration' by MVP Hugo Kornelis, which was first brought to my attention here in a discussion of one of Jeff's fine articles.
While not quite as fast as the Quirky Update, it is in the same ballpark. It is faster than a recursive CTE on large sets, and even manages to out-perform the best CLR solution I have been able to put together, at least for some data sets.
The key to performance with the set-based iteration method is to have at least several groups in the data. It operates on a divide-and-conquer basis, and the unit of division is the group. In the present case, these groups are accounts - in general, they are the things that reset the running total. Worst case, if there is only one group, this method performs worse than an optimized cursor - so be aware of that.
In the sample data provided, there are two accounts, which is enough to make the method viable, but certainly not optimal. My guess is that the real data set contains many accounts, so that should be all good.
I have shamelessly re-used Jeff's fine test script, up to the point marked 'Solution to the problem starts here'. The following code runs from that point, and again draws heavily on Jeff's work. I hope he won't mind.
-- A work table to hold the reformatted data, and
-- ultimately, the results
CREATE TABLE #Work
(
Acct_No VARCHAR(20) NOT NULL,
MonthDate DATETIME NOT NULL,
MonthRate DECIMAL(19,12) NOT NULL,
Amount DECIMAL(19,12) NOT NULL,
InterestAmount DECIMAL(19,12) NOT NULL,
RunningTotal DECIMAL(19,12) NOT NULL,
RowRank BIGINT NOT NULL
);
-- Slightly modified from Jeff's example, to prepare
-- for the set-based iteration method
WITH Accounts
AS (
-- Get a list of the account numbers
SELECT DISTINCT Acct_No
FROM #Refunds
),
Rates
AS (
-- Apply all the accounts to all the rates
SELECT A.Acct_No,
R.[Year],
R.[Month],
MonthRate = R.InterestRate / 12
FROM #InterestRates R
CROSS
JOIN Accounts A
),
BaseData
AS (
-- The basic data we need to work with
SELECT Acct_No = ISNULL(R.Acct_No,''),
MonthDate = ISNULL(DATEADD(MONTH, R.[Month], DATEADD(YEAR, R.[year] - 1900, 0)), 0),
R.MonthRate,
Amount = ISNULL(RF.Amount,0),
InterestAmount = ISNULL(RF.Amount,0) * R.MonthRate,
RunningTotal = ISNULL(RF.Amount,0)
FROM Rates R
LEFT
JOIN #Refunds RF
ON RF.Acct_No = R.Acct_No
AND RF.[Year] = R.[Year]
AND RF.[Month] = R.[Month]
)
-- Basic data plus a rank id, numbering the rows by MonthDate, and resetting to 1 for each new Account
INSERT #Work
(Acct_No, MonthDate, MonthRate, Amount, InterestAmount, RunningTotal, RowRank)
SELECT BD.Acct_No, BD.MonthDate, BD.MonthRate, BD.Amount, BD.InterestAmount, BD.RunningTotal,
RowRank = RANK() OVER (PARTITION BY BD.Acct_No ORDER BY MonthDate)
FROM BaseData BD;
-- An index to speed the next stage (different from that used with the Quirky Update method)
CREATE UNIQUE CLUSTERED INDEX nc1 ON #Work (RowRank, Acct_No);
-- Iteration variables
DECLARE @Rank BIGINT,
@RowCount INTEGER;
-- Initialize
SELECT @Rank = 1,
@RowCount = 1;
-- This is the iteration bit, processes a rank id per iteration
-- The number of rows processed with each iteration is equal to the number of groups in the data
-- More groups --> greater efficiency
WHILE (1 = 1)
BEGIN
SET @Rank = @Rank + 1;
-- Set-based update with running totals for the current rank id
UPDATE This
SET InterestAmount = (Previous.RunningTotal + This.Amount) * This.MonthRate,
RunningTotal = Previous.RunningTotal + This.Amount + (Previous.RunningTotal + This.Amount) * This.MonthRate
FROM #Work This
JOIN #Work Previous
ON Previous.Acct_No = This.Acct_No
AND Previous.RowRank = @Rank - 1
WHERE This.RowRank = @Rank;
IF (@@ROWCOUNT = 0) BREAK;
END;
-- Show the results in natural order
SELECT *
FROM #Work
ORDER BY
Acct_No, RowRank;
I continue to recommend the Quirky Update over this method, for sheer speed, but this method does have the advantage of being fully documented and supported. I present it as 'Option 2'.
Paul
edit: to reflect the improvements in the recursive CTE method
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
February 27, 2010 at 4:36 am
Paul White (2/27/2010)
It is certainly orders of magnitude faster than a recursive CTE on large sets
Hi Paul, an excellent solution as always, and a method to remember.
There are a couple of things I've noticed while working with CTE's over the last few weeks of using them in anger. The first is that, even with relatively small data sets, chaining CTE's can be a really bad idea. The optimiser appears to screw things up, row counts go through the roof and runtimes rapidly escalate. I've got an excellent example of this where breaking the chain by spooling into a temp table improved performance about 20-fold on a modest pagination routine. Funny thing is, the very first run took less than a second, before the optimiser lost it.
The other thing I've noticed is that recursive CTE's can be blisteringly fast provided that the required conditions are met.
So, reusing Jeff's sample data (thanks Jeff), clumsily ramping up to about about 3,000 accounts across about 100,000 rows, and processing a data set which already contains a contiguous row number in the correct order for execution, here goes...
First, the sample data:
--=====================================================================================================================
-- This is the original test data including the second account you added this evening
--=====================================================================================================================
SET NOCOUNT ON
IF OBJECT_ID('TempDB..#InterestRates','U') IS NOT NULL
DROP TABLE #InterestRates
;
CREATE TABLE #InterestRates
(
[InterestRate] DECIMAL(28,17) NULL,
[Month] [int] NULL,
[year] [int] NULL
)
;
INSERT INTO #InterestRates
(InterestRate,[Month],[Year])
SELECT '0.105','8','2007' UNION ALL
SELECT '0.105','9','2007' UNION ALL
SELECT '0.105','10','2007' UNION ALL
SELECT '0.105','11','2007' UNION ALL
SELECT '0.105','12','2007' UNION ALL
SELECT '0.105','1','2008' UNION ALL
SELECT '0.105','2','2008' UNION ALL
SELECT '0.105','3','2008' UNION ALL
SELECT '0.105','4','2008' UNION ALL
SELECT '0.105','5','2008' UNION ALL
SELECT '0.105','6','2008' UNION ALL
SELECT '0.105','7','2008' UNION ALL
SELECT '0.105','8','2008' UNION ALL
SELECT '0.105','9','2008' UNION ALL
SELECT '0.105','10','2008' UNION ALL
SELECT '0.105','11','2008' UNION ALL
SELECT '0.105','12','2008' UNION ALL
SELECT '0.105','1','2009' UNION ALL
SELECT '0.105','2','2009' UNION ALL
SELECT '0.105','3','2009' UNION ALL
SELECT '0.105','4','2009' UNION ALL
SELECT '0.105','5','2009' UNION ALL
SELECT '0.105','6','2009' UNION ALL
SELECT '0.0067','7','2009' UNION ALL
SELECT '0.0067','8','2009' UNION ALL
SELECT '0.0067','9','2009' UNION ALL
SELECT '0.0055','10','2009' UNION ALL
SELECT '0.0055','11','2009' UNION ALL
SELECT '0.0055','12','2009' UNION ALL
SELECT '0.0055','1','2010' UNION ALL
SELECT '0.0055','2','2010'
;
IF OBJECT_ID('TempDB..#Refunds','U') IS NOT NULL
DROP TABLE #Refunds
;
CREATE TABLE #Refunds
(
[Acct_no] varchar(20) NULL,
[Amount] DECIMAL(28,17) NULL,
[Month] [int] NULL,
[year] [int] NULL
)
;
INSERT INTO #Refunds
(Acct_no,Amount, [Month],[year])
SELECT '1271003600','333.107456586203','1','2008' UNION ALL
SELECT '1271003600','87.6816131178288','1','2009' UNION ALL
SELECT '1271003600','103.602002310821','2','2008' UNION ALL
SELECT '1271003600','81.9722910242125','2','2009' UNION ALL
SELECT '1271003600','72.3401647559977','3','2008' UNION ALL
SELECT '1271003600','86.1625895742832','3','2009' UNION ALL
SELECT '1271003600','103.361218989159','4','2008' UNION ALL
SELECT '1271003600','84.6065155612006','4','2009' UNION ALL
SELECT '1271003600','92.2834655375568','5','2008' UNION ALL
SELECT '1271003600','84.6688365382596','5','2009' UNION ALL
SELECT '1271003600','89.0687598341003','6','2008' UNION ALL
SELECT '1271003600','79.1241550533365','6','2009' UNION ALL
SELECT '1271003600','123.131877952709','7','2008' UNION ALL
SELECT '1271003600','89.3938505146093','7','2009' UNION ALL
SELECT '1271003600','87.2510378856992','8','2007' UNION ALL
SELECT '1271003600','94.0687846832911','8','2008' UNION ALL
SELECT '1271003600','340.381691069552','8','2009' UNION ALL
SELECT '1271003600','80.3632310400272','9','2007' UNION ALL
SELECT '1271003600','121.96840107118','9','2008' UNION ALL
SELECT '1271003600','87.1037670414245','10','2007' UNION ALL
SELECT '1271003600','94.9861134399298','10','2008' UNION ALL
SELECT '1271003600','73.8008470324637','11','2007' UNION ALL
SELECT '1271003600','78.537373182859','11','2008' UNION ALL
SELECT '1271003600','90.7988754111144','12','2007' UNION ALL
SELECT '1271003600','88.4954811818565','12','2008' UNION ALL
SELECT '1271005700 ','651.764','1','2008' UNION ALL
SELECT '1271005700 ','650.766','1','2009' UNION ALL
SELECT '1271005700 ','897.994','2','2008' UNION ALL
SELECT '1271005700 ','688.486','2','2009' UNION ALL
SELECT '1271005700 ','721.466','3','2008' UNION ALL
SELECT '1271005700 ','718.654','3','2009' UNION ALL
SELECT '1271005700 ','913.072','4','2008' UNION ALL
SELECT '1271005700 ','863.224','4','2009' UNION ALL
SELECT '1271005700 ','735.081','5','2008' UNION ALL
SELECT '1271005700 ','647.368','5','2009' UNION ALL
SELECT '1271005700 ','607.721','6','2008' UNION ALL
SELECT '1271005700 ','580.359','6','2009' UNION ALL
SELECT '1271005700 ','718.969','7','2008' UNION ALL
SELECT '1271005700 ','693.378','7','2009' UNION ALL
SELECT '1271005700 ','734.005','8','2007' UNION ALL
SELECT '1271005700 ','591.788','8','2008' UNION ALL
SELECT '1271005700 ','679.773','8','2009' UNION ALL
SELECT '1271005700 ','560.296','9','2007' UNION ALL
SELECT '1271005700 ','685.047','9','2008' UNION ALL
SELECT '1271005700 ','682.834','10','2007' UNION ALL
SELECT '1271005700 ','608.706','10','2008' UNION ALL
SELECT '1271005700 ','586.172','11','2007' UNION ALL
SELECT '1271005700 ','650.191','11','2008' UNION ALL
SELECT '1271005700 ','858.672','12','2007' UNION ALL
SELECT '1271005700 ','711.708','12','2008'
;
--=====================================================================================================================
-- Solution to the problem starts here
--=====================================================================================================================
--===== Conditionally drop the work table to make reruns for any troubleshooting easier.
IF OBJECT_ID('TempDB..#Work','U') IS NOT NULL
DROP TABLE #Work;
--===== This set of cascading CTE's preconditions the data for processing and later display
WITH
cteAccount AS
( --=== Find all distinct account numbers so we can apply to all dates for rates without dupes
SELECT DISTINCT Acct_No
FROM #Refunds
),
cteRate AS
( --=== Apply the account number to all date rates without dupes
SELECT acct.Acct_No,
rate.[Year],
rate.[Month],
rate.InterestRate/12 AS MonthRate
FROM #InterestRates rate
CROSS JOIN cteAccount acct
)
SELECT ExecSeq = ISNULL(CAST(0 AS INT), 0),
ISNULL(rate.Acct_No,'') AS Acct_No,
ISNULL(DATEADD(mm,rate.[Month],DATEADD(yy,rate.[YEAR]-1900,0)),0) AS MonthDate,
rate.MonthRate,
ISNULL(amt.Amount,0) AS Amount,
CAST(NULL AS DECIMAL(28,17)) AS InterestAmount,
CAST(NULL AS DECIMAL(28,17)) AS Balance
INTO #Work
FROM cteRate rate
LEFT JOIN #Refunds amt
ON rate.Acct_No = amt.Acct_No
AND rate.[Year] = amt.[Year]
AND rate.[Month] = amt.[Month]
;
-- scale up the table yuck but it's a one-off
DECLARE @Acct_no INT, @Counter INT
SET @Counter = 1
WHILE @Counter < 3220 -- 322 corresponds to 10,000-ish rows
BEGIN
SELECT @Acct_no = MAX(Acct_no) FROM #Work
INSERT INTO #Work (ExecSeq, Acct_no, MonthDate, MonthRate, Amount)
SELECT ExecSeq, Acct_no = CAST(@Acct_no+1 AS VARCHAR(20)), MonthDate, MonthRate, Amount
FROM #Work WHERE Acct_no = CAST(@Acct_no AS VARCHAR(20)) -- 25 rows
SET @Counter = @Counter + 1
END
-- put in the all important execution order
UPDATE w SET ExecSeq = t.ExecSeq
FROM #Work w
INNER JOIN (SELECT ExecSeq = ROW_NUMBER() OVER(ORDER BY Acct_no, MonthDate), Acct_no, MonthDate FROM #Work) t
ON t.Acct_no = w.Acct_no AND t.MonthDate = w.MonthDate
--===== Add the very useful clustered index
CREATE CLUSTERED INDEX [ExecSeq] ON #Work ([ExecSeq] ASC)
Here's the slightly modified solution:
;WITH Calculator AS (
SELECT ExecSeq,
Acct_no,
Amount,
MonthDate,
MonthRate,
InterestAmount = ISNULL(InterestAmount,0)
FROM #Work w
WHERE ExecSeq = 1
UNION ALL
SELECT cr.ExecSeq,
ISNULL(cr.Acct_no, lr.Acct_no),
ISNULL(cr.Amount, 0),
cr.MonthDate,
cr.MonthRate,
InterestAmount = CAST(CASE WHEN cr.Acct_no = lr.Acct_no THEN (lr.Amount + lr.InterestAmount) * (1 + (cr.MonthRate)) ELSE 0 END AS DECIMAL(28,17))
FROM Calculator lr
INNER JOIN #Work cr ON cr.ExecSeq = lr.ExecSeq+1
)
SELECT * FROM Calculator ORDER BY ExecSeq OPTION (MAXRECURSION 0)
This runs in 5 seconds, or 3 if the ORDER BY is removed (???). If a single row is selected, say the second last (ExecSeq = 99850), it takes about 3 seconds and the result is correct.
100,000 rows, 5 seconds - that's not a misprint either Uncle Jeff 😛
A running total over a million rows of a simple but properly structured and indexed table can take as little as three or four seconds using this method.
Option 1, version 2.
Nathan, regarding recursive CTE's; "the reference to self in the recursive part represents the results of the previous iteration". It might be nowhere near what SQL Server is doing under the bonnet but it's all you need to know to make recursive CTE's work.
Cheers
ChrisM
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
February 27, 2010 at 5:17 am
Hi Chris,
I had to add a quick index on acct_no to get the table scale-up thing to work in my lifetime (the MAX gets out of hand without it!), but other than that:
This looks to be an interesting way to speed up this sort of recursive query, and it certainly seems to work. There are a number of similarities to the set-based iteration method, like the pre-computing of an order using a ranking function. I want to look at this a bit more deeply before saying much more, but the index spool at the 'lowest level' of the plan seems to be one of the keys to success here - it saves scanning the input table a good many times. I would expect an explicit index on the same keys as the spool to be beneficial, I'll come back to this.
You're right that recursive CTEs can be fast, but it is relative. I sometimes use them to seek down the distinct keys of a large index rather than scanning the whole thing and relying on an aggregate, for example. Where a good set-based alternative exists I usually go with that. Maybe I'll revisit that once I've had a deeper think on this.
I must admit I spent a few minutes expanding the data set to work with the set-based iteration method to see how that went. The recursive CTE ran for just over seven seconds on my old laptop before starting to return results. The set-based iteration method ran for two 😛
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
February 27, 2010 at 5:45 am
Chris,
Turns out all that is needed to get rid of the index spool is to define the clustered index as UNIQUE. Given that information, a better plan is produced (without the extra index spool), and the recursive CTE now starts to return results after four seconds on my laptop. Well done!
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
February 27, 2010 at 8:15 am
Paul White (2/27/2010)
Chris,Turns out all that is needed to get rid of the index spool is to define the clustered index as UNIQUE. Given that information, a better plan is produced (without the extra index spool), and the recursive CTE now starts to return results after four seconds on my laptop. Well done!
Paul
Heh nice work Paul!
I figured it would be sufficient to bring the time down to a few seconds - it only took half an hour with Jeff's modified sample generator - but this is the icing on the cake. I'll try this on the pagination query at work. Thanks!
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
February 27, 2010 at 9:08 am
Paul White (2/27/2010)
This is a method called 'set-based iteration' by MVP Hugo Kornelis, which was first brought to my attention here in a discussion of one of Jeff's fine articles.
Heh... that's why I listed cursors as the "third fastest". I even helped Hugo with a tweak on his fine method.
I have shamelessly re-used Jeff's fine test script, up to the point marked 'Solution to the problem starts here'. The following code runs from that point, and again draws heavily on Jeff's work. I hope he won't mind.
I never mind but this one was all provided by the OP. 🙂 He did a really fine job of providing test tables and data especially for a newbie to the forum.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 27, 2010 at 9:13 am
Paul White (2/27/2010)
Chris,Turns out all that is needed to get rid of the index spool is to define the clustered index as UNIQUE. Given that information, a better plan is produced (without the extra index spool), and the recursive CTE now starts to return results after four seconds on my laptop. Well done!
Paul
Was that for the 100k rows, Paul?
--Jeff Moden
Change is inevitable... Change for the better is not.
February 27, 2010 at 9:48 am
I'm overwhelmed with all the help. When I first posted this problem I figured it might be quite some time if ever that someone came up with a solution. Thank you all so much. I will be running these queries and puzzling out how they work on monday. I will let you all know how it goes. Once again thank you so much.
February 27, 2010 at 10:06 am
nathan 7372 (2/27/2010)
I'm overwhelmed with all the help. When I first posted this problem I figured it might be quite some time if ever that someone came up with a solution. Thank you all so much. I will be running these queries and puzzling out how they work on monday. I will let you all know how it goes. Once again thank you so much.
You only have yourself to thank. 🙂 Thanks for taking the time to read the "etiquette" article I led you to... I just knew a bunch of great people would jump in on this one if they had a little data to work with. Well done on the data you posted, Nathan.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 27, 2010 at 6:54 pm
Jeff Moden (2/27/2010)
Paul White (2/27/2010)
Chris,Turns out all that is needed to get rid of the index spool is to define the clustered index as UNIQUE. Given that information, a better plan is produced (without the extra index spool), and the recursive CTE now starts to return results after four seconds on my laptop. Well done!
Paul
Was that for the 100k rows, Paul?
Sure was, yes.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
February 27, 2010 at 6:55 pm
Jeff Moden (2/27/2010)
nathan 7372 (2/27/2010)
I'm overwhelmed with all the help. When I first posted this problem I figured it might be quite some time if ever that someone came up with a solution. Thank you all so much. I will be running these queries and puzzling out how they work on monday. I will let you all know how it goes. Once again thank you so much.You only have yourself to thank. 🙂 Thanks for taking the time to read the "etiquette" article I led you to... I just knew a bunch of great people would jump in on this one if they had a little data to work with. Well done on the data you posted, Nathan.
+1 🙂
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
March 1, 2010 at 1:43 pm
I just wanted to let everyone know how things went. I have to say I learned a lot about the different ways to solve this problem. All three queries ran exceptionally fast. I had to modify Chris' query a tad since it was off by one month on the interest rates used. Since the interest rates are so similiar it only had a minor effect on the outcome.
On an interesting note just by using the different datatypes between Jeff and Paul we get a difference of $374 out of a total around $4.5M. Just goes to show you the power of decimal places and rounding.
Thank you all for tackling this problem. I really feel I learned a lot and the accounting department is ecstatic to finally have the numbers so we can move forward. Thank you again.
March 1, 2010 at 4:09 pm
Thanks for the feedback. I am a bit confused, though... I thought Paul and I both used DECIMAL(28,17).
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 16 through 30 (of 67 total)
You must be logged in to reply to this topic. Login to reply