July 18, 2012 at 7:32 am
Hi
I have a table which holds the running total.
DECLARE @CumCredit TABLE(RowId INT, CustomerId INT, TransId INT, Pts INT, RunningTotal INT)
INSERT INTO @CumCredit VALUES(1,123,121,10,10), (2,123,131,20,30), (3,123,141,15,45)
New resultset which needs to be inserted into the above table with a running total (a kind of incremental load):
DECLARE @Cum TABLE(RowId INT, CustomerId INT, TransId INT, Pts INT)
INSERT INTO @Cum VALUES(4,123,151,10), (5,123,161,20)
I am doing it as below, but it doesn't succeed:
insert into @CumCredit
select *, (SELECT MAX(RunningTotal)+c.Pts AS Total FROM @CumCredit WHERE CustomerId = c.CustomerId) AS Total from @Cum c
select * from @CumCredit
The issue is with RowId = 5: the running total at this row should be 75 instead of 65.
Where am I going wrong?
thanks
July 18, 2012 at 8:10 am
See if this article helps. http://www.sqlservercentral.com/articles/T-SQL/68467/
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Moden's splitter: http://www.sqlservercentral.com/articles/Tally+Table/72993/
Cross Tabs and Pivots, Part 1 - Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
July 18, 2012 at 8:32 am
a2zwd (7/18/2012)
Hi, I have a table which holds the running total. ...
Issue with the RowId=5, the running total at this rowid should be 75 instead of 65.
Where am I going wrong?
I think this is what you are looking for:
DECLARE @CumCredit TABLE(RowId INT, CustomerId INT, TransId INT, Pts INT, PRIMARY KEY(RowID ASC))
INSERT INTO @CumCredit VALUES(1,123,121,10), (2,123,131,20), (3,123,141,15)
SELECT * FROM @CumCredit
DECLARE @Min INT = (SELECT MIN([RowId]) FROM @CumCredit)
;WITH CTE ([RowId], [CustomerID], [TransID], [Pts], [Running Total])
AS
(
SELECT [RowId],
[CustomerID],
[TransID],
[Pts],
[Pts]
FROM @CumCredit
WHERE [RowId] = @Min
UNION ALL
SELECT a.[RowId],
a.[CustomerID],
a.[TransId],
a.[Pts],
CTE.[Running Total] + a.[Pts]
FROM CTE
JOIN @CumCredit a ON CTE.[RowId] + 1 = a.[RowId]
)
SELECT * FROM CTE
OPTION (MAXRECURSION 0)
There are other ways to do this but using recursion is generally the fastest.
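If you happen to be on SQL Server 2012 or later, a windowed SUM handles the incremental load with no recursion at all. This is a sketch only, assuming the OP's original @CumCredit and @Cum table variables (the five-column version with the RunningTotal column):

```sql
-- Sketch: 2012+ windowed running sum of the new rows, plus the customer's
-- prior running total already stored in @CumCredit.
-- Assumes the OP's original @CumCredit/@Cum table variables.
INSERT INTO @CumCredit (RowId, CustomerId, TransId, Pts, RunningTotal)
SELECT c.RowId, c.CustomerId, c.TransId, c.Pts,
       SUM(c.Pts) OVER (PARTITION BY c.CustomerId
                        ORDER BY c.RowId
                        ROWS UNBOUNDED PRECEDING)
       + ISNULL((SELECT MAX(p.RunningTotal)
                 FROM @CumCredit p
                 WHERE p.CustomerId = c.CustomerId), 0)
FROM @Cum c;
```

With the OP's sample data this gives 55 for RowId 4 and 75 for RowId 5, which is what was asked for.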
-- Itzik Ben-Gan 2001
July 18, 2012 at 8:54 am
Sean Lange (7/18/2012)
See if this article helps. http://www.sqlservercentral.com/articles/T-SQL/68467/
Jeff's article is certainly a must-read on this topic.
-- Itzik Ben-Gan 2001
July 20, 2012 at 1:18 am
I must be missing something but I don't see how that recursive CTE (as much as I like rCTEs) accounts for the RunningTotal existing on the last record that already exists in @CumCredit.
Here's an alternative, just as something different hoping that someone can find a reason to shoot it down.
DECLARE @CumCredit TABLE(RowId INT, CustomerId INT, TransId INT, Pts INT, RunningTotal INT)
INSERT INTO @CumCredit VALUES(1,123,121,10,10), (2,123,131,20,30), (3,123,141,15,45)
select * from @CumCredit
DECLARE @Cum TABLE(RowId INT, CustomerId INT, TransId INT, Pts INT)
INSERT INTO @Cum VALUES(4,123,151,10), (5,123,161,20), (6,123, 151, 20), (7, 123, 161, 40)
select * from @Cum
INSERT INTO @CumCredit
SELECT RowID, CustomerID, Transid, Pts
,RunningTotal=Pts + CASE WHEN Pts = RunningTotal
THEN (
SELECT RunningTotal=MAX(RunningTotal)
FROM @CumCredit b
WHERE a.CustomerID = b.CustomerID)
ELSE RunningTotal END
FROM (
SELECT RowID, CustomerID, Transid=MAX(Transid)
,Pts=MAX(b.Pts), RunningTotal=SUM(ISNULL(c.Pts, 0)) + MAX(rt)
,rn=ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY RowID)
FROM @Cum b
OUTER APPLY (
SELECT Pts
FROM @Cum a
WHERE a.RowID < b.RowID
) c
CROSS APPLY (
SELECT Pts=SUM(Pts)
FROM @CumCredit c
WHERE c.CustomerID = b.CustomerID) d(rt)
GROUP BY RowID, CustomerID
) a
Performance perhaps? Then again it is going up against an rCTE!
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
July 20, 2012 at 3:29 am
XMLSQLNinja (7/18/2012)
...There are other ways to do this but using recursion is generally the fastest.
Recursion is the third fastest; CLR > "Quirky Update" > rCTE > cursor > triangular join. Quite a few comparisons have been made between these methods here on ssc over the years.
If you use a rCTE for running totals, "next row" from the source table will be picked up by a table scan if there's no index on the sequence column, and the fastest index is usually a unique clustered index. So if your source table has a million rows, that's 999,999 table scans.
Triangular joins can perform quite well if the aggregated partitions are small relative to the whole data set (and supporting indexes exist); 100,000 customers each with 10 transactions (10 aggregating scans per customer, max ten rows) will perform far better than a running total over the whole million-row table (a million aggregating scans, max a million rows).
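For anyone unfamiliar with the term, the triangular join pattern being discussed looks like this; a sketch against the OP's @CumCredit table, where the s.RowId <= t.RowId predicate is the "triangle":

```sql
-- Triangular join sketch: every output row re-aggregates all earlier rows
-- for the same customer, so the work grows roughly with the square of the
-- partition size. Assumes the OP's @CumCredit table variable.
SELECT t.RowId, t.CustomerId, t.TransId, t.Pts,
       RunningTotal = (SELECT SUM(s.Pts)
                       FROM @CumCredit s
                       WHERE s.CustomerId = t.CustomerId
                         AND s.RowId <= t.RowId)
FROM @CumCredit t
ORDER BY t.CustomerId, t.RowId;
```

That is why it behaves well only when each partition stays small, as described above.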
For better assistance in answering your questions, please read this.
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop - Jeff Moden
July 23, 2012 at 2:55 pm
Just got back from vacation and wanted to address this one first...
I must be missing something but I don't see how that recursive CTE (as much as I like rCTEs) accounts for the RunningTotal existing on the last record that already exists in @CumCredit.
I fixed an error and updated my query to handle the insert correctly:
DECLARE @CumCredit TABLE(RowId INT, CustomerId INT, TransId INT, Pts INT, RunningTotal INT, PRIMARY KEY(RowID ASC))
DECLARE @Cum TABLE(RowId INT, CustomerId INT, TransId INT, Pts INT, RunningTotal INT, PRIMARY KEY(RowID ASC))
INSERT INTO @CumCredit VALUES(1,123,121,10,10), (2,123,131,20,30), (3,123,141,15,45)
INSERT INTO @Cum VALUES(4,123,151,10,NULL), (5,123,161,20,NULL), (6,123, 151, 20,NULL), (7, 123, 161, 40,NULL)
DECLARE @total INT = (SELECT RunningTotal FROM @CumCredit WHERE RowId = (SELECT MAX(RowID) FROM @CumCredit)),
@Min INT = (SELECT MIN([RowId]) FROM @Cum)
;WITH CTE ([RowId], [CustomerID], [TransID], [Pts], [Running Total])
AS
(
SELECT [RowId],
[CustomerID],
[TransID],
[Pts],
@total + [Pts]
FROM @Cum
WHERE [RowId] = @Min
UNION ALL
SELECT a.[RowId],
a.[CustomerID],
a.[TransId],
a.[Pts],
CTE.[Running Total] + a.[Pts]
FROM CTE
JOIN @Cum a ON CTE.[RowId] + 1 = a.[RowId]
)
INSERT INTO @CumCredit
SELECT * FROM CTE
OPTION (MAXRECURSION 0);
SELECT * FROM @CumCredit
GO
This is an elegant solution which will get you the desired results and the desired performance.
Here's an alternative, just as something different hoping that someone can find a reason to shoot it down.
Reviewing your code (above), let's look at the pros and cons:
Pro: It works and produces the desired results.
Con: Performance
I performed a test to compare the results of both queries:
First I put together code to produce sample data:
-- CREATE BIG SAMPLE DATA
SET NOCOUNT ON;
IF object_id('tempdb..##sampleData') IS NOT NULL
DROP TABLE ##sampleData;
CREATE TABLE ##sampleData
(
RowID int,
cID int,
TransID int,
pts int,
PRIMARY KEY([RowID])
);
-- Begin Routine
DECLARE @RowID INT = 4,
@val INT,
@rand1 INT,
@rand2 INT
WHILE @RowID <= 100
BEGIN
SET @val = ((RAND()*8)+1);
INSERT ##sampleData VALUES (@RowID,123,1999,@val)
SET @RowID = @RowID + 1
END
GO
Then, for each test, I inserted the values from ##sampleData into @Cum so we could test this with 100 records, then 500, then 1K, 5K, 100K, then 1,000,000 records.
With 5 records both queries ran at the same speed, ~6 ms.
... {fast forward}
With 1000 records mine (we'll call Q1) runs in 28 ms; yours (Q2) runs for 4 seconds.
With 2,500 records: Q1 = 60 ms, Q2 = 30 seconds
With 5K: Q1 = 0.10 seconds, Q2 = 80.00 seconds
... {fast forward}
With 1 million records: My rCTE: 20 seconds, Q2 - did not bother
-- Itzik Ben-Gan 2001
July 23, 2012 at 3:43 pm
ChrisM@home (7/20/2012)
XMLSQLNinja (7/18/2012)
...There are other ways to do this but using recursion is generally the fastest.
Recursion is the third fastest; CLR > "Quirky Update" > rCTE > cursor > triangular join...
Are these your findings or are you quoting something you read?
Let's start with the CLR... This is a T-SQL forum which is why a T-SQL solution was provided but I digress. When saying that a CLR is fastest are you taking into account how much time it takes to learn a new programming language? My query will insert a million rows in 20 seconds and I wrote it with just SQL. Do you have a faster CLR that you tested when you made that statement or are you just relying on what you read on the internet?
"Quirky Update" -- again, do you have an example that works here? Have you tested it?
Quite a few comparisons have been made between these methods here on ssc over the years.
Have any of those comparisons been made by you? Or, again, are you just relying on what you read on the Internet?
If you use a rCTE for running totals, "next row" from the source table will be picked up by a table scan if there's no index on the sequence column...
So, perhaps you should put an index on that table, eh? You will notice that I included a PK in my code. I did that for the reason you mentioned. The more thinking you do when writing your DDL, the less thinking you will need to do when writing your DML.
Triangular joins can perform quite well if the aggregated partitions are small relative to the ...
It is statements like this that make me wish there were an article on SSC titled, How to Answer Questions. Again, a lot of theory, a lot of what you read, but no examples to show that what you read is true. I have learned much of what I know from reading too but, until I have tested it myself, I really have not learned that much.
"In theory, theory and practice are the same thing. In practice they are not!" -- Yogi Berra
-- Itzik Ben-Gan 2001
July 23, 2012 at 5:21 pm
XMLSQLNinja (7/23/2012)
Just got back from vacation and wanted to address this one first...I must be missing something but I don't see how that recursive CTE (as much as I like rCTEs) accounts for the RunningTotal existing on the last record that already exists in @CumCredit.
I fixed an error and updated my query to handle the insert correctly: ...
With 1 million records: My rCTE: 20 seconds, Q2 - did not bother
Ouch! I knew mine wouldn't be real fast but didn't expect it to be that slow either!
Nice work Alan.
July 23, 2012 at 6:04 pm
Okay, now modify the rCTE to run against this dataset. It will contain 1,000,000 rows of data and potentially 10,000 unique cIDs, not 1 (cID = 123) as in the other test data set.
CREATE TABLE #sampleData
(
RowID int,
cID int,
TransID int,
pts int,
PRIMARY KEY NONCLUSTERED([RowID])
);
declare @TransID int = 1;
with
e1(n) as (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
e2(n) as (select 1 from e1 a cross join e1 b),
e4(n) as (select 1 from e2 a cross join e2 b),
e6(n) as (select row_number() over (order by (select null)) from e4 a cross join e2 b)
insert into #sampleData (RowID, cID, TransID, pts)
select
n,
ceiling(rand(checksum(newid())) * 10000),
@TransID + (5 * n),
ceiling(rand(checksum(newid())) * 100)
from
e6;
create clustered index IX_CustTran on #sampleData (
cID,
TransID);
July 23, 2012 at 10:14 pm
Actually, Itzik Ben-Gan's new "Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions" shows that the Quirky Update and the new windowing functions in 2012 beat the CLR, and the rCTE comes in at a very poor 4th, being more than 7 times slower than even the slowest of the rest. It's on page 170 and the ISBN is 978-0-7356-5836-3.
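For reference, the 2012 windowed running total that the book benchmarks is a one-liner. A sketch against the #sampleData table posted earlier in this thread; the ROWS clause matters, because omitting it defaults to RANGE framing, which uses an on-disk spool and is much slower:

```sql
-- Sketch: SQL Server 2012+ windowed running total over the #sampleData
-- table from the earlier post. ROWS UNBOUNDED PRECEDING gets the fast
-- in-memory window frame; the default RANGE frame spools to disk.
SELECT cID, TransID, pts,
       RunningTotal = SUM(pts) OVER (PARTITION BY cID
                                     ORDER BY TransID
                                     ROWS UNBOUNDED PRECEDING)
FROM #sampleData
ORDER BY cID, TransID;
```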
YMMV. It Depends. ;-)
--Jeff Moden
Change is inevitable... Change for the better is not.
July 23, 2012 at 10:17 pm
XMLSQLNinja (7/23/2012)
ChrisM@home (7/20/2012)
XMLSQLNinja (7/18/2012)
...There are other ways to do this but using recursion is generally the fastest.
Recursion is the third fastest; CLR > "Quirky Update" > rCTE > cursor > triangular join...
Are these your findings or are you quoting something you read?
I was going to suggest that you post your test code because I've not found rCTE's to be very fast at very much.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 23, 2012 at 10:26 pm
Jeff Moden (7/23/2012)
I've not found rCTE's to be very fast at very much.
I know you're a busy guy Jeff, so you must've missed my rCTE solution to the Sweden redistricting problem. Made pretty short work of it, along with Les Cardwell's original, similar problem or so I understand.
They do have their niche. ;-)
July 23, 2012 at 10:34 pm
Jeff Moden (7/23/2012)
XMLSQLNinja (7/23/2012)
ChrisM@home (7/20/2012)
XMLSQLNinja (7/18/2012)
...There are other ways to do this but using recursion is generally the fastest.
Recursion is the third fastest; CLR > "Quirky Update" > rCTE > cursor > triangular join...
Are these your findings or are you quoting something you read?
I was going to suggest that you post your test code because I've not found rCTE's to be very fast at very much.
I posted some test code on the first page of this thread. See the comment at 9:55:46 PM.
I will post something a little more detailed tomorrow.
-- Itzik Ben-Gan 2001
July 24, 2012 at 5:18 am
XMLSQLNinja (7/23/2012)
ChrisM@home (7/20/2012)
XMLSQLNinja (7/18/2012)
...There are other ways to do this but using recursion is generally the fastest.
Recursion is the third fastest; CLR > "Quirky Update" > rCTE > cursor > triangular join...
Are these your findings or are you quoting something you read? ...
Alan
Gosh, that's quite confrontational.
I do pick up information from the internet. Mostly from here, but also blogs: Paul Randall, Paul White, Gail Shaw, Hugo Kornelis to mention a very few.
Here's Jeff Moden's Quirky Update article again - if you remember, it was referenced earlier in this thread. The article is a direct comparison of the "quirky update" method vs cursor and triangular join solutions. It unfortunately doesn't include a rCTE comparison, so here it is.
I've taken the liberty of copying and adjusting the sample data scripts from the article to make it a little more forgiving towards the TJ method and I've also changed the cursor code to run significantly faster. To level out the playing field I've taken CLR solutions out of the equation altogether.
The results for QU, cursor and TJ are in the same order as Jeff's findings, and the rCTE comes in exactly as my previous - disputed - post: between QU and cursor.
QU: 1.2 seconds
rCTE about 20 seconds (same ballpark as your figures in an earlier post)
Cursor about 38 seconds
Triangular Join about 50 seconds.
Can you beat the QU with a rCTE?
Here's the table script:
/*************************************************************************************
Truncate the table if it exists, otherwise create it.
Generate a million rows with @Accounts number of accounts
and @Transactions number of transactions per account.
The following statement will create a million row sample data set
with 2000 different accounts, each with about 500 transactions.
EXEC [dbo].[QUT_RepopulateTransactionDetail] @Accounts = 2000
*************************************************************************************/
CREATE PROCEDURE [dbo].[QUT_RepopulateTransactionDetail]
(@Accounts INT)
AS
SET NOCOUNT ON
IF @Accounts IS NULL SET @Accounts = 2000
DECLARE @Transactions INT
SET @Transactions = 1000000/@Accounts
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[TransactionDetail]') AND type in (N'U'))
DROP TABLE [dbo].[TransactionDetail]
CREATE TABLE TransactionDetail(
TransactionDetailID int IDENTITY(1,1) NOT NULL,
[Date] datetime NULL,
AccountID int NOT NULL,
Amount money NULL,
AccountRunningTotal money NULL,
rn INT NULL) ON [PRIMARY]
INSERT INTO dbo.TransactionDetail
(AccountID, [Date], Amount)
SELECT
a.AccountID,
t.[Date],
t.Amount
FROM (
SELECT TOP(@Accounts)
AccountID = ABS(CHECKSUM(NEWID()))%@Accounts+1
FROM Master.dbo.SysColumns sc1 CROSS JOIN Master.dbo.SysColumns sc2) a
CROSS JOIN (
SELECT TOP(@Transactions)
[Date] = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
Amount = CAST(CHECKSUM(NEWID())%@Transactions /100.0 AS MONEY)
FROM Master.dbo.SysColumns sc1 CROSS JOIN Master.dbo.SysColumns sc2) t
ORDER BY a.AccountID, t.[Date]
UPDATE t SET rn = d.rn
FROM TransactionDetail t
INNER JOIN (
SELECT
TransactionDetailID,
rn = ROW_NUMBER() OVER(PARTITION BY AccountID ORDER BY [Date])
FROM TransactionDetail
) d ON d.TransactionDetailID = t.TransactionDetailID
-- A clustered index is ESSENTIAL for the QU: it defines the running order. Handy for the TJ too.
CREATE UNIQUE CLUSTERED INDEX [ucx_TransactionDetail_AccountID_rn] ON TransactionDetail (AccountID, rn)
-- always nice to have a PK - and essential for the cursor method.
ALTER TABLE [dbo].[TransactionDetail]
ADD CONSTRAINT [PK_TransactionDetail_TransactionDetailID] PRIMARY KEY NONCLUSTERED
([TransactionDetailID] ASC)
-- essential for the rCTE.
CREATE UNIQUE INDEX [uix_TransactionDetail_rn_AccountID] ON TransactionDetail (rn, AccountID) INCLUDE (Amount)
-- ALWAYS eyeball the data
SELECT TOP 2000 * FROM [TransactionDetail] WHERE rn = 1
RETURN 0
Here's the script to compare the different methods:
/*---------------------------------------------------------------
These results are from a million-row sample set consisting of
2000 accounts each with 500 transactions
---------------------------------------------------------------*/
SET NOCOUNT ON
-- METHOD 1 triangular join ============================================================
-- about 50 seconds ====================================================================
SELECT t.AccountID,
t.rn,
t.Amount, x.SUM_Amount
FROM dbo.TransactionDetail t
CROSS APPLY (
SELECT SUM_Amount = SUM(Amount)
FROM dbo.TransactionDetail s
WHERE s.AccountID = t.AccountID
AND s.rn <= t.rn
) x
-- METHOD 2 Cursor =====================================================================
-- about 40 seconds ====================================================================
--===== Declare the cursor storage variables
DECLARE @Amount MONEY
DECLARE @CurAccountID INT
DECLARE @TransactionDetailID INT
--===== Declare the working variables
DECLARE @PrevAccountID INT
DECLARE @AccountRunningTotal MONEY
--===== Create the cursor with rows sorted in the correct
-- order to do the running total by account
DECLARE curRunningTotal CURSOR LOCAL STATIC READ_ONLY
FOR
SELECT AccountID, TransactionDetailID, Amount
FROM dbo.TransactionDetail
ORDER BY AccountID, rn
OPEN curRunningTotal
FETCH NEXT FROM curRunningTotal
INTO @CurAccountID, @TransactionDetailID, @Amount
WHILE @@FETCH_STATUS = 0
BEGIN
--===== Calculate the running total for this row
-- and remember this AccountID for the next row
SELECT
@AccountRunningTotal = @Amount + CASE
WHEN @CurAccountID = @PrevAccountID THEN @AccountRunningTotal
ELSE 0 END,
@PrevAccountID = @CurAccountID
--===== Update the running total for this row using the PK
-- this is much faster than "WHERE CURRENT OF"
UPDATE dbo.TransactionDetail
SET AccountRunningTotal = @AccountRunningTotal
WHERE TransactionDetailID = @TransactionDetailID
FETCH NEXT FROM curRunningTotal
INTO @CurAccountID, @TransactionDetailID, @Amount
END
CLOSE curRunningTotal
DEALLOCATE curRunningTotal
-- verify results are correct
EXEC QUT_Verify
-- (see JBM's article for the code)
-- METHOD 3 rCTE ==============================================================
-- about 20 seconds ===========================================================
SET STATISTICS IO, TIME ON
;WITH Calculator AS (
SELECT
--[Level] = 1,
--TransactionDetailID, [Date],
AccountID, Amount, rn,
RunningTotal = Amount
FROM TransactionDetail
WHERE rn = 1
UNION ALL
SELECT
--[Level] = lr.[Level] + 1,
--tr.TransactionDetailID, tr.[Date],
x.AccountID, x.Amount, x.rn,
x.RunningTotal -- = lr.RunningTotal + x.Amount
FROM Calculator lr
CROSS APPLY (SELECT t.rn, t.AccountID, t.Amount, RunningTotal = lr.RunningTotal + t.Amount FROM TransactionDetail t WHERE lr.AccountID = t.AccountID) x
WHERE x.rn = lr.rn+1
)
SELECT *
FROM Calculator
WHERE AccountID IN (1,5,37,567,1999) -- ensures whole result set is calculated
ORDER BY AccountID, rn
OPTION(MAXRECURSION 0)
SET STATISTICS IO, TIME OFF
-- METHOD 4 Quirky Update =====================================================
-- about 1.2 seconds =============================================================
-- ensure running totals column is initialised to NULL
-- (see JBM's article for the code)
EXEC QUT_ResetTestTable
--===== Declare the working variables (uncomment to run this section alone)
--DECLARE @PrevAccountID INT
--DECLARE @AccountRunningTotal MONEY
SELECT @PrevAccountID = NULL, @AccountRunningTotal = NULL
--===== Run the update
SET STATISTICS IO, TIME ON
UPDATE dbo.TransactionDetail SET
@AccountRunningTotal = AccountRunningTotal = CASE
WHEN AccountID = @PrevAccountID THEN @AccountRunningTotal + Amount
ELSE Amount END,
@PrevAccountID = AccountID
FROM dbo.TransactionDetail WITH (TABLOCKX)
OPTION (MAXDOP 1)
SET STATISTICS IO, TIME OFF
-- 00:00:01
-- verify results are correct
EXEC QUT_Verify
-- (see JBM's article for the code)
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden