Get running total in the following query

Question

Post reply

Get running total in the following query

Viewing 9 posts - 16 through 23 (of 23 total)

You must be logged in to reply to this topic. Login to reply

beeramgopi Newbie Points: 5 More actions · Answer 1

DECLARE @CumCredit TABLE

(RowId INT, CustomerId INT, TransId INT, Pts INT)

INSERT INTO @CumCredit (Rowid, CustomerID, TransId, pts)

select 1,123,121,10

union

select 2,123,131,20

union

select 3,123,141,15

select * from @CumCredit

select a.*, (select sum(Pts) from @CumCredit where rowid<=a.rowid) as RunningTotal

from @CumCredit a

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 2

beeramgopi (7/25/2012)
DECLARE @CumCredit TABLE
(RowId INT, CustomerId INT, TransId INT, Pts INT)
INSERT INTO @CumCredit (Rowid, CustomerID, TransId, pts)
select 1,123,121,10
union
select 2,123,131,20
union
select 3,123,141,15
select * from @CumCredit
select a.*, (select sum(Pts) from @CumCredit where rowid<=a.rowid) as RunningTotal
from @CumCredit a

This Triangular Join works, but performance is usually lowest of the known TSQL methods.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Cadavre SSC-Forever Points: 41690 More actions · Answer 3

beeramgopi (7/25/2012)
DECLARE @CumCredit TABLE
(RowId INT, CustomerId INT, TransId INT, Pts INT)
INSERT INTO @CumCredit (Rowid, CustomerID, TransId, pts)
select 1,123,121,10
union
select 2,123,131,20
union
select 3,123,141,15
select * from @CumCredit
select a.*, (select sum(Pts) from @CumCredit where rowid<=a.rowid) as RunningTotal
from @CumCredit a

Try that on a big table 😉

Read this article[/url] on why triangular joins are bad

--edit--

ChrisM@Work (7/25/2012)
This Triangular Join works, but performance is usually lowest of the known TSQL methods.

Ah, beat me to it. I've been on holiday, so I'm slowly having to learn to type again. That's my excuse for slow typing and I'm sticking to it 😀

Forever trying to learn
My blog - http://www.cadavre.co.uk/
For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Alan Burstein SSC Guru Points: 61141 More actions · Answer 4

Gosh, that's quite confrontational.

My sincere apologies if I came off as confrontational; that was not my intent.

I, too, get quite a bit of information on the Internet from this site as well as stuff written by a couple of the names you mentioned.

The point I was attempting to make (and concede that I did a poor job at doing so) is that: a lot of people say a lot of things about SQL. Until it is tested and/or proven it is, IMHO, it’s just something someone said on the Internet. Perhaps I should have asked this instead: “Is this something you tested? If so do you have any example code?” That was what I was looking for. I personally learn more when someone includes examples. In short, what I was looking for was your last post. Again, I am sorry if that came off as confrontational. I was not even disagreeing with you, more like playing devil’s advocate (e.g. the comment of CLR’s).

I have tested a few different methods for creating a running total. I am not married to any coding technique and am always looking for a faster way to get result sets. My findings have been that the rCTE is the fastest. That was true before this forum string and has been my findings since.

Below is the code I used to test the “quirky update” vs a rCTE. This is not an apples-to-apples comparison between what the original poster asked and what I am doing here. What I am about to show is why I believe an rCTE is the fastest means to do a running total. I originally began testing different techniques for loading a running total table into a data warehouse a week or so ago. I was creating a new table. The method we were going to use was a “INSERT INTO SalesRunningTotal SELECT * FROM RunningTotalQuery” technique. To determine which method was the fastest I several techniques for producing the fastest result set. Below is the query I used to produce the test data followed by 3 best performing queries. My findings were (fastest to slowest): rCTE, QU, Fast_Forward cursor, numerous other techniques not included cause the were slow.

SET NOCOUNT ON;

--First: Create Test Data

BEGIN

IF object_id('tempdb..##Sales') IS NOT NULL

DROP TABLE ##Sales;

CREATE TABLE ##Sales

(

[Day] int,

Sales money,

PRIMARY KEY([Day])

);

DECLARE @Day INT = 1, @Sales MONEY, @rand1 INT, @rand2 INT

-- Put your row count here...

WHILE @Day <= 100000

BEGIN

SELECT@rand1 = ((RAND()*800))+500,

@rand2 = ((RAND()*800)*-1)+200

SET @Sales = @rand1+@rand2;

INSERT ##Sales VALUES (@Day,@Sales);

SET @Day = @Day + 1

END;

CREATE NONCLUSTERED INDEX idx_X

ON ##Sales ([Day], Sales)

END;

GO

/**************************************************

Cursor Solution (4 sec/100K rows; 44 Sec/1M rows)

**************************************************/

BEGIN

DECLARE @Day int, @Sales money

DECLARE @RunningTotal money = 0

DECLARE @SalesTbl_2 TABLE

(

[Day]int,

Salesmoney,

RunningTotalmoney,

PRIMARY KEY([Day])

)

DECLARE rt_cursor CURSOR FAST_FORWARD FOR

SELECT [Day], Sales

FROM ##Sales

ORDER BY [Day]

OPEN rt_cursor

FETCH NEXT

FROM rt_cursor

INTO @Day,@Sales

WHILE @@FETCH_STATUS = 0

BEGIN

SET @RunningTotal = @RunningTotal + @Sales

INSERT @SalesTbl_2

VALUES (@Day,@Sales,@RunningTotal)

FETCH NEXT FROM rt_cursor INTO @Day,@Sales

END

CLOSE rt_cursor

DEALLOCATE rt_cursor

SELECT [Day],Sales,RunningTotal FROM @SalesTbl_2

END

GO

/**************************************************

QU Method (2 sec/100K rows; 24sec/1M rows)

**************************************************/

SET STATISTICS IO, TIME ON

BEGIN

IF OBJECT_ID('Tempdb..#sales_out') IS NOT NULL

DROP TABLE #sales_out;

CREATE TABLE #sales_out

(

[Day#]int,

[Sales#]MONEY,

[RunningTotal]MONEY,

PRIMARY KEY([Day#] ASC)

);

DECLARE @PrevDay INT, @RunningTotal MONEY = 0

INSERT INTO #sales_out ([Day#],[Sales#])

SELECT * FROM ##Sales;

UPDATE #sales_out

SET @RunningTotal = RunningTotal =CASE

WHEN [Day] = @PrevDay

THEN @RunningTotal+[Sales#]

ELSE Sales#

END,

@PrevDay = [Day]

FROM ##Sales WITH (TABLOCKX)

OPTION (MAXDOP 1)

SELECT [Day#],[Sales#],[RunningTotal] FROM #sales_out

END

SET STATISTICS IO, TIME OFF

GO

/**************************************************

rCTE Solution (1 sec/100K rows; 17 Sec/1M rows)

**************************************************/

SET STATISTICS IO, TIME ON

DECLARE @sales_out TABLE

(

[Day#]INT,

[Sales#]MONEY,

[RunningTotal]MONEY,

PRIMARY KEY([Day#] ASC)

);

DECLARE @PrevDay INT, @RunningTotal MONEY = 0;

WITH CTE ([Day], [Sales], [Running Total])

AS

(

SELECT[Day],

[Sales],

[Sales]

FROM ##Sales

WHERE [Day] = 1

UNION ALL

SELECTa.[Day],

a.[Sales],

CTE.[Running Total] + a.[Sales]

FROM CTE

JOIN ##Sales a ON CTE.[Day] + 1 = a.[Day]

)

SELECT * FROM CTE

OPTION (MAXRECURSION 0)

GO

Thoughts?

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Alan Burstein SSC Guru Points: 61141 More actions · Answer 5

Lynn Pettis (7/23/2012)
Okay, now modify the rCTE to run against this dataset. It will contain 1,000,000 rows of data and potentially 10,000 unique cID's, not 1 (cID = 123) as in the other test data set.
CREATE TABLE #sampleData
(
RowID int,
cID int,
TransID int,
pts int,
PRIMARY KEY NONCLUSTERED([RowID])
);
declare @TransID int = 1;
with
e1(n) as (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
e2(n) as (select 1 from e1 a cross join e1 b),
e4(n) as (select 1 from e2 a cross join e2 b),
e6(n) as (select row_number() over (order by (select null)) from e4 a cross join e2 b)
insert into #sampleData (RowID, cID, TransID, pts)
select
n,
ceiling(rand(checksum(newid())) * 10000),
@TransID + (5 * n),
ceiling(rand(checksum(newid())) * 100)
from
e6;
create clustered index IX_CustTran on #sampleData (
cID,
TransID);

Very nice!

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 6

Thoughts? It all depends on where you run the code. On my development VM the Quircky Update beats your rCTE every time. Using your code the QU runs in about 1100 ms and your rCTE in about 1600 ms.

Alan Burstein SSC Guru Points: 61141 More actions · Answer 7

Lynn Pettis (7/25/2012)
Thoughts? It all depends on where you run the code. On my development VM the Quircky Update beats your rCTE every time. Using your code the QU runs in about 1100 ms and your rCTE in about 1600 ms.

That is very interesting - I need to test this some more. The "Quirky Update" is new to me; I don't recall using it in the past. I read up on and tested for the first time this week. I was impressed and am sure I will be using it in the future.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 8

XMLSQLNinja (7/25/2012)
Gosh, that's quite confrontational.
My sincere apologies if I came off as confrontational; that was not my intent. ...

No offence taken Alan, and thanks for encouraging these tests to be performed.

There's an error or two in the code you posted for the QU method, causing it to run for longer than necessary and generating incorrect results. An amended version of your comparison script including corrections to the QU method is posted below. I've changed the point at which the timer starts and ends to exclude table creation,where relevant, and also restricted the read of the result set to a single row to reduce the time it takes to write out the results to the client.

Having done this, and running against a million-row sample table, the QU comes in at around 2 seconds, the rCTE at about 19 seconds and the cursor at about 35 seconds. All three methods return the same result for day = 999991.

SET NOCOUNT ON;

--First: Create Test Data

BEGIN

IF object_id('tempdb..##Sales') IS NOT NULL

DROP TABLE ##Sales;

CREATE TABLE ##Sales

(

[Day] int,

Sales money,

PRIMARY KEY([Day])

);

DECLARE @Day INT = 1, @Sales MONEY, @rand1 INT, @rand2 INT

-- Put your row count here...

WHILE @Day <= 1000000

BEGIN

SELECT@rand1 = ((RAND()*800))+500,

@rand2 = ((RAND()*800)*-1)+200

SET @Sales = @rand1+@rand2;

INSERT ##Sales VALUES (@Day,@Sales);

SET @Day = @Day + 1

END;

CREATE NONCLUSTERED INDEX idx_X

ON ##Sales ([Day], Sales)

END;

GO

/**************************************************

Cursor Solution (4 sec/100K rows; 44 Sec/1M rows) /// 35S PER M ROWS

**************************************************/

SET STATISTICS IO, TIME OFF

BEGIN

DECLARE @Day int, @Sales money

DECLARE @RunningTotal money = 0

DECLARE @SalesTbl_2 TABLE

(

[Day]int,

Salesmoney,

RunningTotalmoney,

PRIMARY KEY([Day])

)

DECLARE rt_cursor CURSOR FAST_FORWARD FOR

SELECT [Day], Sales

FROM ##Sales

ORDER BY [Day]

OPEN rt_cursor

FETCH NEXT

FROM rt_cursor

INTO @Day,@Sales

WHILE @@FETCH_STATUS = 0

BEGIN

SET @RunningTotal = @RunningTotal + @Sales

INSERT @SalesTbl_2

VALUES (@Day,@Sales,@RunningTotal)

FETCH NEXT FROM rt_cursor INTO @Day,@Sales

END

CLOSE rt_cursor

DEALLOCATE rt_cursor

SELECT [Day],Sales,RunningTotal FROM @SalesTbl_2 WHERE [Day] = 999991

END

GO

/**************************************************

QU Method (2 sec/100K rows; 24sec/1M rows) /// About 2 seconds per million rows

**************************************************/

BEGIN

IF OBJECT_ID('Tempdb..#sales_out') IS NOT NULL

DROP TABLE #sales_out;

CREATE TABLE #sales_out

(

[Day]int,

[Sales]MONEY,

[RunningTotal]MONEY

);

INSERT INTO #sales_out ([Day],[Sales])

SELECT * FROM ##Sales ORDER BY [Day];

CREATE UNIQUE CLUSTERED INDEX ucx_Sales ON #sales_out ([Day]);

SET STATISTICS IO, TIME ON

DECLARE @PrevDay INT = 0, @RunningTotal MONEY = 0

UPDATE #sales_out

SET @RunningTotal = RunningTotal =@RunningTotal+[Sales],

@PrevDay = [Day]

FROM #sales_out WITH (TABLOCKX)

OPTION (MAXDOP 1)

SELECT [Day],[Sales],[RunningTotal] FROM #sales_out WHERE [Day] = 999991

SET STATISTICS IO, TIME OFF

END

GO

/**************************************************

rCTE Solution (1 sec/100K rows; 17 Sec/1M rows) /// 19S PER 1M ROWS

**************************************************/

SET STATISTICS IO, TIME ON

--DECLARE @sales_out TABLE

--(

--[Day#]INT,

--[Sales#]MONEY,

--[RunningTotal]MONEY,

--PRIMARY KEY([Day#] ASC)

--);

DECLARE @PrevDay INT, @RunningTotal MONEY = 0;

WITH CTE ([Day], [Sales], [Running Total])

AS

(

SELECT[Day],

[Sales],

[Sales]

FROM ##Sales

WHERE [Day] = 1

UNION ALL

SELECTa.[Day],

a.[Sales],

CTE.[Running Total] + a.[Sales]

FROM CTE

JOIN ##Sales a ON CTE.[Day] + 1 = a.[Day]

)

SELECT * FROM CTE WHERE [Day] = 999991

OPTION (MAXRECURSION 0)

SET STATISTICS IO, TIME Off

GO

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

Jeff Moden SSC Guru Points: 1004409 More actions · Answer 9

XMLSQLNinja (7/25/2012)
Lynn Pettis (7/25/2012)
Thoughts? It all depends on where you run the code. On my development VM the Quircky Update beats your rCTE every time. Using your code the QU runs in about 1100 ms and your rCTE in about 1600 ms.
That is very interesting - I need to test this some more. The "Quirky Update" is new to me; I don't recall using it in the past. I read up on and tested for the first time this week. I was impressed and am sure I will be using it in the future.

It IS called the "QUIRKY" update for a reason. There are certain rules which need to be followed. Paul White and Tom Thompson came up with a built in safety check for it which is identified at the beginning of the article.

In light of the new "previous row" capabilites of 2012, I may not update the article as promised at the beginning of the article.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)