July 25, 2012 at 2:51 am
DECLARE @CumCredit TABLE
(RowId INT, CustomerId INT, TransId INT, Pts INT)
INSERT INTO @CumCredit (Rowid, CustomerID, TransId, pts)
select 1,123,121,10
union
select 2,123,131,20
union
select 3,123,141,15
select * from @CumCredit
select a.*, (select sum(Pts) from @CumCredit where rowid<=a.rowid) as RunningTotal
from @CumCredit a
July 25, 2012 at 4:02 am
beeramgopi (7/25/2012)
DECLARE @CumCredit TABLE(RowId INT, CustomerId INT, TransId INT, Pts INT)
INSERT INTO @CumCredit (Rowid, CustomerID, TransId, pts)
select 1,123,121,10
union
select 2,123,131,20
union
select 3,123,141,15
select * from @CumCredit
select a.*, (select sum(Pts) from @CumCredit where rowid<=a.rowid) as RunningTotal
from @CumCredit a
This Triangular Join works, but performance is usually lowest of the known TSQL methods.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 25, 2012 at 4:04 am
beeramgopi (7/25/2012)
DECLARE @CumCredit TABLE(RowId INT, CustomerId INT, TransId INT, Pts INT)
INSERT INTO @CumCredit (Rowid, CustomerID, TransId, pts)
select 1,123,121,10
union
select 2,123,131,20
union
select 3,123,141,15
select * from @CumCredit
select a.*, (select sum(Pts) from @CumCredit where rowid<=a.rowid) as RunningTotal
from @CumCredit a
Try that on a big table 😉
Read this article[/url] on why triangular joins are bad
ChrisM@Work (7/25/2012)
This Triangular Join works, but performance is usually lowest of the known TSQL methods.
Ah, beat me to it. I've been on holiday, so I'm slowly having to learn to type again. That's my excuse for slow typing and I'm sticking to it 😀
July 25, 2012 at 1:16 pm
Gosh, that's quite confrontational.
My sincere apologies if I came off as confrontational; that was not my intent.
I, too, get quite a bit of information on the Internet from this site as well as stuff written by a couple of the names you mentioned.
The point I was attempting to make (and concede that I did a poor job at doing so) is that: a lot of people say a lot of things about SQL. Until it is tested and/or proven it is, IMHO, it’s just something someone said on the Internet. Perhaps I should have asked this instead: “Is this something you tested? If so do you have any example code?” That was what I was looking for. I personally learn more when someone includes examples. In short, what I was looking for was your last post. Again, I am sorry if that came off as confrontational. I was not even disagreeing with you, more like playing devil’s advocate (e.g. the comment of CLR’s).
I have tested a few different methods for creating a running total. I am not married to any coding technique and am always looking for a faster way to get result sets. My findings have been that the rCTE is the fastest. That was true before this forum string and has been my findings since.
Below is the code I used to test the “quirky update” vs a rCTE. This is not an apples-to-apples comparison between what the original poster asked and what I am doing here. What I am about to show is why I believe an rCTE is the fastest means to do a running total. I originally began testing different techniques for loading a running total table into a data warehouse a week or so ago. I was creating a new table. The method we were going to use was a “INSERT INTO SalesRunningTotal SELECT * FROM RunningTotalQuery” technique. To determine which method was the fastest I several techniques for producing the fastest result set. Below is the query I used to produce the test data followed by 3 best performing queries. My findings were (fastest to slowest): rCTE, QU, Fast_Forward cursor, numerous other techniques not included cause the were slow.
SET NOCOUNT ON;
--First: Create Test Data
BEGIN
IF object_id('tempdb..##Sales') IS NOT NULL
DROP TABLE ##Sales;
CREATE TABLE ##Sales
(
[Day] int,
Sales money,
PRIMARY KEY([Day])
);
DECLARE @Day INT = 1, @Sales MONEY, @rand1 INT, @rand2 INT
-- Put your row count here...
WHILE @Day <= 100000
BEGIN
SELECT@rand1 = ((RAND()*800))+500,
@rand2 = ((RAND()*800)*-1)+200
SET @Sales = @rand1+@rand2;
INSERT ##Sales VALUES (@Day,@Sales);
SET @Day = @Day + 1
END;
CREATE NONCLUSTERED INDEX idx_X
ON ##Sales ([Day], Sales)
END;
GO
/**************************************************
Cursor Solution (4 sec/100K rows; 44 Sec/1M rows)
**************************************************/
BEGIN
DECLARE @Day int, @Sales money
DECLARE @RunningTotal money = 0
DECLARE @SalesTbl_2 TABLE
(
[Day]int,
Salesmoney,
RunningTotalmoney,
PRIMARY KEY([Day])
)
DECLARE rt_cursor CURSOR FAST_FORWARD FOR
SELECT [Day], Sales
FROM ##Sales
ORDER BY [Day]
OPEN rt_cursor
FETCH NEXT
FROM rt_cursor
INTO @Day,@Sales
WHILE @@FETCH_STATUS = 0
BEGIN
SET @RunningTotal = @RunningTotal + @Sales
INSERT @SalesTbl_2
VALUES (@Day,@Sales,@RunningTotal)
FETCH NEXT FROM rt_cursor INTO @Day,@Sales
END
CLOSE rt_cursor
DEALLOCATE rt_cursor
SELECT [Day],Sales,RunningTotal FROM @SalesTbl_2
END
GO
/**************************************************
QU Method (2 sec/100K rows; 24sec/1M rows)
**************************************************/
SET STATISTICS IO, TIME ON
BEGIN
IF OBJECT_ID('Tempdb..#sales_out') IS NOT NULL
DROP TABLE #sales_out;
CREATE TABLE #sales_out
(
[Day#]int,
[Sales#]MONEY,
[RunningTotal]MONEY,
PRIMARY KEY([Day#] ASC)
);
DECLARE @PrevDay INT, @RunningTotal MONEY = 0
INSERT INTO #sales_out ([Day#],[Sales#])
SELECT * FROM ##Sales;
UPDATE #sales_out
SET @RunningTotal = RunningTotal =CASE
WHEN [Day] = @PrevDay
THEN @RunningTotal+[Sales#]
ELSE Sales#
END,
@PrevDay = [Day]
FROM ##Sales WITH (TABLOCKX)
OPTION (MAXDOP 1)
SELECT [Day#],[Sales#],[RunningTotal] FROM #sales_out
END
SET STATISTICS IO, TIME OFF
GO
/**************************************************
rCTE Solution (1 sec/100K rows; 17 Sec/1M rows)
**************************************************/
SET STATISTICS IO, TIME ON
DECLARE @sales_out TABLE
(
[Day#]INT,
[Sales#]MONEY,
[RunningTotal]MONEY,
PRIMARY KEY([Day#] ASC)
);
DECLARE @PrevDay INT, @RunningTotal MONEY = 0;
WITH CTE ([Day], [Sales], [Running Total])
AS
(
SELECT[Day],
[Sales],
[Sales]
FROM ##Sales
WHERE [Day] = 1
UNION ALL
SELECTa.[Day],
a.[Sales],
CTE.[Running Total] + a.[Sales]
FROM CTE
JOIN ##Sales a ON CTE.[Day] + 1 = a.[Day]
)
SELECT * FROM CTE
OPTION (MAXRECURSION 0)
GO
Thoughts?
-- Itzik Ben-Gan 2001
July 25, 2012 at 1:23 pm
Lynn Pettis (7/23/2012)
Okay, now modify the rCTE to run against this dataset. It will contain 1,000,000 rows of data and potentially 10,000 unique cID's, not 1 (cID = 123) as in the other test data set.
CREATE TABLE #sampleData
(
RowID int,
cID int,
TransID int,
pts int,
PRIMARY KEY NONCLUSTERED([RowID])
);
declare @TransID int = 1;
with
e1(n) as (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
e2(n) as (select 1 from e1 a cross join e1 b),
e4(n) as (select 1 from e2 a cross join e2 b),
e6(n) as (select row_number() over (order by (select null)) from e4 a cross join e2 b)
insert into #sampleData (RowID, cID, TransID, pts)
select
n,
ceiling(rand(checksum(newid())) * 10000),
@TransID + (5 * n),
ceiling(rand(checksum(newid())) * 100)
from
e6;
create clustered index IX_CustTran on #sampleData (
cID,
TransID);
Very nice!
-- Itzik Ben-Gan 2001
July 25, 2012 at 1:38 pm
Thoughts? It all depends on where you run the code. On my development VM the Quircky Update beats your rCTE every time. Using your code the QU runs in about 1100 ms and your rCTE in about 1600 ms.
July 25, 2012 at 2:03 pm
Lynn Pettis (7/25/2012)
Thoughts? It all depends on where you run the code. On my development VM the Quircky Update beats your rCTE every time. Using your code the QU runs in about 1100 ms and your rCTE in about 1600 ms.
That is very interesting - I need to test this some more. The "Quirky Update" is new to me; I don't recall using it in the past. I read up on and tested for the first time this week. I was impressed and am sure I will be using it in the future.
-- Itzik Ben-Gan 2001
July 25, 2012 at 3:38 pm
XMLSQLNinja (7/25/2012)
Gosh, that's quite confrontational.
My sincere apologies if I came off as confrontational; that was not my intent. ...
No offence taken Alan, and thanks for encouraging these tests to be performed.
There's an error or two in the code you posted for the QU method, causing it to run for longer than necessary and generating incorrect results. An amended version of your comparison script including corrections to the QU method is posted below. I've changed the point at which the timer starts and ends to exclude table creation,where relevant, and also restricted the read of the result set to a single row to reduce the time it takes to write out the results to the client.
Having done this, and running against a million-row sample table, the QU comes in at around 2 seconds, the rCTE at about 19 seconds and the cursor at about 35 seconds. All three methods return the same result for day = 999991.
SET NOCOUNT ON;
--First: Create Test Data
BEGIN
IF object_id('tempdb..##Sales') IS NOT NULL
DROP TABLE ##Sales;
CREATE TABLE ##Sales
(
[Day] int,
Sales money,
PRIMARY KEY([Day])
);
DECLARE @Day INT = 1, @Sales MONEY, @rand1 INT, @rand2 INT
-- Put your row count here...
WHILE @Day <= 1000000
BEGIN
SELECT@rand1 = ((RAND()*800))+500,
@rand2 = ((RAND()*800)*-1)+200
SET @Sales = @rand1+@rand2;
INSERT ##Sales VALUES (@Day,@Sales);
SET @Day = @Day + 1
END;
CREATE NONCLUSTERED INDEX idx_X
ON ##Sales ([Day], Sales)
END;
GO
/**************************************************
Cursor Solution (4 sec/100K rows; 44 Sec/1M rows) /// 35S PER M ROWS
**************************************************/
SET STATISTICS IO, TIME OFF
BEGIN
DECLARE @Day int, @Sales money
DECLARE @RunningTotal money = 0
DECLARE @SalesTbl_2 TABLE
(
[Day]int,
Salesmoney,
RunningTotalmoney,
PRIMARY KEY([Day])
)
DECLARE rt_cursor CURSOR FAST_FORWARD FOR
SELECT [Day], Sales
FROM ##Sales
ORDER BY [Day]
OPEN rt_cursor
FETCH NEXT
FROM rt_cursor
INTO @Day,@Sales
WHILE @@FETCH_STATUS = 0
BEGIN
SET @RunningTotal = @RunningTotal + @Sales
INSERT @SalesTbl_2
VALUES (@Day,@Sales,@RunningTotal)
FETCH NEXT FROM rt_cursor INTO @Day,@Sales
END
CLOSE rt_cursor
DEALLOCATE rt_cursor
SELECT [Day],Sales,RunningTotal FROM @SalesTbl_2 WHERE [Day] = 999991
END
GO
/**************************************************
QU Method (2 sec/100K rows; 24sec/1M rows) /// About 2 seconds per million rows
**************************************************/
BEGIN
IF OBJECT_ID('Tempdb..#sales_out') IS NOT NULL
DROP TABLE #sales_out;
CREATE TABLE #sales_out
(
[Day]int,
[Sales]MONEY,
[RunningTotal]MONEY
);
INSERT INTO #sales_out ([Day],[Sales])
SELECT * FROM ##Sales ORDER BY [Day];
CREATE UNIQUE CLUSTERED INDEX ucx_Sales ON #sales_out ([Day]);
SET STATISTICS IO, TIME ON
DECLARE @PrevDay INT = 0, @RunningTotal MONEY = 0
UPDATE #sales_out
SET @RunningTotal = RunningTotal =@RunningTotal+[Sales],
@PrevDay = [Day]
FROM #sales_out WITH (TABLOCKX)
OPTION (MAXDOP 1)
SELECT [Day],[Sales],[RunningTotal] FROM #sales_out WHERE [Day] = 999991
SET STATISTICS IO, TIME OFF
END
GO
/**************************************************
rCTE Solution (1 sec/100K rows; 17 Sec/1M rows) /// 19S PER 1M ROWS
**************************************************/
SET STATISTICS IO, TIME ON
--DECLARE @sales_out TABLE
--(
--[Day#]INT,
--[Sales#]MONEY,
--[RunningTotal]MONEY,
--PRIMARY KEY([Day#] ASC)
--);
DECLARE @PrevDay INT, @RunningTotal MONEY = 0;
WITH CTE ([Day], [Sales], [Running Total])
AS
(
SELECT[Day],
[Sales],
[Sales]
FROM ##Sales
WHERE [Day] = 1
UNION ALL
SELECTa.[Day],
a.[Sales],
CTE.[Running Total] + a.[Sales]
FROM CTE
JOIN ##Sales a ON CTE.[Day] + 1 = a.[Day]
)
SELECT * FROM CTE WHERE [Day] = 999991
OPTION (MAXRECURSION 0)
SET STATISTICS IO, TIME Off
GO
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
July 25, 2012 at 4:00 pm
XMLSQLNinja (7/25/2012)
Lynn Pettis (7/25/2012)
Thoughts? It all depends on where you run the code. On my development VM the Quircky Update beats your rCTE every time. Using your code the QU runs in about 1100 ms and your rCTE in about 1600 ms.That is very interesting - I need to test this some more. The "Quirky Update" is new to me; I don't recall using it in the past. I read up on and tested for the first time this week. I was impressed and am sure I will be using it in the future.
It IS called the "QUIRKY" update for a reason. There are certain rules which need to be followed. Paul White and Tom Thompson came up with a built in safety check for it which is identified at the beginning of the article.
In light of the new "previous row" capabilites of 2012, I may not update the article as promised at the beginning of the article.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 9 posts - 16 through 23 (of 23 total)
You must be logged in to reply to this topic. Login to reply