September 23, 2015 at 7:46 am
I have table named #t1 and three columns. CODE, Column1, and Column2.
create table #t1 (
CODE NVARCHAR(20),
COLUMN1 NUMERIC(18,2),
COLUMN2 NUMERIC(18,2)
)
And i have some data:
INSERT INTO #t1 (CODE,COLUMN1,COLUMN2)
VALUES ('432', 0,100),
('TOTAL FOR 432',0,100),
('4320001',0,250),
('4320001',50,0),
('4320001',0,140),
('4320001',300,0),
('TOTAL FOR 4320001',350,390),
('432002',200,0),
('432002',0,100),
('TOTAL FOR 432002',200,100)
drop table #t1
I want to have 4 column (named BALANCE). Balance must be column that represent running totals between two columns (Column1 - Column2) for each group of data. For each group total must start from zero.
Output:
Take a look picture!
After total 432 it starts to count again for total 4320001 and again for total 432002..... How can i get this result?
I'm using MS SQL SERVER 2014
September 23, 2015 at 9:25 am
We don't use SQL2014 where I work but from a quick search the following link may be of use
https://msdn.microsoft.com/en-us/library/hh231256.aspx
That said, I think based on your test data the "Total For" row may cause an issue.
That said below is a quick attempt at it in 2008 using a self join and no doubt someone will come along with a much more efficient way to do it 😉
The row count and rank part was the only way I could think to not have Total For included and not affect the running total and ensure it always came at the end of the data for the group.
DECLARE @SomeData TABLE
(
CODE NVARCHAR(20) ,
COLUMN1 NUMERIC(18, 2) ,
COLUMN2 NUMERIC(18, 2)
)
INSERT INTO @SomeData
( CODE, COLUMN1, COLUMN2 )
VALUES ( N'432', 0, 100 ),
( 'TOTAL FOR 432', 0, 100 ),
( N'4320001', 0, 250 ),
( N'4320001', 50, 0 ),
( N'4320001', 0, 140 ),
( N'4320001', 300, 0 ),
( 'TOTAL FOR 4320001', 350, 390 ),
( '432002', 200, 0 ),
( '432002', 0, 100 ),
( 'TOTAL FOR 432002', 200, 100 );
WITH DataRanked
AS ( SELECT sd.CODE ,
sd.COLUMN1 ,
sd.COLUMN2 ,
sd.COLUMN1 - sd.column2 AS PreTotal ,
ROW_NUMBER() OVER ( PARTITION BY sd.CODE ORDER BY sd.CODE ) AS RCount ,
RANK() OVER ( PARTITION BY REPLACE(sd.CODE,
'TOTAL FOR ', '') ORDER BY sd.CODE ) AS FinalSorter
FROM @SomeData AS sd
)
SELECT b.CODE ,
b.COLUMN1 ,
b.COLUMN2 ,
SUM(ISNULL(b2.PreTotal, 0)) AS RTotal
FROM DataRanked b
LEFT OUTER JOIN DataRanked b2 ON b.CODE = b2.CODE
AND b2.RCount <= b.RCount
GROUP BY b.CODE ,
b.RCount ,
b.FinalSorter ,
b.COLUMN1 ,
b.COLUMN2
ORDER BY REPLACE(b.CODE, 'TOTAL FOR ', '') ,
b.FinalSorter ,
b.RCount
September 23, 2015 at 9:35 am
You can use an OVER() to help here, but you have no ordering, so how can you get a running total? How do you know that ( N'4320001', 0, 140 ) comes before ( N'4320001', 300, 0 )? The order of insert has zero impact on how the rows are stored.
September 23, 2015 at 9:39 am
I did notice no ordering on the data in the table but failed to mention it. My bad.
September 23, 2015 at 12:00 pm
What do you think about this:
DECLARE @t1 TABLE (
CODE nvarchar(20),
COLUMN1 numeric(18, 2),
COLUMN2 numeric(18, 2)
)
INSERT INTO @t1 (CODE, COLUMN1, COLUMN2)
VALUES ('432', 0, 100),
('TOTAL FOR 432', 0, 100),
('4320001', 0, 250),
('4320001', 50, 0),
('4320001', 0, 140),
('4320001', 300, 0),
('TOTAL FOR 4320001', 350, 390),
('432002', 200, 0),
('432002', 0, 100),
('TOTAL FOR 432002', 200, 100)
-- CTE
;
WITH CTE
AS (SELECT
*,
ROW_NUMBER() OVER (ORDER BY @@rowcount) RowNum -- Order as per the input
FROM @t1)
-- get from CTE
SELECT
*,
SUM(COLUMN1 - COLUMN2) OVER (PARTITION BY code ORDER BY RowNum) AS Balance --sum using PARTITION
FROM CTE
ORDER BY RowNum
September 23, 2015 at 1:20 pm
Here is a method that is usually faster.
But first, here is the data, adding a few columns and not inserting those total rows:
IF OBJECT_ID('tempdb.dbo.#t1') IS NOT NULL DROP TABLE #t1;
CREATE table #t1 (
RowID INTEGER IDENTITY PRIMARY KEY CLUSTERED,
CODE NVARCHAR(20),
COLUMN1 NUMERIC(18,2),
COLUMN2 NUMERIC(18,2),
RunningTotal NUMERIC(18,2)
);
INSERT INTO #t1 (CODE,COLUMN1,COLUMN2)
VALUES ('432', 0,100),
--('TOTAL FOR 432',0,100),
('4320001',0,250),
('4320001',50,0),
('4320001',0,140),
('4320001',300,0),
--('TOTAL FOR 4320001',350,390),
('432002',200,0),
('432002',0,100)--,
--('TOTAL FOR 432002',200,100);
This next part does a "Quirky Update". See remarks within the code.
-- this part using a "Quirky Update" - the fastest way to do running totals
-- but, this has a bunch of rules that need to be followed.
-- these rules can be found at http://www.sqlservercentral.com/articles/T-SQL/68467/
DECLARE @Sequence INTEGER, -- for the safety check
@RowID INTEGER, -- for the anchor column
@Code NVARCHAR(20),
@Balance NUMERIC(18,2);
SET @Sequence = 0;
SET @Code = N'';
WITH SafetyCheck AS
(
SELECT RowID,
CODE,
COLUMN1,
COLUMN2,
LineTotal = COLUMN1 - COLUMN2,
RunningTotal,
[Sequence] = ROW_NUMBER() OVER (ORDER BY RowID)
FROM #t1
)
UPDATE sc
SET @RowID = sc.RowID,
@Balance = CASE WHEN @Code = Code THEN @Balance + LineTotal ELSE LineTotal END, -- reset balance when code changes, otherwise accumulate the running total
RunningTotal = @Balance,
@Code = Code,
@Sequence = CASE WHEN @Sequence+1 = Sequence THEN @Sequence + 1 ELSE 1/0 END -- Safety Check aborts if not being calculated in the proper order
FROM SafetyCheck sc
SELECT *
FROM #t1;
Here is what I would use on SQL Server 2012 or higher. It's simpler, almost as fast, and doesn't have all of those undocumented (by MS) rules to make it work:
-- SQL SERVER 2012 version:
SELECT RowID,
CODE,
COLUMN1,
COLUMN2,
RunningTotal = SUM(COLUMN1 - COLUMN2) OVER (PARTITION BY CODE ORDER BY RowID)
FROM #t1
ORDER BY RowID;
Finally:
IF OBJECT_ID('tempdb.dbo.#t1') IS NOT NULL DROP TABLE #t1;
Both of these methods produce the following results:
RowID CODE COLUMN1 COLUMN2 RunningTotal
----- ------- ------- ------- ------------
1 432 0.00 100.00 -100.00
2 4320001 0.00 250.00 -250.00
3 4320001 50.00 0.00 -200.00
4 4320001 0.00 140.00 -340.00
5 4320001 300.00 0.00 -40.00
6 432002 200.00 0.00 200.00
7 432002 0.00 100.00 100.00
Does this look like what you want?
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
September 23, 2015 at 2:15 pm
View fourth answer and tell me your opinion
September 23, 2015 at 4:03 pm
kdejan87 (9/23/2015)
What do you think about this:
DECLARE @t1 TABLE (
CODE nvarchar(20),
COLUMN1 numeric(18, 2),
COLUMN2 numeric(18, 2)
)
INSERT INTO @t1 (CODE, COLUMN1, COLUMN2)
VALUES ('432', 0, 100),
('TOTAL FOR 432', 0, 100),
('4320001', 0, 250),
('4320001', 50, 0),
('4320001', 0, 140),
('4320001', 300, 0),
('TOTAL FOR 4320001', 350, 390),
('432002', 200, 0),
('432002', 0, 100),
('TOTAL FOR 432002', 200, 100)
-- CTE
;
WITH CTE
AS (SELECT
*,
ROW_NUMBER() OVER (ORDER BY @@rowcount) RowNum -- Order as per the input
FROM @t1)
-- get from CTE
SELECT
*,
SUM(COLUMN1 - COLUMN2) OVER (PARTITION BY code ORDER BY RowNum) AS Balance --sum using PARTITION
FROM CTE
ORDER BY RowNum
Don't do that! SUM OVER will crush your performance. Even if you remove the ORDER BY at the end of your query (which is not required) you'll see that the query plan produces an ugly sort. With the ORDER BY you get and additional SORT.
Below is the sample data I was working with. Like Steve Suggested and Wayne did, I added a ROWID so we have something to sort by.
-- Added ROWID so that we have something to sort by
-- made it a primary key so that the query plan does not produce a sort
IF OBJECT_ID('tempdb..#t1') IS NOT NULL DROP TABLE #t1
GO
CREATE TABLE #t1
(
ROWID int identity primary key,
CODE NVARCHAR(20),
COLUMN1 NUMERIC(18,2),
COLUMN2 NUMERIC(18,2)
)
INSERT INTO #t1 (CODE,COLUMN1,COLUMN2)
VALUES ('432', 0,100),
('TOTAL FOR 432',0,100),
('4320001',0,250),
('4320001',50,0),
('4320001',0,140),
('4320001',300,0),
('TOTAL FOR 4320001',350,390),
('432002',200,0),
('432002',0,100),
('TOTAL FOR 432002',200,100);
Below is your solution (a simplified version that produces the same query plan as what you posted). I included two correlated subqueries; one with an ORDER BY and one with out. If you run this you'll see a few things:
SET NOCOUNT ON;
SET STATISTICS IO ON;
PRINT 'QUERY 1: SUM + PARTITION';
SELECT
*,
SUM(COLUMN1 - COLUMN2) OVER (PARTITION BY code ORDER BY ROWID) AS Balance --sum using PARTITION
FROM #t1;
PRINT 'QUERY 2: SUM + PARTITION + ORDER BY ';
SELECT
*,
SUM(COLUMN1 - COLUMN2) OVER (PARTITION BY code ORDER BY ROWID) AS Balance --sum using PARTITION
FROM #t1
ORDER BY ROWID -- slows things down...
PRINT 'QUERY 3: CORRELATED SUBQUERY'
SELECT *, (SELECT SUM(COLUMN1 - COLUMN2) FROM #t1 t2 WHERE t1.ROWID=t2.ROWID AND T1.CODE=t2.CODE)
FROM #t1 t1
PRINT 'QUERY 4: CORRELATED SUBQUERY with sort'
SELECT *, (SELECT SUM(COLUMN1 - COLUMN2) FROM #t1 t2 WHERE t1.ROWID=t2.ROWID AND T1.CODE=t2.CODE)
FROM #t1 t1
ORDER BY ROWID;
SET STATISTICS IO OFF;
First, SUM OVER requires a sort (two with the ORDER BY statement) even though the ORDER BY is sorting on the clustered index. The two correlated subqueries make better use of the index. Plus they work on any version of SQL Server.
Correlated subqueries are NOT the way to go. Wayne showed a couple better solution. I just used them to show how inefficient SUM OVER is.
Lastly (this is the reason for my edit)... You'll also notice that I turned STATISTICS IO ON. If you examine the statistics for my four queries I posted you'll notice that the SUM OVER solutions both scan #t1 16 times (and that number will keep going up the more rows you add). The correlated subqueries scan the table twice, once for the outer query, once for the inner query. ROW OVER has nearly three times as many reads and that number will keep going up as you add rows.
In summary, Correlated subquery = bad; SUM OVER = very, very bad.
-- Itzik Ben-Gan 2001
September 23, 2015 at 4:23 pm
Was this your intended 2012 solution:
-- SQL SERVER 2012 version:
SET STATISTICS IO ON;
SELECT
RowID,
CODE,
COLUMN1,
COLUMN2,
RunningTotal =
SUM(COLUMN1 - COLUMN2) OVER
(
PARTITION BY CODE ORDER BY RowID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
FROM #t1
ORDER BY RowID;
@kdejan87 - using 2012 framing (ROWS BETWEEN UNBOUNDED....) makes SUM OVER okay... Referring back to my queries with Statistics IO on, this solution blows the correlated subquery away
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#t1_________________________________________________________________________________________________________________00000000000D'.
Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- Itzik Ben-Gan 2001
September 23, 2015 at 4:36 pm
kdejan87 (9/23/2015)
View fourth answer and tell me your opinion
[font="Arial Black"]ALL [/font]the answers are incorrect because there is no column that enforces the order of the data in the original data. Adding a ROWID won't help to do it right. If either in incrementing number or a transaction date/time is not available in the original data, all answers will continue to be incorrect. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
September 23, 2015 at 4:40 pm
kdejan87,
Does the table have any kind of column(s) that would indicate the correct order of the transactions?
--Jeff Moden
Change is inevitable... Change for the better is not.
September 23, 2015 at 7:44 pm
Alan.B (9/23/2015)
@WayneWas this your intended 2012 solution:
-- SQL SERVER 2012 version:
SET STATISTICS IO ON;
SELECT
RowID,
CODE,
COLUMN1,
COLUMN2,
RunningTotal =
SUM(COLUMN1 - COLUMN2) OVER
(
PARTITION BY CODE ORDER BY RowID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
FROM #t1
ORDER BY RowID;
@kdejan87 - using 2012 framing (ROWS BETWEEN UNBOUNDED....) makes SUM OVER okay... Referring back to my queries with Statistics IO on, this solution blows the correlated subquery away
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#t1_________________________________________________________________________________________________________________00000000000D'.
Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Alan,
good catch, I did forget the ROWS clause.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
September 24, 2015 at 10:31 am
Jeff Moden (9/23/2015)
kdejan87 (9/23/2015)
View fourth answer and tell me your opinion[font="Arial Black"]ALL [/font]the answers are incorrect because there is no column that enforces the order of the data in the original data. Adding a ROWID won't help to do it right. If either in incrementing number or a transaction date/time is not available in the original data, all answers will continue to be incorrect. 😉
+1. This.
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply