September 15, 2016 at 10:59 am
I want to add a 4th column (% of total count), for example:
State Name    Count1    Total Charge Off Amount    % of Total Count
Alabama            5                      10000                 50%
Arizona            2                      40000                 20%
Arkansas           3                       5000                 30%
SELECT
A.[State Name]
,COUNT(A.[State Name]) AS [Count1]
,SUM (A.ChargeOffAmount) AS [Total Charge Off Amount]
FROM
(
SELECT DISTINCT
L.LoanID
, H.TransactionDate
, CASE WHEN H.TransactionAmt > 0 THEN H.TransactionAmt ELSE L.PrincipalBal END AS [ChargeOffAmount]
, ST.Name AS [State Name]
FROM Loan AS L
JOIN History AS H ON L.LoanID = H.LoanID
LEFT JOIN CreditScoreDetail AS CSD ON L.LoanID = CSD.LoanID AND L.AssmRecCounter = CSD.AssmRecCounter
LEFT JOIN Property AS PRO ON L.LoanID = PRO.LoanID
LEFT JOIN State AS ST ON PRO.State = ST.Code
WHERE
CONVERT(varchar(20), H.TransactionDate, 101) >= @StartDate
AND CONVERT(varchar(20), H.TransactionDate, 101) <= @EndDate
) AS A
LEFT OUTER JOIN Loan AS L ON L.loanID = A.LoanID
GROUP BY A.[State Name]
Can somebody suggest the code for the 4th column?
September 15, 2016 at 11:22 am
DECLARE @table TABLE (stateName varchar(20), Count1 int, TotalChargeOffAmount int);
INSERT @table
VALUES
('Alabama',5,1000),
('Arizona',2,4000),
('Arkansas',3,5000);
SELECT
stateName,
Count1,
TotalChargeOffAmount,
[% of Total Count] = ((Count1*1.)/CountTotal)*100
FROM @table t1
CROSS APPLY (SELECT CountTotal = SUM(Count1) FROM @table) t2;
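The *1. in the percentage expression is what forces decimal arithmetic; without it, the integer division Count1/CountTotal would truncate to 0 for every row. A minimal illustration of the difference:
SELECT 5 / 10        AS int_division,      -- returns 0
       (5 * 1.) / 10 AS decimal_division;  -- returns 0.500000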
-- Itzik Ben-Gan 2001
September 15, 2016 at 1:05 pm
Also SUM with OVER()
DECLARE @table TABLE (stateName varchar(20), Count1 int, TotalChargeOffAmount int);
INSERT @table
VALUES
('Alabama',5,1000),
('Arizona',2,4000),
('Arkansas',3,5000);
SELECT
stateName,
Count1,
TotalChargeOffAmount,
100.0*Count1 / SUM(t1.[Count1]) OVER () AS [Sum]
FROM @table t1
September 15, 2016 at 1:20 pm
Both Alan.B and Gazareth started with the summarized data. Here is how you would do it starting with the raw data.
DECLARE @table TABLE (stateName varchar(20), Amount int);
INSERT @table
VALUES
('Alabama',200),
('Alabama',200),
('Alabama',200),
('Alabama',200),
('Alabama',200),
('Arizona',2000),
('Arizona',2000),
('Arkansas',2000),
('Arkansas',2000),
('Arkansas',1000)
;
SELECT t.stateName, COUNT(*) AS Count1, SUM(t.Amount) AS TotalChargeOffAmount, 100.0 * COUNT(*) / SUM(COUNT(*)) OVER() AS [% of Total Count]
FROM @table t
GROUP BY t.stateName
Note that the denominator is a windowed aggregate of a regular aggregate.
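The same calculation can be written in two explicit steps, which may make that easier to see. This is just a sketch that reuses the @table variable declared above and must run in the same batch:
;WITH grouped AS
(
    SELECT stateName, Cnt = COUNT(*), Amt = SUM(Amount)
    FROM @table
    GROUP BY stateName
)
SELECT stateName,
       Cnt,
       Amt,
       100.0 * Cnt / SUM(Cnt) OVER () AS [% of Total Count]
FROM grouped;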
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
September 15, 2016 at 1:21 pm
It also appears that you may have a logic error in your code:
WHERE
CONVERT(varchar(20), H.TransactionDate, 101) >= @StartDate
AND CONVERT(varchar(20), H.TransactionDate, 101) <= @EndDate
If the H.TransactionDate field is a datetime and contains a time portion, this appears to leave off records from the end date.
As an example, the @EndDate is 9/30/2016, which is midnight.
If there are records that are created AFTER midnight on that date, then this will not include them.
Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/
September 15, 2016 at 1:29 pm
Michael L John (9/15/2016)
It also appears that you may have a logic error in your code:
WHERE
CONVERT(varchar(20), H.TransactionDate, 101) >= @StartDate
AND CONVERT(varchar(20), H.TransactionDate, 101) <= @EndDate
If the H.TransactionDate field is a datetime and contains a time portion, this appears to leave off records from the end date.
As an example, the @EndDate is 9/30/2016, which is midnight.
If there are records that are created AFTER midnight on that date, then this will not include them.
The 101 format does not include the time portion. That being said, you would be better off casting these to DATE rather than VARCHAR. SQL 2012 can still use an index seek when casting DATETIME data to DATE, but it cannot when converting it to VARCHAR.
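A self-contained sketch of that idea, using a hypothetical table variable (the real query would apply the same CAST to H.TransactionDate):
DECLARE @StartDate date = '2016-09-01',
        @EndDate   date = '2016-09-30';
DECLARE @History TABLE (LoanID int, TransactionDate datetime);
INSERT @History
VALUES (1, '2016-09-15 08:00'),
       (2, '2016-09-30 15:39'),
       (3, '2016-10-01 00:05');

SELECT LoanID, TransactionDate
FROM @History
WHERE CAST(TransactionDate AS date) >= @StartDate
  AND CAST(TransactionDate AS date) <= @EndDate;  -- rows 1 and 2 qualify, row 3 does not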
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
September 15, 2016 at 1:31 pm
If it's after midnight on 09/30, then shouldn't the date be 10/01?
September 15, 2016 at 1:35 pm
That's right, it does not have any time portion, so everything on 09/30 should be listed.
September 15, 2016 at 1:41 pm
SQLPain (9/15/2016)
That's right, it does not have any time portion, so everything on 09/30 should be listed.
Good, because this is a frequent error that I see.
2016-09-30 15:39:50.197 would be an example of a time after midnight. If your data contained times, then a row like that would not appear in your query.
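When the column is compared as a raw datetime (no conversion at all), the usual way to avoid losing late-in-the-day rows on the end date is a half-open range: greater than or equal to the start date and strictly less than the day after the end date. A minimal, hypothetical illustration:
DECLARE @StartDate datetime = '2016-09-01',
        @EndDate   datetime = '2016-09-30';

SELECT d.TransactionDate,
       CASE WHEN d.TransactionDate >= @StartDate
             AND d.TransactionDate <  DATEADD(DAY, 1, @EndDate)
            THEN 'included' ELSE 'excluded' END AS HalfOpenRange
FROM (VALUES (CAST('2016-09-30 15:39:50.197' AS datetime)),
             (CAST('2016-10-01 00:00:00.000' AS datetime))) AS d(TransactionDate);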
Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/
September 15, 2016 at 1:52 pm
Gazareth (9/15/2016)
Also SUM with OVER()
DECLARE @table TABLE (stateName varchar(20), Count1 int, TotalChargeOffAmount int);
INSERT @table
VALUES
('Alabama',5,1000),
('Arizona',2,4000),
('Arkansas',3,5000);
SELECT
stateName,
Count1,
TotalChargeOffAmount,
100.0*Count1 / SUM(t1.[Count1]) OVER () AS [Sum]
FROM @table t1
The problem with SUM OVER() in this case (and often with Window Aggregate Functions in general) is that it requires work tables to be written to tempdb, which makes the query slower. Consider the following two queries:
DECLARE @table TABLE (stateName varchar(20), Count1 int, TotalChargeOffAmount int);
INSERT @table
VALUES
('Alabama',5,1000),
('Arizona',2,4000),
('Arkansas',3,5000);
SET STATISTICS IO ON;
PRINT 'Using CROSS APPLY and SUM';
SELECT
stateName,
Count1,
TotalChargeOffAmount,
[% of Total Count] = (Count1*100.)/CountTotal
FROM @table t1
CROSS APPLY (SELECT CountTotal = SUM(Count1) FROM @table) t2;
PRINT CHAR(13)+CHAR(10)+'Using SUM OVER()';
SELECT
stateName,
Count1,
TotalChargeOffAmount,
100.0*Count1 / SUM(t1.Count1) OVER () AS [Sum]
FROM @table t1;
SET STATISTICS IO OFF;
and here's the IO results:
Using CROSS APPLY and SUM
Table '#ABCCA541'. Scan count 2, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Using SUM OVER()
Table 'Worktable'. Scan count 3, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#ABCCA541'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Using SUM OVER() required 6X the number of reads. This is why I generally avoid Window Aggregate Functions. This is discussed in this article:
-- Itzik Ben-Gan 2001
September 16, 2016 at 7:18 am
Alan.B (9/15/2016)
Gazareth (9/15/2016)
Also SUM with OVER()
DECLARE @table TABLE (stateName varchar(20), Count1 int, TotalChargeOffAmount int);
INSERT @table
VALUES
('Alabama',5,1000),
('Arizona',2,4000),
('Arkansas',3,5000);
SELECT
stateName,
Count1,
TotalChargeOffAmount,
100.0*Count1 / SUM(t1.[Count1]) OVER () AS [Sum]
FROM @table t1
The problem with SUM OVER() in this case (and often with Window Aggregate Functions in general) is that it requires work tables to be written to tempdb, which makes the query slower. Consider the following two queries:
DECLARE @table TABLE (stateName varchar(20), Count1 int, TotalChargeOffAmount int);
INSERT @table
VALUES
('Alabama',5,1000),
('Arizona',2,4000),
('Arkansas',3,5000);
SET STATISTICS IO ON;
PRINT 'Using CROSS APPLY and SUM';
SELECT
stateName,
Count1,
TotalChargeOffAmount,
[% of Total Count] = (Count1*100.)/CountTotal
FROM @table t1
CROSS APPLY (SELECT CountTotal = SUM(Count1) FROM @table) t2;
PRINT CHAR(13)+CHAR(10)+'Using SUM OVER()';
SELECT
stateName,
Count1,
TotalChargeOffAmount,
100.0*Count1 / SUM(t1.Count1) OVER () AS [Sum]
FROM @table t1;
SET STATISTICS IO OFF;
and here's the IO results:
Using CROSS APPLY and SUM
Table '#ABCCA541'. Scan count 2, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Using SUM OVER()
Table 'Worktable'. Scan count 3, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#ABCCA541'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Using SUM OVER() required 6X the number of reads. This is why I generally avoid Window Aggregate Functions. This is discussed in this article:
Although I absolutely agree that a large number of reads is usually an indication that something has gone haywire and could stand some real improvement, it's not always the case. "It Depends". In the past, I've seen Paul White solve a running total using a Triangular Join, but the CPU and Duration were incredibly low, rivaling even the Quirky Update. And, Tuesday night, I gave a demo of solving a problem using an initial single-column GROUP BY to pre-aggregate a 7 million row / 8GB table and then using 4 SUM() OVERs to very quickly solve for 4 other sums used to calculate percentages related to the first, along with 4 DENSE_RANKs. There were "only" ~31 thousand reads from the original table and ~482,000 reads from the work table. The SUM() OVERs also replaced the need for 4 other GROUP BY queries that would have required a join to the original GROUP BY to solve the problem. In the cCTE, that would have required the pre-aggregation to execute 4 times, and doing it with a temp table already took a bit more than 7 seconds, not to mention the fact that the 4 additional queries were starting to make the code a fair bit more complex.
The code, with the initial pre-aggregation GROUP BY followed by a cCTE that used SUM() OVER 4 times after that (all in the same query, BTW), solved the problem in 4.5 seconds of duration (INCLUDING display time of ~57,000 rows) and just over 9 seconds of CPU time, and made the code incredibly simple to read and maintain.
The reason I came back on this was that I didn't want anyone to develop the habit of avoiding things like SUM() OVER (or any query) based solely on the number of READs it sometimes produces. You have to look at the "big 4" (CPU, Reads, Writes, and Duration, the last of which is the most important to the people using the code) as a complete package compared to how other methods would perform.
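For anyone who hasn't seen the pattern, here is a hedged sketch of it, with a small hypothetical table variable standing in for the large table (names and columns are made up, not from the actual demo): pre-aggregate once with GROUP BY, then layer SUM() OVER and DENSE_RANK over the much smaller aggregated set.
DECLARE @Detail TABLE (Category varchar(20), Amount int);
INSERT @Detail
VALUES ('A',100),('A',300),('B',200),('B',200),('C',500);

WITH PreAgg AS
(
    SELECT Category,
           RowCnt = COUNT(*),
           Amt    = SUM(Amount)
    FROM @Detail
    GROUP BY Category
)
SELECT Category,
       RowCnt,
       Amt,
       PctOfRows   = 100.0 * RowCnt / SUM(RowCnt) OVER (),
       PctOfAmount = 100.0 * Amt    / SUM(Amt)    OVER (),
       AmtRank     = DENSE_RANK() OVER (ORDER BY Amt DESC)
FROM PreAgg;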
--Jeff Moden
Change is inevitable... Change for the better is not.
September 16, 2016 at 9:02 am
Jeff Moden (9/16/2016)
Alan.B (9/15/2016)
Gazareth (9/15/2016)
Also SUM with OVER()
DECLARE @table TABLE (stateName varchar(20), Count1 int, TotalChargeOffAmount int);
INSERT @table
VALUES
('Alabama',5,1000),
('Arizona',2,4000),
('Arkansas',3,5000);
SELECT
stateName,
Count1,
TotalChargeOffAmount,
100.0*Count1 / SUM(t1.[Count1]) OVER () AS [Sum]
FROM @table t1
The problem with SUM OVER() in this case (and often with Window Aggregate Functions in general) is that it requires work tables to be written to tempdb, which makes the query slower. Consider the following two queries:
DECLARE @table TABLE (stateName varchar(20), Count1 int, TotalChargeOffAmount int);
INSERT @table
VALUES
('Alabama',5,1000),
('Arizona',2,4000),
('Arkansas',3,5000);
SET STATISTICS IO ON;
PRINT 'Using CROSS APPLY and SUM';
SELECT
stateName,
Count1,
TotalChargeOffAmount,
[% of Total Count] = (Count1*100.)/CountTotal
FROM @table t1
CROSS APPLY (SELECT CountTotal = SUM(Count1) FROM @table) t2;
PRINT CHAR(13)+CHAR(10)+'Using SUM OVER()';
SELECT
stateName,
Count1,
TotalChargeOffAmount,
100.0*Count1 / SUM(t1.Count1) OVER () AS [Sum]
FROM @table t1;
SET STATISTICS IO OFF;
and here's the IO results:
Using CROSS APPLY and SUM
Table '#ABCCA541'. Scan count 2, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Using SUM OVER()
Table 'Worktable'. Scan count 3, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#ABCCA541'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Using SUM OVER() required 6X the number of reads. This is why I generally avoid Window Aggregate Functions. This is discussed in this article:
Although I absolutely agree that a large number of reads is usually an indication that something has gone haywire and could stand some real improvement, it's not always the case. "It Depends". In the past, I've seen Paul White solve a running total using a Triangular Join, but the CPU and Duration were incredibly low, rivaling even the Quirky Update. And, Tuesday night, I gave a demo of solving a problem using an initial single-column GROUP BY to pre-aggregate a 7 million row / 8GB table and then using 4 SUM() OVERs to very quickly solve for 4 other sums used to calculate percentages related to the first, along with 4 DENSE_RANKs. There were "only" ~31 thousand reads from the original table and ~482,000 reads from the work table. The SUM() OVERs also replaced the need for 4 other GROUP BY queries that would have required a join to the original GROUP BY to solve the problem. In the cCTE, that would have required the pre-aggregation to execute 4 times, and doing it with a temp table already took a bit more than 7 seconds, not to mention the fact that the 4 additional queries were starting to make the code a fair bit more complex.
The code, with the initial pre-aggregation GROUP BY followed by a cCTE that used SUM() OVER 4 times after that (all in the same query, BTW), solved the problem in 4.5 seconds of duration (INCLUDING display time of ~57,000 rows) and just over 9 seconds of CPU time, and made the code incredibly simple to read and maintain.
The reason I came back on this was that I didn't want anyone to develop the habit of avoiding things like SUM() OVER (or any query) based solely on the number of READs it sometimes produces. You have to look at the "big 4" (CPU, Reads, Writes, and Duration, the last of which is the most important to the people using the code) as a complete package compared to how other methods would perform.
Interesting stuff, both! The actual execution plans are weighted in the other direction, with the cross apply at 2x the cost of the sum() over().
Presumably due to 2 table scans with the apply, 1 with sum() over().
September 16, 2016 at 10:26 pm
What you have to remember is that all of the costs are estimates. The only thing that ultimately matters is the actual performance.
Here's my favorite example on performance. Run the following code with the Actual execution plan turned on. The first query will show a "% of Batch" of 0 and the second query will show 100. Then read the Messages tab and find out that nearly the exact opposite is true.
RAISERROR('Recursive CTE',0,0) WITH NOWAIT;
SET STATISTICS TIME,IO ON;
WITH cte AS
(
SELECT N = 1
UNION ALL
SELECT N = N + 1
FROM cte
WHERE N < 1000000
)
SELECT *
INTO #MyHead1
FROM cte
OPTION (MAXRECURSION 0)
;
SET STATISTICS TIME,IO OFF;
PRINT REPLICATE('-',119)
;
RAISERROR('Pseudo-Cursor',0,0) WITH NOWAIT;
SET STATISTICS TIME,IO ON;
SELECT TOP 1000000
N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
INTO #MyHead2
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
;
SET STATISTICS TIME,IO OFF;
PRINT REPLICATE('-',119);
GO
DROP TABLE #MyHead1,#MyHead2
GO
--Jeff Moden
Change is inevitable... Change for the better is not.