June 22, 2012 at 3:43 pm
What is the best way to sum only the distinct rows? I've got a SELECT that returns duplicate rows because another table is inner joined, and I clean that up with DISTINCT. The downside is that when I try to SUM one of the columns, it sums all rows, not just the DISTINCT ones, which isn't what I need. The only workaround I've found so far is to create a temp table, load the distinct values into it, and then select from that. Since this is for a report, though, I'm afraid a temporary table is a bad idea: something tells me execution will be slow, not to mention it takes up space in TempDB on the server.
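For illustration only, the general shape of the problem is something like the query below; the table and column names (Orders, Shipments, OrderAmount) are invented here, not the actual schema, which was never posted.
-- Every order that has two shipment rows appears twice after the join,
-- so SUM counts its amount twice; DISTINCT would de-duplicate a detail
-- listing, but not the aggregate.
SELECT SUM(O.OrderAmount) AS TotalAmount
FROM   dbo.Orders O
       INNER JOIN dbo.Shipments S
               ON S.OrderID = O.OrderID;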
June 23, 2012 at 5:40 am
There's no reason you can't combine DISTINCT and SUM; the two work together just fine. I can't recommend specifics since you didn't post your SQL, so post your query or stored procedure and we'll make specific recommendations, but generically it looks like this:
SELECT DISTINCT
    OD.ProductID,
    SUM(OD.Order_Qty) AS Total_Qty
FROM [Order] O
INNER JOIN Order_Detail OD
    ON O.Order_ID = OD.Order_ID
GROUP BY OD.ProductID;
Roland Alexander
The Monday Morning DBA
There are two means of refuge from the miseries of life: music and cats. ~ Albert Schweitzer
June 23, 2012 at 6:40 am
Is this what you are attempting?
CREATE TABLE #T(Col1 INT)
INSERT INTO #T
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 1
SELECT SUM(DISTINCT(Col1)) AS 'Sum of distinct values',SUM(Col1) AS 'Sum of all values' FROM #T
Results:
Sum of distinct values Sum of all values
10 12
June 25, 2012 at 7:22 am
I don't think that using DISTINCT is the correct way to solve this problem. DISTINCT will ignore ALL duplicate values, not just the ones caused by your join. The two main approaches for solving this problem are to
1) Sum your values before joining to the second table.
2) Filter the secondary table so that the join only produces one result.
Since you haven't given details about your problem, it's impossible to say which approach is better in your situation.
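As a rough sketch of approach 1, reusing the table names from the generic example earlier in the thread (so treat it as illustrative rather than a drop-in fix): sum the detail rows per order first, then join, so the join can no longer multiply the quantities.
-- Aggregate Order_Detail per Order_ID first, then join the
-- already-summed rows to [Order].
SELECT O.Order_ID,
       OD.Total_Qty
FROM   [Order] O
       INNER JOIN (SELECT Order_ID,
                          SUM(Order_Qty) AS Total_Qty
                   FROM   Order_Detail
                   GROUP BY Order_ID) AS OD
               ON OD.Order_ID = O.Order_ID;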
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
June 26, 2012 at 3:42 am
Would using GROUP BY rather than DISTINCT work in this situation?
June 26, 2012 at 7:31 am
No. The DISTINCT only masks the problem of the join creating false "duplicates" in the sum, and GROUP BY masks the same problem in a different way. The correct fix is to change or replace the join so that it doesn't produce the duplicates in the first place.
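A tiny made-up example of why neither helps when the underlying data can legitimately repeat (table name and values invented):
-- Neither DISTINCT nor GROUP BY can tell a row duplicated by a join
-- from a row that legitimately appears twice in the data.
CREATE TABLE #Lines (OrderID INT, Qty INT);
INSERT INTO #Lines VALUES (1, 5), (1, 5);   -- two genuine line items, same qty

-- Both of these collapse the two rows into one, so SUM(Qty) over the
-- de-duplicated result returns 5 instead of the correct 10.
SELECT DISTINCT OrderID, Qty FROM #Lines;
SELECT OrderID, Qty FROM #Lines GROUP BY OrderID, Qty;

DROP TABLE #Lines;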
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
June 26, 2012 at 3:32 pm
Unfortunately, there is no way to fix this particular join to lose the duplicate values. I've checked every which way, and the tables I'm joining are only linked to each other in one way. I was able to get around the issue by selecting into a temp table, running a SELECT with the SUM from there, and then dropping the temp table. It seems to be working very well and, more importantly, fast.
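For readers following along, this is roughly the shape of that workaround; the table and column names are invented, since the real query was never posted.
-- Load only the distinct order rows into a temp table, then sum from it.
SELECT DISTINCT O.OrderID, O.OrderAmount
INTO   #DistinctOrders
FROM   dbo.Orders O
       INNER JOIN dbo.Shipments S
               ON S.OrderID = O.OrderID;

SELECT SUM(OrderAmount) AS TotalAmount
FROM   #DistinctOrders;

DROP TABLE #DistinctOrders;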
June 26, 2012 at 3:50 pm
DROP TABLE #T;
CREATE TABLE #T(Col1 INT)
INSERT INTO #T
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 1
--SELECT SUM(DISTINCT(Col1)) AS 'Sum of distinct values',SUM(Col1) AS 'Sum of all values' FROM #T
SELECT SUM(col1)
FROM #T
WHERE col1 IN ( SELECT Col1
FROM #T
GROUP BY Col1
HAVING COUNT(*)=1)
???
Dr. Les Cardwell, DCS-DSS
Enterprise Data Architect
Central Lincoln PUD
June 26, 2012 at 4:21 pm
Les Cardwell (6/26/2012)
DROP TABLE #T;
CREATE TABLE #T(Col1 INT)
INSERT INTO #T
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 1
--SELECT SUM(DISTINCT(Col1)) AS 'Sum of distinct values',SUM(Col1) AS 'Sum of all values' FROM #T
SELECT SUM(col1)
FROM #T
WHERE col1 IN ( SELECT Col1
FROM #T
GROUP BY Col1
HAVING COUNT(*)=1)
???
That's pretty much exactly what I did.
June 26, 2012 at 6:32 pm
Les Cardwell (6/26/2012)
DROP TABLE #T;
CREATE TABLE #T(Col1 INT)
INSERT INTO #T
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 1
--SELECT SUM(DISTINCT(Col1)) AS 'Sum of distinct values',SUM(Col1) AS 'Sum of all values' FROM #T
SELECT SUM(col1)
FROM #T
WHERE col1 IN ( SELECT Col1
FROM #T
GROUP BY Col1
HAVING COUNT(*)=1)
???
Les - When I run your query I get 9, but shouldn't it be 10 (1+2+3+4)?
This is another way.
;WITH CTE AS (
SELECT Col1,rn=ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY (SELECT NULL))
FROM #T)
SELECT SUM(Col1)
FROM CTE
WHERE rn=1
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
June 27, 2012 at 7:03 am
DVSQL (6/22/2012)
What is the best way to sum only the distinct rows? I've got a SELECT that returns duplicate rows because another table is inner joined, and I clean that up with DISTINCT. The downside is that when I try to SUM one of the columns, it sums all rows, not just the DISTINCT ones, which isn't what I need. The only workaround I've found so far is to create a temp table, load the distinct values into it, and then select from that. Since this is for a report, though, I'm afraid a temporary table is a bad idea: something tells me execution will be slow, not to mention it takes up space in TempDB on the server.
Try a common table expression instead of a temp table, using the ROW_NUMBER() function to identify the distinct rows, and then apply SUM to the required column of the CTE.
Here is some sample code:
;WITH CTE AS (
    SELECT EMPID, EMPNAME, DEPID, SAL,
           ROW_NUMBER() OVER (PARTITION BY EMPID, EMPNAME, DEPID, SAL ORDER BY (SELECT NULL)) AS RN
    FROM SP_EMP)
SELECT SUM(SAL)
FROM CTE
WHERE RN = 1;
--rhythmk
------------------------------------------------------------------
To post your question, use the link below:
https://www.sqlservercentral.com/articles/forum-etiquette-how-to-post-datacode-on-a-forum-to-get-the-best-help
🙂
June 27, 2012 at 9:10 am
Dwain,
Since #1 is not unique, unless I misunderstood, it shouldn't be included. Only 2, 3, 4 are unique in the set.
~Les
Dr. Les Cardwell, DCS-DSS
Enterprise Data Architect
Central Lincoln PUD
June 27, 2012 at 6:29 pm
Les Cardwell (6/27/2012)
Dwain, since #1 is not unique, unless I misunderstood, it shouldn't be included. Only 2, 3, 4 are unique in the set.
~Les
Hmmm. Must've missed that. So then this would work:
SELECT Total=SUM(Col1)
FROM (
SELECT Col1
FROM #T
GROUP BY Col1
HAVING COUNT(Col1) = 1) x
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
June 28, 2012 at 4:39 pm
I wrote a SQL CLR function to do this, but if you can stay in T-SQL, you're probably better off.