June 22, 2012 at 3:43 pm
What is the best way to sum only the distinct rows? I've got a SELECT that returns duplicate rows because another table is inner joined, and I clean that up with DISTINCT. The downside is that when I try to SUM one of the columns, it sums all rows, not just the DISTINCT ones, which isn't what I need. The only workaround I've found so far is to create a temp table, load the distinct values into it, and then select from that. Since this is for a report, though, I'm afraid a temporary table is a bad idea: something tells me execution will be slow, not to mention it takes up space in TempDB on the server.
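For illustration only, the general shape of the problem is something like the query below; the table and column names (Orders, Shipments, OrderAmount) are invented here, not the actual schema, which was never posted.
-- Every order that has two shipment rows appears twice after the join,
-- so SUM counts its amount twice; DISTINCT would de-duplicate a detail
-- listing, but not the aggregate.
SELECT SUM(O.OrderAmount) AS TotalAmount
FROM   dbo.Orders O
       INNER JOIN dbo.Shipments S
               ON S.OrderID = O.OrderID;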
June 23, 2012 at 5:40 am
There's no reason you can't combine DISTINCT and SUM; the two work together just fine. I can't recommend specifics since you didn't post your SQL, so post your query or stored procedure and we'll make specific recommendations, but generically it looks like this:
SELECT DISTINCT
    OD.ProductID,
    SUM(OD.Order_Qty) AS Total_Qty
FROM [Order] O
INNER JOIN Order_Detail OD
    ON O.Order_ID = OD.Order_ID
GROUP BY OD.ProductID;
Roland Alexander
The Monday Morning DBA
There are two means of refuge from the miseries of life: music and cats. ~ Albert Schweitzer
June 23, 2012 at 6:40 am
Is this what you are attempting?
CREATE TABLE #T(Col1 INT)
INSERT INTO #T
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 1
SELECT SUM(DISTINCT(Col1)) AS 'Sum of distinct values',SUM(Col1) AS 'Sum of all values' FROM #T
Results:
Sum of distinct values Sum of all values
10 12
June 25, 2012 at 7:22 am
I don't think that using DISTINCT is the correct way to solve this problem. DISTINCT will ignore ALL duplicate values, not just the ones caused by your join. The two main approaches for solving this problem are to
1) Sum your values before joining to the second table.
2) Filter the secondary table so that the join only produces one result.
Since you haven't given details about your problem, it's impossible to say which approach is better in your situation.
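As a rough sketch of approach 1, reusing the table names from the generic example earlier in the thread (so treat it as illustrative rather than a drop-in fix): sum the detail rows per order first, then join, so the join can no longer multiply the quantities.
-- Aggregate Order_Detail per Order_ID first, then join the
-- already-summed rows to [Order].
SELECT O.Order_ID,
       OD.Total_Qty
FROM   [Order] O
       INNER JOIN (SELECT Order_ID,
                          SUM(Order_Qty) AS Total_Qty
                   FROM   Order_Detail
                   GROUP BY Order_ID) AS OD
               ON OD.Order_ID = O.Order_ID;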
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
June 26, 2012 at 3:42 am
Would using GROUP BY rather than DISTINCT work in this situation?
June 26, 2012 at 7:31 am
No. The DISTINCT only masks the problem of the join creating false "duplicates" in the sum, and GROUP BY masks the same problem in a different way. The correct fix is to change or replace the join so that it doesn't produce the duplicates in the first place.
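A tiny made-up example of why neither helps when the underlying data can legitimately repeat (table name and values invented):
-- Neither DISTINCT nor GROUP BY can tell a row duplicated by a join
-- from a row that legitimately appears twice in the data.
CREATE TABLE #Lines (OrderID INT, Qty INT);
INSERT INTO #Lines VALUES (1, 5), (1, 5);   -- two genuine line items, same qty

-- Both of these collapse the two rows into one, so SUM(Qty) over the
-- de-duplicated result returns 5 instead of the correct 10.
SELECT DISTINCT OrderID, Qty FROM #Lines;
SELECT OrderID, Qty FROM #Lines GROUP BY OrderID, Qty;

DROP TABLE #Lines;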
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
June 26, 2012 at 3:32 pm
Unfortunately, there is no way to fix this particular join to lose the duplicate values. I've checked every which way, and the tables I'm joining are only linked to each other in one way. I was able to get around the issue by selecting into a temp table, running a SELECT with the SUM from there, and then dropping the temp table. It seems to be working very well and, more importantly, fast.
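For readers following along, this is roughly the shape of that workaround; the table and column names are invented, since the real query was never posted.
-- Load only the distinct order rows into a temp table, then sum from it.
SELECT DISTINCT O.OrderID, O.OrderAmount
INTO   #DistinctOrders
FROM   dbo.Orders O
       INNER JOIN dbo.Shipments S
               ON S.OrderID = O.OrderID;

SELECT SUM(OrderAmount) AS TotalAmount
FROM   #DistinctOrders;

DROP TABLE #DistinctOrders;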
June 26, 2012 at 3:50 pm
DROP TABLE #T;
CREATE TABLE #T(Col1 INT)
INSERT INTO #T
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 1
--SELECT SUM(DISTINCT(Col1)) AS 'Sum of distinct values',SUM(Col1) AS 'Sum of all values' FROM #T
SELECT SUM(col1)
FROM #T
WHERE col1 IN ( SELECT Col1
FROM #T
GROUP BY Col1
HAVING COUNT(*)=1)
???
Dr. Les Cardwell, DCS-DSS
Enterprise Data Architect
Central Lincoln PUD
June 26, 2012 at 4:21 pm
Les Cardwell (6/26/2012)
DROP TABLE #T;
CREATE TABLE #T(Col1 INT)
INSERT INTO #T
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 1
--SELECT SUM(DISTINCT(Col1)) AS 'Sum of distinct values',SUM(Col1) AS 'Sum of all values' FROM #T
SELECT SUM(col1)
FROM #T
WHERE col1 IN ( SELECT Col1
FROM #T
GROUP BY Col1
HAVING COUNT(*)=1)
???
That's pretty much exactly what I did.
June 26, 2012 at 6:32 pm
Les Cardwell (6/26/2012)
DROP TABLE #T;
CREATE TABLE #T(Col1 INT)
INSERT INTO #T
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 1
--SELECT SUM(DISTINCT(Col1)) AS 'Sum of distinct values',SUM(Col1) AS 'Sum of all values' FROM #T
SELECT SUM(col1)
FROM #T
WHERE col1 IN ( SELECT Col1
FROM #T
GROUP BY Col1
HAVING COUNT(*)=1)
???
Les - When I run your query I get 9, but shouldn't it be 10 (1+2+3+4)?
This is another way.
;WITH CTE AS (
SELECT Col1,rn=ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY (SELECT NULL))
FROM #T)
SELECT SUM(Col1)
FROM CTE
WHERE rn=1
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
June 27, 2012 at 7:03 am
DVSQL (6/22/2012)
What is the best way to sum only the distinct rows? I've got a SELECT that returns duplicate rows because another table is inner joined, and I clean that up with DISTINCT. The downside is that when I try to SUM one of the columns, it sums all rows, not just the DISTINCT ones, which isn't what I need. The only workaround I've found so far is to create a temp table, load the distinct values into it, and then select from that. Since this is for a report, though, I'm afraid a temporary table is a bad idea: something tells me execution will be slow, not to mention it takes up space in TempDB on the server.
Try a common table expression instead of a temp table, using the ROW_NUMBER() function to identify the distinct rows, and then apply SUM to the required column of the CTE.
Here is some sample code:
;WITH CTE AS (
    SELECT EMPID, EMPNAME, DEPID, SAL,
           ROW_NUMBER() OVER (PARTITION BY EMPID, EMPNAME, DEPID, SAL ORDER BY (SELECT NULL)) AS RN
    FROM SP_EMP)
SELECT SUM(SAL)
FROM CTE
WHERE RN = 1;
--rhythmk
------------------------------------------------------------------
To post your question, use the link below:
https://www.sqlservercentral.com/articles/forum-etiquette-how-to-post-datacode-on-a-forum-to-get-the-best-help
🙂
June 27, 2012 at 9:10 am
Dwain,
Since #1 is not unique, unless I misunderstood, it shouldn't be included. Only 2, 3, 4 are unique in the set.
~Les
Dr. Les Cardwell, DCS-DSS
Enterprise Data Architect
Central Lincoln PUD
June 27, 2012 at 6:29 pm
Les Cardwell (6/27/2012)
Dwain, since #1 is not unique, unless I misunderstood, it shouldn't be included. Only 2, 3, 4 are unique in the set.
~Les
Hmmm. Must've missed that. So then this would work:
SELECT Total=SUM(Col1)
FROM (
SELECT Col1
FROM #T
GROUP BY Col1
HAVING COUNT(Col1) = 1) x
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
June 28, 2012 at 4:39 pm
I wrote a SQL CLR function to do this, but if you can stay in T-SQL, you're probably better off.