Sum DISTINCT fields only

  • What is the best way to sum only the distinct fields? I've got a SELECT that returns duplicate rows due to another table being inner joined. I clean that up by using DISTINCT. The only downside is that when I try to SUM one of the columns, it returns a sum of all rows, not just the DISTINCT ones. Sure enough, that's not what I need. The only way I've found to get around this so far is to create a temp table, load the distinct values into it, and then select from there. I'm just afraid that, since this is for a report, it might be a bad idea to have it work with a temporary table. Something tells me the execution is going to be slow, not to mention it takes up space in TempDB on the server.
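
    The effect described above can be reproduced with a minimal sketch. All table and column names here are made up for illustration; they are not from the actual report.

    -- Hypothetical setup: one order with two detail rows.
    CREATE TABLE #Order (Order_ID INT, Order_Total MONEY);
    CREATE TABLE #Order_Detail (Order_ID INT, Line_No INT);
    INSERT INTO #Order VALUES (1, 100);
    INSERT INTO #Order_Detail VALUES (1, 1), (1, 2);

    -- The join repeats Order_Total once per detail row,
    -- so SUM sees 100 twice and returns 200 instead of 100.
    SELECT SUM(O.Order_Total)
    FROM #Order O
    INNER JOIN #Order_Detail OD ON O.Order_ID = OD.Order_ID;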

  • There's no reason you can't combine DISTINCT and SUM. I can't recommend specifics, since you didn't post your SQL, but there is no reason the two shouldn't work together. Post your query or SP and we'll make specific recommendations on how to do it, but generically it is like this:

    SELECT DISTINCT
        OD.ProductID,
        SUM(OD.Order_Qty)
    FROM [Order] O
    INNER JOIN Order_Detail OD
        ON O.Order_ID = OD.Order_ID
    GROUP BY OD.ProductID;

    Roland Alexander 
    The Monday Morning DBA 
    There are two means of refuge from the miseries of life: music and cats. ~ Albert Schweitzer

  • Is this what you are attempting?

    CREATE TABLE #T(Col1 INT);

    INSERT INTO #T
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 2 UNION ALL
    SELECT 3 UNION ALL
    SELECT 4 UNION ALL
    SELECT 1;

    SELECT SUM(DISTINCT Col1) AS [Sum of distinct values],
           SUM(Col1)          AS [Sum of all values]
    FROM #T;

    Results:

    Sum of distinct values    Sum of all values
    ----------------------    -----------------
    10                        12

    If everything seems to be going well, you have obviously overlooked something.

    Ron


  • I don't think that using DISTINCT is the correct way to solve this problem. DISTINCT will ignore ALL duplicate values, not just the ones caused by your join. The two main approaches for solving this problem are to

    1) Sum your values before joining to the second table.

    2) Filter the secondary table so that the join only produces one result.

    Since you haven't given details about your problem, it's impossible to say which approach is better in your situation.
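
    The first approach can be sketched like this; the table and column names are assumptions borrowed from the earlier generic example, not the poster's actual schema.

    -- Approach 1: aggregate the detail table first, then join,
    -- so each order contributes exactly one pre-summed row.
    SELECT O.Order_ID, D.Total_Qty
    FROM [Order] O
    INNER JOIN (SELECT Order_ID, SUM(Order_Qty) AS Total_Qty
                FROM Order_Detail
                GROUP BY Order_ID) D
        ON O.Order_ID = D.Order_ID;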

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Would using GROUP BY rather than DISTINCT work in this situation?

  • No, the DISTINCT masks the problem with the join creating false "duplicates" in the sum. The GROUP BY only masks this same problem in a different way. The correct way to fix this problem is to fix/replace the join so that it doesn't produce the duplicates in the first place.
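
    One way to fix the join along these lines, when the second table is only needed for filtering, is to drop the join entirely and use EXISTS. This is a sketch only; the names (including Order_Total) are assumptions, not the poster's schema.

    -- Filter with EXISTS instead of joining,
    -- so the Order rows are never multiplied by detail rows.
    SELECT SUM(O.Order_Total)
    FROM [Order] O
    WHERE EXISTS (SELECT 1
                  FROM Order_Detail OD
                  WHERE OD.Order_ID = O.Order_ID);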

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Unfortunately, there is no way to fix the particular join to lose the duplicate values. I've checked every which way, and the tables I'm joining are only linked to each other one way. I was able to get around the issue by doing a select into a temp table, making a select from there while using SUM, and then dropping the temp table. Seems to be working very well, and more importantly, fast.
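
    A minimal sketch of the temp-table workaround described above; the names are illustrative only, since the actual query was never posted.

    -- Load the de-duplicated rows into a temp table first...
    SELECT DISTINCT OD.ProductID, OD.Order_Qty
    INTO #Distinct_Rows
    FROM [Order] O
    INNER JOIN Order_Detail OD ON O.Order_ID = OD.Order_ID;

    -- ...then SUM from the temp table and clean up.
    SELECT ProductID, SUM(Order_Qty)
    FROM #Distinct_Rows
    GROUP BY ProductID;

    DROP TABLE #Distinct_Rows;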

  • DROP TABLE #T;

    CREATE TABLE #T(Col1 INT);

    INSERT INTO #T
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 2 UNION ALL
    SELECT 3 UNION ALL
    SELECT 4 UNION ALL
    SELECT 1;

    --SELECT SUM(DISTINCT Col1) AS 'Sum of distinct values', SUM(Col1) AS 'Sum of all values' FROM #T

    SELECT SUM(Col1)
    FROM #T
    WHERE Col1 IN (SELECT Col1
                   FROM #T
                   GROUP BY Col1
                   HAVING COUNT(*) = 1);

    ???

    Dr. Les Cardwell, DCS-DSS
    Enterprise Data Architect
    Central Lincoln PUD

  • Les Cardwell (6/26/2012)


    DROP TABLE #T;

    CREATE TABLE #T(Col1 INT);

    INSERT INTO #T
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 2 UNION ALL
    SELECT 3 UNION ALL
    SELECT 4 UNION ALL
    SELECT 1;

    --SELECT SUM(DISTINCT Col1) AS 'Sum of distinct values', SUM(Col1) AS 'Sum of all values' FROM #T

    SELECT SUM(Col1)
    FROM #T
    WHERE Col1 IN (SELECT Col1
                   FROM #T
                   GROUP BY Col1
                   HAVING COUNT(*) = 1);

    ???

    That's pretty much exactly what I did.

  • Les Cardwell (6/26/2012)


    DROP TABLE #T;

    CREATE TABLE #T(Col1 INT);

    INSERT INTO #T
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 2 UNION ALL
    SELECT 3 UNION ALL
    SELECT 4 UNION ALL
    SELECT 1;

    --SELECT SUM(DISTINCT Col1) AS 'Sum of distinct values', SUM(Col1) AS 'Sum of all values' FROM #T

    SELECT SUM(Col1)
    FROM #T
    WHERE Col1 IN (SELECT Col1
                   FROM #T
                   GROUP BY Col1
                   HAVING COUNT(*) = 1);

    ???

    Les - When I run your query I get 9 but shouldn't it be 10 (1+2+3+4)?

    This is another way.

    ;WITH CTE AS (
        SELECT Col1,
               rn = ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY (SELECT NULL))
        FROM #T)
    SELECT SUM(Col1)
    FROM CTE
    WHERE rn = 1;


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


  • DVSQL (6/22/2012)


    What is the best way to sum only the distinct fields? I've got a SELECT that returns duplicate rows due to another table being inner joined. I clean that up by using DISTINCT. The only downside is that when I try to SUM one of the columns, it returns a sum of all rows, not just the DISTINCT ones. Sure enough, that's not what I need. The only way I've found to get around this so far is to create a temp table, load the distinct values into it, and then select from there. I'm just afraid that, since this is for a report, it might be a bad idea to have it work with a temporary table. Something tells me the execution is going to be slow, not to mention it takes up space in TempDB on the server.

    Try a common table expression (CTE) with the ROW_NUMBER() function instead of a temp table to get the distinct values, then apply SUM to the required column of the CTE.

    Here is sample code for you.

    ;WITH CTE AS (
        SELECT EMPID, EMPNAME, DEPID, SAL,
               ROW_NUMBER() OVER (PARTITION BY EMPID, EMPNAME, DEPID, SAL ORDER BY (SELECT NULL)) AS RN
        FROM SP_EMP)
    SELECT SUM(SAL)
    FROM CTE
    WHERE RN = 1;

    --rhythmk
    ------------------------------------------------------------------
    To post your question, use the link below

    https://www.sqlservercentral.com/articles/forum-etiquette-how-to-post-datacode-on-a-forum-to-get-the-best-help
    🙂

  • Dwain,

    Since #1 is not unique, unless I misunderstood, it shouldn't be included. Only 2, 3, 4 are unique in the set.

    ~Les

    Dr. Les Cardwell, DCS-DSS
    Enterprise Data Architect
    Central Lincoln PUD

  • Les Cardwell (6/27/2012)


    Dwain,

    Since #1 is not unique, unless I misunderstood, it shouldn't be included. Only 2, 3, 4 are unique in the set.

    ~Les

    Hmmm. Must've missed that. So then this would work:

    SELECT Total = SUM(Col1)
    FROM (SELECT Col1
          FROM #T
          GROUP BY Col1
          HAVING COUNT(Col1) = 1) x;


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


  • I wrote a SQL CLR function to do this, but if you can stay in T-SQL, you're probably better off.
