COUNT(*) vs SUM()

Question

COUNT(*) vs SUM()

Mike Austin-398977

Say Hey Kid

Points: 697
More actions
August 25, 2017 at 2:16 pm

#402440

All,
I've been asked to improve the performance of a moderately complicated query. Here's the very simplified query:
SELECT
    EmployeeID
    ,NULL Referrals
    ,COUNT(*) Booked
    ,SUM(BookedLoanAmount) BookedLoanAmount
FROM vwMaxMSRReferralResponse
-- JOINS to 4 other tables
GROUP BY EmployeeID
UNION
SELECT
    EmployeeID
    ,COUNT(*) Referrals
    ,NULL Booked
    ,NULL BookedLoanAmount
FROM vwMaxMSRReferralResponse
-- JOINS to 4 other tables
GROUP BY EmployeeID
vwMaxMSRReferralResponse performs a Row_Number / Over / Partition query on a second view that joins 15+ tables and contains a UNION ALL. It it out of bounds and can't be changed. My approach was to get the vwMaxMSRReferralResponse columns in a virtual table and then join the results to the other 4 tables only once instead of twice. (The joins include table scans, so I thought if I only do the table scans once that will improve the performance.
Here's my simplified query:
SELECT
    EmployeeID
    , SUM(Referrals) AS Referrals
    , SUM(Booked) AS Booked
    , SUM(BookedLoanAmount) AS BookedLoanAmount
FROM (
    SELECT
        EmployeeID
        , NULL Referrals
        , 1 AS Booked
        , BookedLoanAmount
    FROM vwMaxMSRReferralResponse r
    UNION
    SELECT
        EmployeeID
        , 1 AS Referrals
        , NULL Booked
        , NULL BookedLoanAmount
    FROM vwMaxMSRReferralResponse
) a
-- Joins to 4 other tables
GROUP BY EmployeeID
Aside from the different way of handling the joins, the new query uses SUM instead of COUNT. That is the reason for my post, as the numbers don't match up. I'm pretty sure I'm missing something obvious here...
TIA,
Mike

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

doug.brown SSCertifiable Points: 5608 More actions · Answer 1

It's hard to say without knowing your data, but COUNT and SUM do two different things. Let's say we have three values in Booked: 0, 1, and 0. The COUNT of those will be 3. The SUM will be 1.

Mike Austin-398977 Say Hey Kid Points: 697 More actions · Answer 2

Thanks for the reply. In the original query, COUNT(*) was used to sum the # of referrals and booked for each employee. This wouldn't have worked with the rewrite because I would have to group by in each of my union queries and my goal was to minimize the use of a complex view with it's attendant joins. I wanted to perform my calculations at a higher level. My way performed MUCH faster; it just doesn't return results that match the original. 🙁

I was hoping for someone to point out a "gotcha" with using COUNT(*) rather than the approach I took.

Eddie Wuerch SSChampion Points: 12543 More actions · Answer 3

That UNION is simply running the same query twice, and will produce them same value each time.
I'm guessing there's a WHERE clause you forgot. If all the data for the two different queries in the UNION is in the same place, then only go after if once.
Using CASE WHEN, you can taker different actions on rows based on conditions.

For example, if there's a 'type' column of some sort in that table that is used to tell the difference between a referral and a booking,
then that column can be used to conditionally count or sum a different value in the row, which is what is shown below

WITH BookedAmounts AS( SELECT EmployeeID, SUM(CASE WHEN ReferralResponseType='Referral' THEN 1 ELSE 0 END) AS Referrals, -- or whatever determines that SUM(CASE WHEN ReferralResponseType='Booked' THEN 1 ELSE 0 END) AS Booked, SUM(BookedLoanAmount) AS BookedLoanAmount GROUP BY EmployeeID FROM vwMaxMSRReferralResponse ) SELECT ba.EmployeeID, ba.Referrals, ba.BookedLoadAmount, <columns from other tables> FROM BookedAmounts AS ba INNER JOIN ... <other tables> ON <other table>.<other column> = be.EmloyeeID;

Eddie Wuerch
MCM: SQL

Mike Austin-398977 Say Hey Kid Points: 697 More actions · Answer 4

Thanks for the response. Referrals are for tellers, booked is for CSRs. We're going with creating and populating a table specifically for the reports that use this process and expect performance to improve significantly.