Summing distinct records

Question

Summing distinct records

kramaswamy

SSCoach

Points: 18181
More actions
May 4, 2012 at 1:29 pm

#261039

Hi all,
Just a quick question which I'm fairly sure I already know the answer to.
Let's say I have a table which stores a clientID and the purchases the client made, as well as a table that stored the clientID and his different accounts.
Also, let's say the client had multiple purchases and multiple accounts:
CREATE TABLE #ClientPurchases
(
ID INT IDENTITY PRIMARY KEY,
ClientID INT,
Purchase DECIMAL(18, 2)
)
CREATE TABLE #ClientAccounts
(
ID INT IDENTITY PRIMARY KEY,
ClientID INT,
AccountNumber BIGINT
)
INSERT INTO #ClientPurchases (ClientID, Purchase)
VALUES (1, 10)
INSERT INTO #ClientPurchases (ClientID, Purchase)
VALUES (1, 20)
INSERT INTO #ClientPurchases (ClientID, Purchase)
VALUES (1, 50)
INSERT INTO #ClientPurchases (ClientID, Purchase)
VALUES (1, 100)
INSERT INTO #ClientPurchases (ClientID, Purchase)
VALUES (2, 100)
INSERT INTO #ClientPurchases (ClientID, Purchase)
VALUES (2, 500)
INSERT INTO #ClientPurchases (ClientID, Purchase)
VALUES (2, 400)
INSERT INTO #ClientPurchases (ClientID, Purchase)
VALUES (2, 400)
INSERT INTO #ClientAccounts (ClientID, AccountNumber)
VALUES (1, 1234567)
INSERT INTO #ClientAccounts (ClientID, AccountNumber)
VALUES (1, 1512512)
INSERT INTO #ClientAccounts (ClientID, AccountNumber)
VALUES (1, 2362632)
INSERT INTO #ClientAccounts (ClientID, AccountNumber)
VALUES (1, 412421)
INSERT INTO #ClientAccounts (ClientID, AccountNumber)
VALUES (2, 235236262)
INSERT INTO #ClientAccounts (ClientID, AccountNumber)
VALUES (2, 141241)
INSERT INTO #ClientAccounts (ClientID, AccountNumber)
VALUES (2, 124124)
Now let's say I want to get a count of the total number of accounts, purchases, and the total value of the purchases. Normally I'd write a request like this:
SELECT
#ClientPurchases.ClientID,
COUNT(DISTINCT AccountNumber) AS NumAccounts,
COUNT(DISTINCT #ClientPurchases.ID) AS NumPurchases,
SUM(DISTINCT Purchase) AS AmtPurchases
FROM #ClientPurchases
JOIN #ClientAccounts ON #ClientPurchases.ClientID = #ClientAccounts.ClientID
GROUP BY #ClientPurchases.ClientID
This works for the number of accounts and purchases, but clearly doesn't work for the value of the purchases, because the SUM(DISTINCT Purchase) part is looking at a distinct value of purchases, and not a distinct purchase record.
Obviously there's ways I can get this to work, such as for example splitting the request into two:
WITH CountAccounts AS
(
SELECT
ClientID,
COUNT(*) AS NumAccounts
FROM #ClientAccounts
GROUP BY ClientID
),
ClientPurchases AS
(
SELECT
ClientID,
COUNT(*) AS NumPurchases,
SUM(Purchase) AS AmtPurchases
FROM #ClientPurchases
GROUP BY ClientID
)
SELECT
ISNULL(CountAccounts.ClientID, ClientPurchases.ClientID) AS ClientID,
ISNULL(NumAccounts, 0) AS NumAccounts,
ISNULL(NumPurchases, 0) AS NumPurchases,
ISNULL(AmtPurchases, 0) AS AmtPurchases
FROM CountAccounts
FULL JOIN ClientPurchases ON CountAccounts.ClientID = ClientPurchases.ClientID
My question is just whether there is a simpler way of doing this that I'm forgetting about, since my real query is considerably bigger and I'd rather not take the time to figure out how to split it up. Basically is there a simple way I can get a SUM of distinct values, but where the "distinct-itude" is not decided by the value, but rather by a different key.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 1

Unfortunately not that I'm aware of. Because the Join will distribute the two halves of the join into both counts, you have to use correllated subqueries (basically the CTE you built) for each value.

Personally, I'd probably do something like:

SELECT

clientID,

(SELECT COUNT(*) FROM #ClientAccount WHERE ClientID = d.ClientID)

...

FROM

(SELECT DISTINCT ClientID FROM tblClients) AS d

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 2

Based on your sample data, what should the expected output look like? Manually build us a table that represents the expected results.

kramaswamy SSCoach Points: 18181 More actions · Answer 3

The second query provides the correct results from my example set, but here they are anyway:

ClientIDNumAccountsNumPurchasesAmtPurchases

144180.00

2341400.00

Dwain Camps SSC Guru Points: 86978 More actions · Answer 4

I would do this with a correlated subquery, thusly:

SELECT ClientID

,(SELECT COUNT(AccountNumber)

FROM #ClientAccounts ca

WHERE ca.ClientID = cp.ClientID

GROUP BY ClientID) AS NumAccounts

,NumPurchases, AmtPurchases

FROM (

SELECT ClientID, COUNT(ClientID) AS NumPurchases, SUM(Purchase) AS AmtPurchases

FROM #ClientPurchases

GROUP BY ClientID) cp

IMHO, this version looks somewhat simpler than your version that you said works and its query plan cost shows less. Actual performance may vary though so run a timing test to be sure.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86978 More actions · Answer 5

Actually, I have a better idea. If you need to worry about ClientIDs that have accounts but no purchases (the above query won't work), use this.

;WITH CTE (ClientID, AccountNumber, NumPurchases, AmtPurchases) AS (

SELECT ClientID, 1, 0, 0 FROM #ClientAccounts

UNION ALL

SELECT ClientID, 0, 1, Purchase FROM #ClientPurchases

)

SELECT ClientID, SUM(AccountNumber) AS NumAccounts

,SUM(NumPurchases) AS NumPurchases, SUM(AmtPurchases) AS AmtPurchases

FROM CTE

GROUP BY ClientID

Still pretty simple and has the same query plan cost as my prior suggestion.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St