Find Customers Who Bought "A" and "B" But Not "C" (SQL Spackle)

Question

Post reply

Find Customers Who Bought "A" and "B" But Not "C" (SQL Spackle)

Viewing 15 posts - 76 through 90 (of 166 total)

You must be logged in to reply to this topic. Login to reply

spam 17205 Newbie Points: 6 More actions · Answer 1

This one gives you even more flexibility to filter on/output counts of each product:

SELECT CustomerID, cnt_A, cnt_B, cnt_C

FROM (

SELECT

CustomerID

, SUM(CASE ProductCode WHEN 'A' THEN 1 ELSE 0 END) AS cnt_A

, SUM(CASE ProductCode WHEN 'B' THEN 1 ELSE 0 END) AS cnt_B

, SUM(CASE ProductCode WHEN 'C' THEN 1 ELSE 0 END) AS cnt_C

FROM #Purchase

WHERE ProductCode IN ('A', 'B', 'C')

GROUP BY CustomerID

) t

WHERE cnt_A > 0 AND cnt_B > 0 AND cnt_C = 0

mark hutchinson SSC Eights! Points: 878 More actions · Answer 2

@jeff

Thanks for testing.

@spam 17205

I think a simpler version might be

SELECT

CustomerID

, SUM(CASE ProductCode WHEN 'A' THEN 1 ELSE 0 END) AS cnt_A

, SUM(CASE ProductCode WHEN 'B' THEN 1 ELSE 0 END) AS cnt_B

, SUM(CASE ProductCode WHEN 'C' THEN 1 ELSE 0 END) AS cnt_C

FROM #Purchase

WHERE ProductCode IN ('A', 'B', 'C')

GROUP BY CustomerID

Having cnt_A > 0 AND cnt_B > 0 AND cnt_C = 0

Or to let the inner/initial query just do the string-to-number translation.

SELECT CustomerID

, SUM(Case_A) AS cnt_A

, SUM(Case_B) AS cnt_B

, SUM(Case_C) AS cnt_C

FROM (

SELECT

CustomerID

, CASE ProductCode WHEN 'A' THEN 1 ELSE 0 END AS Case_A

, CASE ProductCode WHEN 'B' THEN 1 ELSE 0 END AS Case_B

, CASE ProductCode WHEN 'C' THEN 1 ELSE 0 END AS Case_C

FROM #Purchase

WHERE ProductCode IN ('A', 'B', 'C')

) t

WHERE cnt_A <> 0 AND cnt_B <> 0 AND cnt_C = 0

GROUP BY CustomerID

p oelkers SSC Rookie Points: 33 More actions · Answer 3

select customer,

sum(case where product = 'A' then 1 else 0 end) as A,

sum(case where product = 'B' then 1 else 0 end) as B,

sum(case where product = 'C' then 1 else 0 end) as C

from

group by customer

having sum(case where product = 'A' then 1 else 0 end) > 1

and sum(case where product = 'B' then 1 else 0 end) > 1

and sum(case where product = 'C' then 1 else 0 end) = 0

Berl SSC Veteran Points: 297 More actions · Answer 4

Might I suggest another, old school, alternative that seems to run a little bit faster, which uses a sub-query to eliminate those customers who all ready purchased item "C".

Although CPU time difference is not very great, the difference is much bigger for the elapsed time.

SELECT CustomerID

FROM #Purchase

WHERE ProductCode IN ('A','B')

AND CustomerID NOT IN (SELECT DISTINCT CustomerID FROM #Purchase AS P WHERE ProductCode IN ('C'))

GROUP BY CustomerID

HAVING COUNT(DISTINCT ProductCode) = 2

Here are the statistics:

Find Customers that did not buy "C" from article

(6673 row(s) affected)

SQL Server Execution Times:

CPU time = 94 ms, elapsed time = 265 ms.

Find Customers that did not buy "C" using subquery

(6673 row(s) affected)

SQL Server Execution Times:

CPU time = 93 ms, elapsed time = 202 ms.

Dean Cochrane SSC Eights! Points: 992 More actions · Answer 5

Berl (3/30/2012)
Although CPU time difference is not very great, the difference is much bigger for the elapsed time.

Elapsed time includes wait time and is not a good measure of performance.

Jeff Moden SSC Guru Points: 1004405 More actions · Answer 6

Dean Cochrane (3/30/2012)
Berl (3/30/2012)
Although CPU time difference is not very great, the difference is much bigger for the elapsed time.

Elapsed time includes wait time and is not a good measure of performance.

Actually, it is. The wait time you speak of also includes I/O time and that can be a major part of a performance problem. I do agree, though, that you really need to take "time to display" out of the picture because it will severly mask problems. It takes a certain amount of time to display a million rows and can make differences between two methods seem much more trivial than they might be.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Berl SSC Veteran Points: 297 More actions · Answer 7

Berl

SSC Veteran

Points: 297

March 31, 2012 at 10:43 pm

#1467377

Thank you Jeff.

michal.lisinski SSCertifiable Points: 7201 More actions · Answer 8

Imho

SELECT DISTINCT CustomerID

FROM [#Purchase]

WHERE (ProductCode IN ('A','B')) AND (CustomerID NOT IN

(SELECT DISTINCT CustomerID

FROM [#Purchase]

WHERE (ProductCode = 'C')))

GROUP BY CustomerID, ProductCode

HAVING (COUNT(CustomerID) = 1)

Regards

Mike

Jonathan AC Roberts SSCoach Points: 18361 More actions · Answer 9

michal.lisinski (4/2/2012)
Imho
SELECT DISTINCT CustomerID
FROM [#Purchase]
WHERE (ProductCode IN ('A','B')) AND (CustomerID NOT IN
(SELECT DISTINCT CustomerID
FROM [#Purchase]
WHERE (ProductCode = 'C')))
GROUP BY CustomerID, ProductCode
HAVING (COUNT(CustomerID) = 1)
Regards
Mike

This also will select customers who have purchased just one of 'A' or 'B' and not purchased 'C'. the requirement was that they should have purchased both 'A' and 'B' and not purchased 'C'.

Antares686 SSC Guru Points: 125444 More actions · Answer 10

Jeff Moden (3/31/2012)
Dean Cochrane (3/30/2012)
Berl (3/30/2012)
Although CPU time difference is not very great, the difference is much bigger for the elapsed time.

Elapsed time includes wait time and is not a good measure of performance.
Actually, it is. The wait time you speak of also includes I/O time and that can be a major part of a performance problem. I do agree, though, that you really need to take "time to display" out of the picture because it will severly mask problems. It takes a certain amount of time to display a million rows and can make differences between two methods seem much more trivial than they might be.

Well depends, the elapsed time can an ok performance indicator, however when you run the queries for testing back to back in the same batch without clearing the buffers you may get a false sense of performance from elapsed time and many other stats. I have a local test SQL Server which I always test by stopping the instance then starting so I can be sure everything is clear for testing. But the query plans, so I would be carefull in saying one scenario over another is truely best if you compare in a single batch. Now all that said, a huge difference in elapsed time shows a performance change and when I say huge took 30minutes to run first time then 30 seconds with second query.

michal.lisinski SSCertifiable Points: 7201 More actions · Answer 11

Hmm, r u sure? Customer no 6 took product A and no B, customer 7 took product B and no A, both didn't take C, but query doesn't return them.

Regards

Mike

michal.lisinski SSCertifiable Points: 7201 More actions · Answer 12

Jonathan AC Roberts (4/2/2012)
michal.lisinski (4/2/2012)
Imho
SELECT DISTINCT CustomerID
FROM [#Purchase]
WHERE (ProductCode IN ('A','B')) AND (CustomerID NOT IN
(SELECT DISTINCT CustomerID
FROM [#Purchase]
WHERE (ProductCode = 'C')))
GROUP BY CustomerID, ProductCode
HAVING (COUNT(CustomerID) = 1)
Regards
Mike
This also will select customers who have purchased just one of 'A' or 'B' and not purchased 'C'. the requirement was that they should have purchased both 'A' and 'B' and not purchased 'C'.

michal.lisinski SSCertifiable Points: 7201 More actions · Answer 13

Jonathan AC Roberts (4/2/2012)
michal.lisinski (4/2/2012)
Imho
SELECT DISTINCT CustomerID
FROM [#Purchase]
WHERE (ProductCode IN ('A','B')) AND (CustomerID NOT IN
(SELECT DISTINCT CustomerID
FROM [#Purchase]
WHERE (ProductCode = 'C')))
GROUP BY CustomerID, ProductCode
HAVING (COUNT(CustomerID) = 1)
Regards
Mike
This also will select customers who have purchased just one of 'A' or 'B' and not purchased 'C'. the requirement was that they should have purchased both 'A' and 'B' and not purchased 'C'.

Hmm, r u sure? Customer no 6 took product A and no B, customer 7 took product B and no A, both didn't take C, but query doesn't return them.

Regards

Mike

Jonathan AC Roberts SSCoach Points: 18361 More actions · Answer 14

michal.lisinski (4/2/2012)
Jonathan AC Roberts (4/2/2012)
michal.lisinski (4/2/2012)
Imho
SELECT DISTINCT CustomerID
FROM [#Purchase]
WHERE (ProductCode IN ('A','B')) AND (CustomerID NOT IN
(SELECT DISTINCT CustomerID
FROM [#Purchase]
WHERE (ProductCode = 'C')))
GROUP BY CustomerID, ProductCode
HAVING (COUNT(CustomerID) = 1)
Regards
Mike
This also will select customers who have purchased just one of 'A' or 'B' and not purchased 'C'. the requirement was that they should have purchased both 'A' and 'B' and not purchased 'C'.
Hmm, r u sure? Customer no 6 took product A and no B, customer 7 took product B and no A, both didn't take C, but query doesn't return them.
Regards
Mike

Hi Mike,

I tried the following on your code:

;WITH [#Purchase] AS

(

SELECT 'A' ProductCode, 1 CustomerId UNION ALL

SELECT 'B' ProductCode, 1 CustomerId UNION ALL

SELECT 'A' ProductCode, 2 CustomerId UNION ALL

SELECT 'B' ProductCode, 3 CustomerId UNION ALL

SELECT 'A' ProductCode, 4 CustomerId UNION ALL

SELECT 'A' ProductCode, 5 CustomerId UNION ALL

SELECT 'B' ProductCode, 5 CustomerId UNION ALL

SELECT 'C' ProductCode, 5 CustomerId

)

SELECT DISTINCT CustomerID

FROM [#Purchase]

WHERE (ProductCode IN ('A','B')) AND (CustomerID NOT IN

(SELECT DISTINCT CustomerID

FROM [#Purchase]

WHERE (ProductCode = 'C')))

GROUP BY CustomerID, ProductCode

HAVING (COUNT(CustomerID) = 1)

Customer 1 has bought 'A' and 'B' and not 'C' and he gets found correctly

Customer 2 has bought 'A' and not 'B' and not 'C' and he gets found incorrectly

Customer 3 has bought not 'A' and 'B' and not 'C' and he gets found incorrectly

Customer 4 has bought two 'A' and not 'B' and not 'C' and he doesn't get found correctly

Customer 5 has bought 'A' and 'B' and 'C' and he doesn't get found correctly

Gullimeel Hall of Fame Points: 3871 More actions · Answer 15

select * from

(

select id,max(case when cd = 'A' then 1 else 0 end) + max(case when cd = 'B' then 2 else 0 end) + max(case when cd = 'C' then 4 else 0 end) status from a

--where cd in ('A','B','C')

group by id

) dta

where

1=1

--and status = 3 -- a,b but not c

--and status = 5--does have a,c but not b

--and status = 6--does have b,c but not a

--and status = 7 --all 3

--and status = 0 --none of them

--and status = 1 -- just A

--and status = 2 -- just b

and status = 4 -- just C

Replace Id with CustomerId and cd with ProductCode. You can generalize the code the way you want..You can use analytic function instead of group by as well.

GulliMeel

Finding top n Worst Performing queries[/url]
Improve the performance of Merge Join(special case)
How to Post Performance Problem -Gail Shaw[/url]