Find Customers Who Bought "A" and "B" But Not "C" (SQL Spackle)

Question

Post reply

Find Customers Who Bought "A" and "B" But Not "C" (SQL Spackle)

Viewing 15 posts - 136 through 150 (of 166 total)

You must be logged in to reply to this topic. Login to reply

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 1

robinwilson (6/7/2014)
Hello All
I'm just wondering, is there anything wrong with doing it this way:

There is no guarantee that customer with both product A and B are returned.

N 56°04'39.16"
E 12°55'05.25"

Robin Wilson SSCertifiable Points: 5017 More actions · Answer 2

Robin Wilson

SSCertifiable

Points: 5017

June 7, 2014 at 9:07 am

#1719903

Thanks, I missed that part!

gbritton1 SSCertifiable Points: 6522 More actions · Answer 3

My variant on this uses count instead of sum. Since count() ignores null values, you don't need the else part of the case:

select CustomerId

from #Purchase

where ProductCode in ('A', 'B', 'C')

group by CustomerId

having count(case ProductCode when 'A' then 1 end) > 0

and count(case ProductCode when 'B' then 1 end) > 0

and count(case ProductCode when 'C' then 1 end) = 0

for me it's a bit more intuitive but that's purely subjective.

mushin2003 Newbie Points: 5 More actions · Answer 4

Hi,

I am new to SQL and tried to solve this in my own way. In one of my approaches I first wanted to get a list of the customers who had purchased both A&B using this query.

Select DISTINCT p1.CustomerID From #Purchase p1

join #Purchase p2 On p1.ProductCOde = 'A' and p2.ProductCOde = 'B';

Using the data provided, I am getting an "incorrect" result from what I was expecting. I am getting 6 rows returned with the unexpected row customerID = 6. Can someone tell me why SQL Server returned that row?

My novice approach for the answer to the original question turned out to be:

Select p1.CustomerID from

( Select DISTINCT p1.CustomerID From #Purchase p1

join #Purchase p2 On (p1.ProductCOde = 'A' and p2.ProductCOde = 'B' and p1.CustomerID = p2.CustomerID)

) as p1

where p1.CustomerID NOT IN (Select CustomerID from #Purchase p3 where p3.ProductCOde = 'C')

TIA,

Rob

Luis Cazares SSC Guru Points: 183706 More actions · Answer 5

Hi Rob,

The problem with your first query is that you missed to add the join condition for the CustomerID. In result, you're getting a cross join between all the rows with ProductCode A and all the rows with ProductCode B, and in the end you get all customers who bought product 'A' (as long as someone bought product B).

The weird thing is that you corrected that problem in your second query to get the correct result. Congratulations on getting it right!

I'd encourage you to read the discussion on this topic to review and understand the different options given. It will help you in your learning.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

mushin2003 Newbie Points: 5 More actions · Answer 6

Luis Cazares (6/11/2014)
Hi Rob,
The problem with your first query is that you missed to add the join condition for the CustomerID. In result, you're getting a cross join between all the rows with ProductCode A and all the rows with ProductCode B, and in the end you get all customers who bought product 'A' (as long as someone bought product B).
The weird thing is that you corrected that problem in your second query to get the correct result. Congratulations on getting it right!
I'd encourage you to read the discussion on this topic to review and understand the different options given. It will help you in your learning.

Thks Luis!

I also saw adding a WHERE clause:

Select DISTINCT p1.CustomerID From #Purchase p1

join #Purchase p2 On p1.ProductCOde = 'A' and p2.ProductCOde = 'B'

where p1.CustomerID = p2.CustomerID;

would give the expected result. Yes, I am going thru and looking at the different solutions. Learning how to do the benchmark as well.

Rob

Luis Cazares SSC Guru Points: 183706 More actions · Answer 7

mushin2003 (6/11/2014)
Luis Cazares (6/11/2014)
Hi Rob,
The problem with your first query is that you missed to add the join condition for the CustomerID. In result, you're getting a cross join between all the rows with ProductCode A and all the rows with ProductCode B, and in the end you get all customers who bought product 'A' (as long as someone bought product B).
The weird thing is that you corrected that problem in your second query to get the correct result. Congratulations on getting it right!
I'd encourage you to read the discussion on this topic to review and understand the different options given. It will help you in your learning.
Thks Luis!
I also saw adding a WHERE clause:
Select DISTINCT p1.CustomerID From #Purchase p1
join #Purchase p2 On p1.ProductCOde = 'A' and p2.ProductCOde = 'B'
where p1.CustomerID = p2.CustomerID;
would give the expected result. Yes, I am going thru and looking at the different solutions. Learning how to do the benchmark as well.
Rob

That's an interesting choice. Even if it will work exactly the same for inner joins, my advice is to go the other way to keep things where they belong within the logic.

Select DISTINCT p1.CustomerID

From #Purchase p1

join #Purchase p2 On p1.CustomerID = p2.CustomerID --Keep join conditions within the join

where p1.ProductCOde = 'A' --Keep query filters on WHERE

and p2.ProductCOde = 'B';

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Jeff Moden SSC Guru Points: 1004405 More actions · Answer 8

simonc 39536 (6/6/2014)
I think a lot of people seem to focus purely on performance, a good solution to a given issue considers many factors, NOT just performance, people who focus on just the performance aspect tend to write vastly more code and it eventually gets messy, ...

Just as a contrasting note to that...

It's an unwritten rule that one should always focus on accuracy first, and then performance so I don't believe for a seond that a lot of people focus purely on performance. For me, performance (and the related scalability) is second only to accuracy and not by much (although I'm not one to try to shave a millisecond off of a 5ms proc unless it's really needed for very high volume). I've also found that high performance code is usually much shorter than performance challenged code and is actually quite a bit more readable. I do agree that there are exceptions to that rule (greatest of 3 columns is a good example of that) but I won't sacrifice performance just because someone might not be able to understand the code (that's why I use embedded comments) or to make the code shorter (although it usually is).

Last but not least, I've also found that high performance code almost always uses fewer resources (sometimes, drastically so) and that makes it worth it, as well. I spend about 25% of my time fixing other peoples' code to make it faster and less resource intensive. Oddly enough, I also end up making it quite a bit more readable and supportable in the process.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

mushin2003 Newbie Points: 5 More actions · Answer 9

Are there any recommended books or resource which explain the internal workings of SQL Server engine(2008, 2012)? Which for example explains why Jeff's answer scales better than the solution provided using INTERCEPT/EXCEPT. I actually found it to be one of the easier solutions to follow.

Thanks,

Rob

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 10

It boils down to the number of IO used for the solution.

Also the CPU usage matters.

Obviously scanning the table twice (using INTERCEPT/EXCEPT for example) uses twice as much IO than scanning the table just once. There are factors that changes this such as proper indexes.

N 56°04'39.16"
E 12°55'05.25"

The Wizard Of Oz SSC Eights! Points: 875 More actions · Answer 11

For a miniscule (and mostly academic) improvement on PeSo's script,

I've got it so the index/table only gets scanned once (instead of 3 times)

by just using a BETWEEN instead of the IN (which the optimiser auto-expands to 3 OR conditions):

SELECT CustomerID

FROM #Purchase

WHERE ProductCode BETWEEN 'A' AND 'C'

GROUP BY CustomerID

HAVING MIN(ProductCode) = 'A'

AND MAX(ProductCode) = 'B'

STATISTICS IO Before: ...Scan count 3, logical reads 225...

STATISTICS IO After: ...Scan count 1, logical reads 217...

Tested on the million row random table from Jeff's article.

Luis Cazares SSC Guru Points: 183706 More actions · Answer 12

The Wizard Of Oz (6/12/2014)
For a miniscule (and mostly academic) improvement on PeSo's script,
I've got it so the index/table only gets scanned once (instead of 3 times)
by just using a BETWEEN instead of the IN (which the optimiser auto-expands to 3 OR conditions):
SELECT CustomerID
FROM #Purchase
WHERE ProductCode BETWEEN 'A' AND 'C'
GROUP BY CustomerID
HAVING MIN(ProductCode) = 'A'
AND MAX(ProductCode) = 'B'
STATISTICS IO Before: ...Scan count 3, logical reads 225...
STATISTICS IO After: ...Scan count 1, logical reads 217...
Tested on the million row random table from Jeff's article.

But that won't work to find customers that bought A & C but not B. 😀

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

The Wizard Of Oz SSC Eights! Points: 875 More actions · Answer 13

Luis Cazares (6/12/2014)
But that won't work to find customers that bought A & C but not B. 😀

Haha! Yes Luis, good thing that A & C not B wasn't in the scope of the original article 😛

Luis Cazares SSC Guru Points: 183706 More actions · Answer 14

The Wizard Of Oz (6/12/2014)
Luis Cazares (6/12/2014)
But that won't work to find customers that bought A & C but not B. 😀
Haha! Yes Luis, good thing that A & C not B wasn't in the scope of the original article 😛

"I have the answer, but it only works for spherical chickens in a vaccuum.":-D

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Jonathan AC Roberts SSCoach Points: 18361 More actions · Answer 15

Normally when an example is given using values A, B and C they should be taken as algebraic variables who's contents can change. More like this example:

DECLARE @a char(1), @b-2 char(1), @C char(1)

SELECT @a='A', @b-2='B', @C='C'

SELECT DISTINCT CustomerID

FROM #Purchase P1

WHERE P1.ProductCode = @a

AND EXISTS(SELECT *

FROM #Purchase P2

WHERE P2.ProductCode = @b-2

AND P2.CustomerID = P1.CustomerId)

AND NOT EXISTS(SELECT *

FROM #Purchase P3

WHERE P3.ProductCode = @C

AND P3.CustomerID = P1.CustomerId)

As soon as you start substituting values in a lot of the solutions given in this thread they would stop working.

e.g:

SELECT @a='C', @b-2='B', @C='A'