June 7, 2014 at 8:00 am
robinwilson (6/7/2014)
Hello AllI'm just wondering, is there anything wrong with doing it this way:
There is no guarantee that customer with both product A and B are returned.
N 56°04'39.16"
E 12°55'05.25"
June 7, 2014 at 9:07 am
Thanks, I missed that part!
June 9, 2014 at 6:22 am
My variant on this uses count instead of sum. Since count() ignores null values, you don't need the else part of the case:
select CustomerId
from #Purchase
where ProductCode in ('A', 'B', 'C')
group by CustomerId
having count(case ProductCode when 'A' then 1 end) > 0
and count(case ProductCode when 'B' then 1 end) > 0
and count(case ProductCode when 'C' then 1 end) = 0
for me it's a bit more intuitive but that's purely subjective.
June 11, 2014 at 11:54 am
Hi,
I am new to SQL and tried to solve this in my own way. In one of my approaches I first wanted to get a list of the customers who had purchased both A&B using this query.
Select DISTINCT p1.CustomerID From #Purchase p1
join #Purchase p2 On p1.ProductCOde = 'A' and p2.ProductCOde = 'B';
Using the data provided, I am getting an "incorrect" result from what I was expecting. I am getting 6 rows returned with the unexpected row customerID = 6. Can someone tell me why SQL Server returned that row?
My novice approach for the answer to the original question turned out to be:
Select p1.CustomerID from
( Select DISTINCT p1.CustomerID From #Purchase p1
join #Purchase p2 On (p1.ProductCOde = 'A' and p2.ProductCOde = 'B' and p1.CustomerID = p2.CustomerID)
) as p1
where p1.CustomerID NOT IN (Select CustomerID from #Purchase p3 where p3.ProductCOde = 'C')
TIA,
Rob
June 11, 2014 at 12:31 pm
Hi Rob,
The problem with your first query is that you missed to add the join condition for the CustomerID. In result, you're getting a cross join between all the rows with ProductCode A and all the rows with ProductCode B, and in the end you get all customers who bought product 'A' (as long as someone bought product B).
The weird thing is that you corrected that problem in your second query to get the correct result. Congratulations on getting it right!
I'd encourage you to read the discussion on this topic to review and understand the different options given. It will help you in your learning.
June 11, 2014 at 1:39 pm
Luis Cazares (6/11/2014)
Hi Rob,The problem with your first query is that you missed to add the join condition for the CustomerID. In result, you're getting a cross join between all the rows with ProductCode A and all the rows with ProductCode B, and in the end you get all customers who bought product 'A' (as long as someone bought product B).
The weird thing is that you corrected that problem in your second query to get the correct result. Congratulations on getting it right!
I'd encourage you to read the discussion on this topic to review and understand the different options given. It will help you in your learning.
Thks Luis!
I also saw adding a WHERE clause:
Select DISTINCT p1.CustomerID From #Purchase p1
join #Purchase p2 On p1.ProductCOde = 'A' and p2.ProductCOde = 'B'
where p1.CustomerID = p2.CustomerID;
would give the expected result. Yes, I am going thru and looking at the different solutions. Learning how to do the benchmark as well.
Rob
June 11, 2014 at 1:48 pm
mushin2003 (6/11/2014)
Luis Cazares (6/11/2014)
Hi Rob,The problem with your first query is that you missed to add the join condition for the CustomerID. In result, you're getting a cross join between all the rows with ProductCode A and all the rows with ProductCode B, and in the end you get all customers who bought product 'A' (as long as someone bought product B).
The weird thing is that you corrected that problem in your second query to get the correct result. Congratulations on getting it right!
I'd encourage you to read the discussion on this topic to review and understand the different options given. It will help you in your learning.
Thks Luis!
I also saw adding a WHERE clause:
Select DISTINCT p1.CustomerID From #Purchase p1
join #Purchase p2 On p1.ProductCOde = 'A' and p2.ProductCOde = 'B'
where p1.CustomerID = p2.CustomerID;
would give the expected result. Yes, I am going thru and looking at the different solutions. Learning how to do the benchmark as well.
Rob
That's an interesting choice. Even if it will work exactly the same for inner joins, my advice is to go the other way to keep things where they belong within the logic.
Select DISTINCT p1.CustomerID
From #Purchase p1
join #Purchase p2 On p1.CustomerID = p2.CustomerID --Keep join conditions within the join
where p1.ProductCOde = 'A' --Keep query filters on WHERE
and p2.ProductCOde = 'B';
June 11, 2014 at 4:20 pm
simonc 39536 (6/6/2014)
I think a lot of people seem to focus purely on performance, a good solution to a given issue considers many factors, NOT just performance, people who focus on just the performance aspect tend to write vastly more code and it eventually gets messy, ...
Just as a contrasting note to that...
It's an unwritten rule that one should always focus on accuracy first, and then performance so I don't believe for a seond that a lot of people focus purely on performance. For me, performance (and the related scalability) is second only to accuracy and not by much (although I'm not one to try to shave a millisecond off of a 5ms proc unless it's really needed for very high volume). I've also found that high performance code is usually much shorter than performance challenged code and is actually quite a bit more readable. I do agree that there are exceptions to that rule (greatest of 3 columns is a good example of that) but I won't sacrifice performance just because someone might not be able to understand the code (that's why I use embedded comments) or to make the code shorter (although it usually is).
Last but not least, I've also found that high performance code almost always uses fewer resources (sometimes, drastically so) and that makes it worth it, as well. I spend about 25% of my time fixing other peoples' code to make it faster and less resource intensive. Oddly enough, I also end up making it quite a bit more readable and supportable in the process.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 12, 2014 at 5:32 am
Are there any recommended books or resource which explain the internal workings of SQL Server engine(2008, 2012)? Which for example explains why Jeff's answer scales better than the solution provided using INTERCEPT/EXCEPT. I actually found it to be one of the easier solutions to follow.
Thanks,
Rob
June 12, 2014 at 8:24 am
It boils down to the number of IO used for the solution.
Also the CPU usage matters.
Obviously scanning the table twice (using INTERCEPT/EXCEPT for example) uses twice as much IO than scanning the table just once. There are factors that changes this such as proper indexes.
N 56°04'39.16"
E 12°55'05.25"
June 12, 2014 at 8:47 am
For a miniscule (and mostly academic) improvement on PeSo's script,
I've got it so the index/table only gets scanned once (instead of 3 times)
by just using a BETWEEN instead of the IN (which the optimiser auto-expands to 3 OR conditions):
SELECT CustomerID
FROM #Purchase
WHERE ProductCode BETWEEN 'A' AND 'C'
GROUP BY CustomerID
HAVING MIN(ProductCode) = 'A'
AND MAX(ProductCode) = 'B'
STATISTICS IO Before: ...Scan count 3, logical reads 225...
STATISTICS IO After: ...Scan count 1, logical reads 217...
Tested on the million row random table from Jeff's article.
June 12, 2014 at 9:06 am
The Wizard Of Oz (6/12/2014)
For a miniscule (and mostly academic) improvement on PeSo's script,I've got it so the index/table only gets scanned once (instead of 3 times)
by just using a BETWEEN instead of the IN (which the optimiser auto-expands to 3 OR conditions):
SELECT CustomerID
FROM #Purchase
WHERE ProductCode BETWEEN 'A' AND 'C'
GROUP BY CustomerID
HAVING MIN(ProductCode) = 'A'
AND MAX(ProductCode) = 'B'
STATISTICS IO Before: ...Scan count 3, logical reads 225...
STATISTICS IO After: ...Scan count 1, logical reads 217...
Tested on the million row random table from Jeff's article.
But that won't work to find customers that bought A & C but not B. 😀
June 12, 2014 at 9:13 am
Luis Cazares (6/12/2014)
But that won't work to find customers that bought A & C but not B. 😀
Haha! Yes Luis, good thing that A & C not B wasn't in the scope of the original article 😛
June 12, 2014 at 9:17 am
The Wizard Of Oz (6/12/2014)
Luis Cazares (6/12/2014)
But that won't work to find customers that bought A & C but not B. 😀Haha! Yes Luis, good thing that A & C not B wasn't in the scope of the original article 😛
"I have the answer, but it only works for spherical chickens in a vaccuum.":-D
June 12, 2014 at 9:40 am
Normally when an example is given using values A, B and C they should be taken as algebraic variables who's contents can change. More like this example:
DECLARE @a char(1), @b-2 char(1), @C char(1)
SELECT @a='A', @b-2='B', @C='C'
SELECT DISTINCT CustomerID
FROM #Purchase P1
WHERE P1.ProductCode = @a
AND EXISTS(SELECT *
FROM #Purchase P2
WHERE P2.ProductCode = @b-2
AND P2.CustomerID = P1.CustomerId)
AND NOT EXISTS(SELECT *
FROM #Purchase P3
WHERE P3.ProductCode = @C
AND P3.CustomerID = P1.CustomerId)
As soon as you start substituting values in a lot of the solutions given in this thread they would stop working.
e.g:
Viewing 15 posts - 136 through 150 (of 166 total)
You must be logged in to reply to this topic. Login to reply