March 29, 2012 at 9:39 am
I see that most examples are accessing the same table more than once (IN/NOT IN, EXISTS/NOT EXISTS or JOINs). I wonder if it's faster than acessing the table only once and then, using HAVING to SUM() three times.
Is there any particular reason to avoid NOT IN or NOT EXISTS in this case?
Best regards,
Andre Guerreiro Neto
Database Analyst
http://www.softplan.com.br
MCITPx1/MCTSx2/MCSE/MCSA
March 29, 2012 at 10:40 am
Great Article! Thanks.
March 29, 2012 at 10:52 am
ColdCoffee (3/28/2012)
tommyh (3/28/2012)
Wouldnt something like this work just as well?
SELECT
CustomerId
FROM #Purchase
GROUP BY CustomerID
having sum(case when ProductCode = 'A' then 1 else 0 end) > 0
and sum(case when ProductCode = 'B' then 1 else 0 end) > 0
and sum(case when ProductCode = 'C' then 1 else 0 end) = 0
/T
Yep, this is how i once implemented it! Cant remember the thread now.
Here it is: http://www.sqlservercentral.com/Forums/FindPost1267224.aspx
Proved to elimate the EXCEPT part; also does a single scan on the table.
Was going to say that one as well. And the stats come up better when I test. Also, becuase the having clause meets the 3 required conditions you don't need the WHERE clause so tweak it there.
March 29, 2012 at 11:00 am
Nice, simple, and informative (especially the EXCEPT which I never use). the alternatives were also interesting. 🙂
March 29, 2012 at 11:08 am
Antares686 (3/29/2012)
ColdCoffee (3/28/2012)
tommyh (3/28/2012)
Wouldnt something like this work just as well?
SELECT
CustomerId
FROM #Purchase
GROUP BY CustomerID
having sum(case when ProductCode = 'A' then 1 else 0 end) > 0
and sum(case when ProductCode = 'B' then 1 else 0 end) > 0
and sum(case when ProductCode = 'C' then 1 else 0 end) = 0
/T
Yep, this is how i once implemented it! Cant remember the thread now.
Here it is: http://www.sqlservercentral.com/Forums/FindPost1267224.aspx
Proved to elimate the EXCEPT part; also does a single scan on the table.
Was going to say that one as well. And the stats come up better when I test. Also, becuase the having clause meets the 3 required conditions you don't need the WHERE clause so tweak it there.
Interesting fact here which can be explained by filtering of data I need by keeping the WHERE clause has a big impact on the query.
WHERE ProductCode In ('A','B','C')
Also, of note I tried the EXCEPT version and the EXISTS/NOT EXISTS using the process to generate a lot more rows but did not add the index on purpose and the EXCEPT and HAVING both came in much better over EXISTS, which complained in the Execution plan about a missing index. The other two did not complain.
March 29, 2012 at 12:16 pm
Or you could do
Select Distinct ABuyers.CustomerID from
(Select CustomerID FROM #Purchase WHERE ProductCode = 'A') ABuyers
JOIN
(Select CustomerID FROM #Purchase WHERE ProductCode = 'B') BBuyers
ON ABuyers.CustomerID = BBuyers.CustomerID
LEFT JOIN
(Select CustomerID FROM #Purchase WHERE ProductCode = 'C') CBuyers
ON ABuyers.CustomerID = CBuyers.CustomerID
WHERE CBuyers.CustomerID is null
March 29, 2012 at 12:53 pm
Nice topic Jeff. Thanks for the article. I have seen this type of query written using the having + sums method, left join and exists/not exists methods but have never tested which one is best. I personally like the Exists/Not Exists method just for readability even though it sounds like it is less efficient than the having method unless the extra index is applied. I always forget about intersect and except as it seems like those methods are about the same as exists/not exists.
March 29, 2012 at 1:06 pm
I've been faced with this problem many times and never adopted a standard methodology for dealing it with but am quite sure that my attemps were RBAR.
Would this approach be considered set based?
(please don't yell at me if my question was covered in an previous post. I didn't read all of them.)
March 29, 2012 at 1:14 pm
jshahan (3/29/2012)
I've been faced with this problem many times and never adopted a standard methodology for dealing it with but am quite sure that my attemps were RBAR.Would this approach be considered set based?
(please don't yell at me if my question was covered in an previous post. I didn't read all of them.)
Always confirm your standard approach, I have found on occasions that one approach may perform great and in another case a variant query works better. So this is one more tool for my options to consider.
March 29, 2012 at 1:25 pm
Thanks, Jeff.
I thought my standard pattern of simple joins would do about as well:
SELECT DISTINCT p.CustomerID
FROM
#Purchase p
JOIN #Purchase p2 ON
p.CustomerID = p2.CustomerID AND
p2.ProductCode = 'B'
LEFT JOIN #Purchase p3 ON
p.CustomerID = p3.CustomerID AND
p3.ProductCode = 'C'
WHERE
p.ProductCode = 'A' AND
p3.CustomerID IS NULL
Indeed, at 1 million rows performance is similar:
But at 10 million rows there is a big difference:
Thanks as always for sharing your knowledge.
March 29, 2012 at 1:44 pm
How about this?
; WITH CTE AS
(
SELECT P.CustomerID
, Indicator1 = CASE WHEN P.ProductCode IN('A') THEN 0
WHEN P.ProductCode IN('C') THEN -1
END
, Indicator2 = CASE WHEN P.ProductCode IN('B') THEN 1
ELSE 0
END
FROM #Purchase P
WHERE P.ProductCode IN ('A','B','C')
)
SELECT C.CustomerID
FROM CTE C
GROUP BY C.CustomerID
HAVING SUM (C.Indicator1) = 0 AND SUM(C.Indicator2) > 0
On a million row table, i get this:
SQL Server Execution Times:
CPU time = 93 ms, elapsed time = 294 ms.
On ten million row, i get this:
SQL Server Execution Times:
CPU time = 888 ms, elapsed time = 526 ms.
{Edit 1 : added ten million statistics}
{Edit 2 - Removing P.ProductCode, which by the way is not required,from CTE cuts down extra scan}
{Edit 3: Fixed the bug noted by Toby Harman}
March 29, 2012 at 2:34 pm
{message deleted}
March 29, 2012 at 4:00 pm
Hate to point this out, but ColdCoffee's approach is returning inaccurate results.
Specifically, it returns customers 6 and 7 from the original test data.
March 29, 2012 at 4:54 pm
Toby Harman (3/29/2012)
Hate to point this out, but ColdCoffee's approach is returning inaccurate results.Specifically, it returns customers 6 and 7 from the original test data.
That proves why my code was fast 😀 Thanks Toby.
Edited the code; which runs a touch slower than Jeff's
March 29, 2012 at 5:15 pm
mark hutchinson (3/28/2012)
@JeffDid you compare the speed of this query against one where your first (Group By) From clause is a Select Distinct (sub) query?
Nice article. It builds nicely for the reader.
I love seeing the Except clause being introduced to the reader who, like me, cut our teeth on SQL92 and need to learn newer language features to be more productive.
Thanks for the feedback, Mark.
To answer your question, I guess I'd need to see the code example you're talking about because I'm not sure that a SELECT DISTINCT would actually work here.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 31 through 45 (of 166 total)
You must be logged in to reply to this topic. Login to reply