Heh... Ok... first, I have to say that I'm amazed. This particular article got twice as many reads in 1/4 the time as the "Generating test data" article did just this previous Monday.
I'll also say thank you all very much for participating in the discussion and for taking the time to post code and/or provide feedback. What's really cool is that I come up with a simple idea and good folks like yourselves try to make it better and usually succeed. Several of you posted code that runs significantly faster than mine with Arjun S coming up with the fastest (at least on my ol' war horse of a desktop box).
Here's how I tested all of the solutions what worked according to a separate test against the original data provided in the article. I 'threw away" the results in a variable to take the display out of the picture without throwing away the print statements or rowcounts.
PRINT '========== Code From Article ==========================================='
SET STATISTICS TIME ON;
DECLARE @Bitbucket INT
SELECT @Bitbucket = CustomerID FROM
(
SELECT CustomerID
FROM #Purchase
WHERE ProductCode IN ('A','B')
GROUP BY CustomerID
HAVING COUNT(DISTINCT ProductCode) = 2
EXCEPT
--===== Find Customers that bought "C".
SELECT CustomerID
FROM #Purchase
WHERE ProductCode IN ('C')
) d ;
SET STATISTICS TIME OFF;
GO
PRINT '========== Arjun S ====================================================='
SET STATISTICS TIME ON;
DECLARE @Bitbucket INT
SELECT @Bitbucket = CustomerID FROM
(
SELECT CustomerID
FROM #Purchase
WHERE ProductCode IN ('A')
INTERSECT
SELECT CustomerID
FROM #Purchase
WHERE ProductCode IN ('B')
EXCEPT
--===== Find Customers that bought "C".
SELECT CustomerID
FROM #Purchase
WHERE ProductCode IN ('C')
)d ;
SET STATISTICS TIME OFF;
GO
PRINT '========== tommyh ====================================================='
SET STATISTICS TIME ON;
DECLARE @Bitbucket INT
SELECT @Bitbucket = CustomerID FROM
(
SELECT
CustomerId
FROM #Purchase
WHERE ProductCode IN ('A','B', 'C')
GROUP BY CustomerID
having sum(case when ProductCode = 'A' then 1 else 0 end) > 0
and sum(case when ProductCode = 'B' then 1 else 0 end) > 0
and sum(case when ProductCode = 'C' then 1 else 0 end) = 0
)d ;
SET STATISTICS TIME OFF;
GO
PRINT '========== Toby Harman ====================================================='
SET STATISTICS TIME ON;
DECLARE @Bitbucket INT
SELECT @Bitbucket = CustomerID FROM
(
SELECT a.CustomerID
FROM #Purchase a
LEFT OUTER JOIN #Purchase b
ON b.CustomerID = a.CustomerID
AND b.ProductCode IN ('C')
WHERE a.ProductCode IN ('A','B')
AND b.CustomerID IS NULL
GROUP BY a.CustomerID
HAVING COUNT(DISTINCT a.ProductCode) = 2
)d ;
SET STATISTICS TIME OFF;
GO
PRINT '========== James Dingle ====================================================='
SET STATISTICS TIME ON;
DECLARE @Bitbucket INT
SELECT @Bitbucket = CustomerID FROM
(
SelectDistinct
CustomerID
From#Purchase
WhereProductCode = 'A'
AndCustomerID In
(
SelectCustomerID
From#Purchase
WhereProductCode = 'B'
)
AndCustomerID Not In
(
SelectCustomerID
From#Purchase
WhereProductCode = 'C'
)
)d ;
SET STATISTICS TIME OFF;
GO
PRINT '========== chintan.j.gandhi ====================================================='
SET STATISTICS TIME ON;
DECLARE @Bitbucket INT
SELECT @Bitbucket = CustomerID FROM
(
SELECT DISTINCT CustomerID
FROM #Purchase P
WHERE EXISTS (SELECT 1 FROM #Purchase P1
WHERE P.customerid = P1.customerid
AND P1.productcode = 'A'
)
AND EXISTS (SELECT 1 FROM #Purchase P2
WHERE P.customerid = P2.customerid
AND P2.productcode = 'B'
)
AND NOT EXISTS (SELECT 1 FROM #Purchase P3
WHERE P.customerid = P3.customerid
AND P3.productcode = 'C'
)
)d ;
SET STATISTICS TIME OFF;
GO
PRINT '========== Dalibor Margotic CTE (First method already posted by Arjun S) ====='
SET STATISTICS TIME ON;
DECLARE @Bitbucket INT;
WITH CTE AS
(
SELECT CustomerID
FROM #Purchase
WHERE ProductCode = 'A'
GROUP BY CustomerID
)
,CTE2 AS
(
SELECT CustomerID
FROM #Purchase
WHERE ProductCode = 'B'
GROUP BY CustomerID
)
,CTE3 AS
(
SELECT CustomerID
FROM #Purchase
WHERE ProductCode = 'C'
GROUP BY CustomerID
)
select @Bitbucket = CTE.CustomerID from CTE
left join CTE3 ON
CTE.CustomerID = CTE3.CustomerID
where CTE.CustomerID in (SELECT CustomerID FROM CTE2)
AND CTE3.CustomerID IS NULL
GROUP BY CTE.CustomerID
;
SET STATISTICS TIME OFF;
GO
PRINT '========== MAGOO ====================================================='
SET STATISTICS TIME ON;
DECLARE @Bitbucket INT
SELECT @Bitbucket = CustomerID FROM
(
SELECT DISTINCT
CustomerID
FROM #Purchase p1
WHERE ProductCode = 'A'
AND EXISTS (SELECT
1
FROM #Purchase p2
WHERE p2.CustomerID = p1.CustomerID
AND p2.ProductCode = 'B')
AND NOT EXISTS (SELECT
1
FROM #Purchase p3
WHERE p3.CustomerID = p1.CustomerID
AND p3.ProductCode = 'C')
)d ;
SET STATISTICS TIME OFF;
GO
PRINT '========== Venoym ====================================================='
SET STATISTICS TIME ON;
DECLARE @Bitbucket INT
SELECT @Bitbucket = CustomerID FROM
(
SELECT
b.CustomerID
FROM
(
SELECT
a.CustomerID
FROM
(
SELECT
DISTINCT
CustomerID,
ProductCode
FROM
#Purchase
WHERE
ProductCode IN ('A','B')) AS a
GROUP BY
a.CustomerID
HAVING
COUNT(ProductCode) = 2
) AS b LEFT OUTER JOIN
#Purchase p ON b.CustomerID = p.CustomerID AND p.ProductCode = 'C'
WHERE
p.ProductCode IS NULL
)d ;
SET STATISTICS TIME OFF;
GO
PRINT '========== Jonathan AC Roberts ====================================================='
SET STATISTICS TIME ON;
DECLARE @Bitbucket INT
SELECT @Bitbucket = CustomerID FROM
(
SELECT DISTINCT CustomerID
FROM #Purchase p1
WHERE ProductCode = 'A'
AND EXISTS(SELECT 1
FROM #Purchase p2
WHERE p2.CustomerID = p1.CustomerID
AND p2.ProductCode = 'B'
AND NOT EXISTS (SELECT 1
FROM #Purchase p3
WHERE p3.CustomerID = p2.CustomerID
AND p3.ProductCode = 'C'))
)d ;
SET STATISTICS TIME OFF;
GO
PRINT '========== bob.probst ====================================================='
SET STATISTICS TIME ON;
DECLARE @Bitbucket INT
SELECT @Bitbucket = CustomerID FROM
(
select distinct a.customerid
from #purchase a
inner join #purchase b on (a.customerid = b.customerid)
left join #purchase c on (a.customerid = c.customerid and c.productcode = 'C')
where a.productcode = 'A'
and b.productcode = 'B'
and c.purchaseid is null
)d ;
SET STATISTICS TIME OFF;
GO
PRINT '========== Vyengr ====================================================='
SET STATISTICS TIME ON;
DECLARE @Bitbucket INT
SELECT @Bitbucket = CustomerID FROM
(
Select Distinct CustomerID
FROM #Purchase PU
WHERE ProductCode ='A'
AND EXISTS (Select CustomerID
FROM #Purchase
WHERE ProductCode ='B' and CustomerID = PU.CustomerID)
AND NOT EXISTS (Select PU.CustomerID
FROM #Purchase
WHERE ProductCode ='C' and CustomerID = PU.CustomerID)
)d ;
SET STATISTICS TIME OFF;
GO
PRINT '========== ejoell 66477 ====================================================='
SET STATISTICS TIME ON;
DECLARE @Bitbucket INT
SELECT @Bitbucket = CustomerID FROM
(
Select Distinct ABuyers.CustomerID from
(Select CustomerID FROM #Purchase WHERE ProductCode = 'A') ABuyers
JOIN
(Select CustomerID FROM #Purchase WHERE ProductCode = 'B') BBuyers
ON ABuyers.CustomerID = BBuyers.CustomerID
LEFT JOIN
(Select CustomerID FROM #Purchase WHERE ProductCode = 'C') CBuyers
ON ABuyers.CustomerID = CBuyers.CustomerID
WHERE CBuyers.CustomerID is null
)d ;
SET STATISTICS TIME OFF;
GO
PRINT '========== David Rueter ====================================================='
SET STATISTICS TIME ON;
DECLARE @Bitbucket INT
SELECT @Bitbucket = CustomerID FROM
(
SELECT DISTINCT p.CustomerID
FROM
#Purchase p
JOIN #Purchase p2 ON
p.CustomerID = p2.CustomerID AND
p2.ProductCode = 'B'
LEFT JOIN #Purchase p3 ON
p.CustomerID = p3.CustomerID AND
p3.ProductCode = 'C'
WHERE
p.ProductCode = 'A' AND
p3.CustomerID IS NULL
)d ;
SET STATISTICS TIME OFF;
GO
PRINT '========== ColdCoffee ====================================================='
SET STATISTICS TIME ON;
DECLARE @Bitbucket INT;
WITH CTE AS
(
SELECT P.CustomerID
, Indicator1 = CASE WHEN P.ProductCode IN('A') THEN 0
WHEN P.ProductCode IN('C') THEN -1
END
, Indicator2 = CASE WHEN P.ProductCode IN('B') THEN 1
ELSE 0
END
FROM #Purchase P
WHERE P.ProductCode IN ('A','B','C')
)
SELECT @Bitbucket = C.CustomerID
FROM CTE C
GROUP BY C.CustomerID
HAVING SUM (C.Indicator1) = 0 AND SUM(C.Indicator2) > 0
;
SET STATISTICS TIME OFF;
GO
Against the million row test data generator that I provided in the "Resources" section of the article near the end of the article, here are the result from the code above. Arjun is the clear winner here.
[font="Courier New"]========== Code From Article ===========================================
SQL Server Execution Times:
CPU time = 265 ms, elapsed time = 329 ms.
========== Arjun S =====================================================
SQL Server Execution Times:
CPU time = 78 ms, elapsed time = 100 ms.
========== tommyh =====================================================
SQL Server Execution Times:
CPU time = 329 ms, elapsed time = 331 ms.
========== Toby Harman =====================================================
SQL Server Execution Times:
CPU time = 328 ms, elapsed time = 327 ms.
========== James Dingle =====================================================
SQL Server Execution Times:
CPU time = 281 ms, elapsed time = 298 ms.
========== chintan.j.gandhi =====================================================
SQL Server Execution Times:
CPU time = 719 ms, elapsed time = 726 ms.
========== Dalibor Margotic CTE (First method already posted by Arjun S) =====
SQL Server Execution Times:
CPU time = 109 ms, elapsed time = 95 ms.
========== MAGOO =====================================================
SQL Server Execution Times:
CPU time = 94 ms, elapsed time = 92 ms.
========== Venoym =====================================================
SQL Server Execution Times:
CPU time = 359 ms, elapsed time = 362 ms.
========== Jonathan AC Roberts =====================================================
SQL Server Execution Times:
CPU time = 94 ms, elapsed time = 89 ms.
========== bob.probst =====================================================
SQL Server Execution Times:
CPU time = 266 ms, elapsed time = 270 ms.
========== Vyengr =====================================================
SQL Server Execution Times:
CPU time = 94 ms, elapsed time = 92 ms.
========== ejoell 66477 =====================================================
SQL Server Execution Times:
CPU time = 266 ms, elapsed time = 266 ms.
========== David Rueter =====================================================
SQL Server Execution Times:
CPU time = 281 ms, elapsed time = 275 ms.
========== ColdCoffee =====================================================
SQL Server Execution Times:
CPU time = 360 ms, elapsed time = 371 ms.
Warning: Null value is eliminated by an aggregate or other SET operation.[/font]
I also want to say that the SQL IQ shown by folks in this discussion is quite high (well, except for mine... I lost pretty badly considering that I wrote the article :blush:). No one resorted to RBAR of any form and the difference in most of the times are less than 250 ms in difference against a million row problem. You should all take a bow.
--Jeff Moden
Change is inevitable... Change for the better is not.