Find Customers Who Bought "A" and "B" But Not "C" (SQL Spackle)

Question

Find Customers Who Bought "A" and "B" But Not "C" (SQL Spackle)

Viewing 15 posts - 91 through 105 (of 166 total)

You must be logged in to reply to this topic. Login to reply

Gullimeel Hall of Fame Points: 3871 More actions · Answer 1

select * from

(

select id,max(case when cd = 'A' then 1 else 0 end) + max(case when cd = 'B' then 2 else 0 end) + max(case when cd = 'C' then 4 else 0 end) status from a

--where cd in ('A','B','C')

group by id

) dta

where

1=1

--and status = 3 -- a,b but not c

--and status = 5--does have a,c but not b

--and status = 6--does have b,c but not a

--and status = 7 --all 3

--and status = 0 --none of them

--and status = 1 -- just A

--and status = 2 -- just b

and status = 4 -- just C

Change the id with CustomerId ,cd with ProductCode. You are done.You can use anaytical function using parttion instead of group if you need all the columns from the table. You can use the same concept and could get the results for any combination of product codes.

GulliMeel

Finding top n Worst Performing queries[/url]
Improve the performance of Merge Join(special case)
How to Post Performance Problem -Gail Shaw[/url]

Jeff Moden SSC Guru Points: 1004405 More actions · Answer 2

Gullimeel (5/11/2012)
select * from
(
select id,max(case when cd = 'A' then 1 else 0 end) + max(case when cd = 'B' then 2 else 0 end) + max(case when cd = 'C' then 4 else 0 end) status from a
--where cd in ('A','B','C')
group by id
) dta
where
1=1
--and status = 3 -- a,b but not c
--and status = 5--does have a,c but not b
--and status = 6--does have b,c but not a
--and status = 7 --all 3
--and status = 0 --none of them
--and status = 1 -- just A
--and status = 2 -- just b
and status = 4 -- just C
Change the id with CustomerId ,cd with ProductCode. You are done.You can use anaytical function using parttion instead of group if you need all the columns from the table. You can use the same concept and could get the results for any combination of product codes.

Ummm... that's real nice but what builds the status column? Some magical analytical function that you haven't shown us?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Gullimeel Hall of Fame Points: 3871 More actions · Answer 3

Hi Jeff

It is just sum of 2 power (0 for A, 1 for B ,2 for C,3 for D ..and so on)..

Thus if you have all 3 codes then

2* 0 + 2* 1 + 2*2 = 1 + 2 + 4. = 7

if any of the code is missing the value for that will be 0. That way you know which code is there and which one are missing... Consider these as bits and bits are set to 1for the codes which are there for a given customer id and for others these are set to 0.

Hopefully, this will help..

GulliMeel

Finding top n Worst Performing queries[/url]
Improve the performance of Merge Join(special case)
How to Post Performance Problem -Gail Shaw[/url]

Gullimeel Hall of Fame Points: 3871 More actions · Answer 4

Below is the sql to show how much cpu time and logical IO's it takes to find the customers who bought A,B but not C.

Also, I have done a performance comparison when you want to find the details from Purchase table

for the customer who bought A, B but not C.

The method using parttion by which eliminates join with purchase column is way too expensive...

Thus before using any new method benchmark it against existing best solution...

Just to make sure which method is better..Follow one rule..

Benchmark benchmark Benchmark.....

use tempdb

go

/*

--uncomment this to create purchase table.. This script to generate the 1 million rows is

--created by Jeff Moden on sql server central . com

--===== This code takes about 23 seconds to run on a 10 year old,

-- single CPU desktop machine.

--===== Conditionally drop the test table to make reruns in SSMS easier.

-- This is NOT a part of the solution.

IF OBJECT_ID('tempdb..Purchase','U') IS NOT NULL

DROP TABLE Purchase

;

--===== Create and populate the test table.

-- This is NOT a part of the solution.

SELECT TOP (1000000)

PurchaseID = IDENTITY(INT,1,1),

CustomerID = ABS(CHECKSUM(NEWID())) % 50000 + 1,

ProductCode = CHAR(ABS(CHECKSUM(NEWID())) % 26 + 65)

INTO Purchase

FROM sys.all_columns ac1

CROSS JOIN sys.all_columns ac2

;

--===== Add the expected PK

ALTER TABLE Purchase

ADD PRIMARY KEY CLUSTERED (PurchaseID)

;

--===== Can you guess which index will be used?

CREATE INDEX IX_Purchase_CustomerID_ProductCode

ON Purchase (CustomerID, ProductCode)

;

CREATE INDEX IX_Purchase_ProductCode_CustomerID

ON Purchase (ProductCode, CustomerID)

update statistics Purchase with fullscan

*/

--- find only the customer ids who bought a,b but not c

with cte1

as

(

select

CustomerId

,MAX(case when ProductCode = 'A' then 1 else 0 end) +

max(case when ProductCode = 'B' then 2 else 0 end) +

max(case when ProductCode = 'C' then 4 else 0 end ) as status

from Purchase

group by CustomerID

)

select * from cte1 where status = 3

--Above uses a non clustered index scan and clocked 1867 logical IO and around 800 ms cpu and elapsed time.

--query to get the details of customers along with product code and using join back to the original table..

with cte1

as

(

select

CustomerId

,MAX(case when ProductCode = 'A' then 1 else 0 end) +

max(case when ProductCode = 'B' then 2 else 0 end) +

max(case when ProductCode = 'C' then 4 else 0 end ) as status

from Purchase

group by CustomerID

)

select p.CustomerId,p.ProductCode,cte1.status from cte1 inner join purchase p on cte1.CustomerId = p.CustomerId

where status = 3

and p.ProductCode not in ('C'); --as we

--It clocked 3734 Io's and 1200 ms CPU time and 1500 ms of elapsed time for 137 K rows..This is quite fast

--This is using without join but using parttion by ..This is performaing badly..

--Thus do not try the new methods just to make the statement simple..

--Always benchmark with the best method you have with any new method which is introduced in the new versions..

--It is clocking 2.2 million IO's approx...Which is way too high 7533 ms CPU time... and 2853 ms elapsed time for 137 k rows

with cte1

as

(

select

CustomerID,ProductCode

,

case when ProductCode = 'A' then 1 else 0 end as IsAExist

, case when ProductCode = 'B' then 2 else 0 end as IsBExist

, case when ProductCode = 'C' then 4 else 0 end as IsCExist

from Purchase

),cte2

as

(

select CustomerID,ProductCode, MAX(IsAExist) over(PARTITION by CustomerID) + MAX(IsBExist)over(PARTITION by CustomerID) + MAX(IsCExist)over(PARTITION by CustomerID) as status from cte1

)

select * from cte2

where status = 3

--option(maxdop 1)

; --a and b but not C

go

--

/*

To compare the above...

without using parttition

Logical IOCPU TimeElapsed Time

3734800ms800ms

22018717500ms2850ms

*/

GulliMeel

Finding top n Worst Performing queries[/url]
Improve the performance of Merge Join(special case)
How to Post Performance Problem -Gail Shaw[/url]

WayneS SSC Guru Points: 95460 More actions · Answer 5

Great article Jeff.

Awesome discussion by all! Thanks!

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

The Wizard Of Oz SSC Eights! Points: 875 More actions · Answer 6

Another way to skin the cat? (results are for the million-row table provided in the article)

(apologies if someone already posted this method)

--SET STATISTICS IO ON

--SET STATISTICS TIME ON

SELECT CustomerID

FROM #Purchase

WHERE ProductCode IN('A','B','C') -- only A,B,C, AB,AC,BC and ABC

GROUP BY CustomerID

HAVING COUNT(DISTINCT ProductCode) = 2 -- only AB,AC and BC

AND MAX(ProductCode) = 'B' -- only AB! :)

/*

(6763 row(s) affected)

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#Purchase'. Scan count 3, logical reads 226, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 187 ms, elapsed time = 187 ms.

*/

Alexander-449406 SSC Veteran Points: 213 More actions · Answer 7

I have generated test data with 1 000 000 000 000 000 000 000 rows and according to my tests all the queries proposed run within 1 second. You guys can choose the query that has fewer characters!

SQL Simmo SSC Enthusiast Points: 132 More actions · Answer 8

Yet another way to skin the cat, using a PIVOT this time, not as quick as the others but returns the counts of each product, sometimes you want to know this too 🙂

with results(CustomerID, A, B, C) as

(

SELECT CustomerID, A, B, C FROM

(SELECT CustomerID, ProductCode FROM #Purchase WHERE ProductCode IN ('A','B','C')) AS t1

PIVOT (COUNT(ProductCode) FOR ProductCode IN (A,B,C) ) AS t2

)

SELECT *

from results

where A > 0 and B > 0 and C = 0

Original query: CPU time = 94 ms, elapsed time = 183 ms.

This Query: CPU time = 109 ms, elapsed time = 221 ms.

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 9

You know me Jeff 🙂 Can't help myself since this is a Relational Division Problem.

SELECTCustomerID

FROM#Purchase

WHEREProductCode IN ('A', 'B', 'C')

GROUP BYCustomerID

HAVINGMIN(ProductCode) = 'A'

AND MAX(ProductCode) = 'B'

--AND COUNT(*) = 2;

N 56°04'39.16"
E 12°55'05.25"

ben.norris Valued Member Points: 58 More actions · Answer 10

SELECTCustomerID
FROM#Purchase
WHEREProductCode IN ('A', 'B', 'C')
GROUP BYCustomerID
HAVINGMIN(ProductCode) = 'A'
AND MAX(ProductCode) = 'B'
--AND COUNT(*) = 2;

This relies on A, B and C being ordered rather than some arbitary values that happen to be A, B and C in this instance so it is kind of a cheat surely?

The Wizard Of Oz SSC Eights! Points: 875 More actions · Answer 11

There's no dependance on ordering, so no cheating going on. It's a natural part of MIN() and MAX() functions.

It would be strange if someone caught SwePeso pulling a fast one, but he's "quite good with his SQL" to put it lightly 😀

ben.norris Valued Member Points: 58 More actions · Answer 12

The Wizard Of Oz (6/6/2014)
There's no dependance on ordering, so no cheating going on. It's a natural part of MIN() and MAX() functions.
It would be strange if someone caught SwePeso pulling a fast one, but he's "quite good with his SQL" to put it lightly 😀

Are you sure? It looks to me that if C was between B and A then it wouldn't exclude customers who bought C

valeryk2000 SSCarpal Tunnel Points: 4245 More actions · Answer 13

I have two tables with patient ID - one contains several records for one ID (tblLetterFlag), in the second it is a primary key. This code works OK:

select ClinicalReviewID from tblLetterFlag

except

select ClinicalReviewID from tblClinicalReview

order by 1

This one triggers an error:

select ClinicalReviewID from tblLetterFlag

intersect

select ClinicalReviewID from tblClinicalReview

order by 1

[font="Courier New"]Msg 0, Level 11, State 0, Line 0

A severe error occurred on the current command. The results, if any, should be discarded.

Msg 0, Level 20, State 0, Line 0

A severe error occurred on the current command. The results, if any, should be discarded.[/font]

The second one works smoothely if I get rid of order by. Is it my problem or that of 'intersect'?

Thanks

The Wizard Of Oz SSC Eights! Points: 875 More actions · Answer 14

ben.norris (6/6/2014)
The Wizard Of Oz (6/6/2014)
There's no dependance on ordering, so no cheating going on. It's a natural part of MIN() and MAX() functions.
It would be strange if someone caught SwePeso pulling a fast one, but he's "quite good with his SQL" to put it lightly 😀
Are you sure? It looks to me that if C was between B and A then it wouldn't exclude customers who bought C

If we are ever unsure about how some code will behave, we can always test out assumptions on an actual server:

Below I have 3 customers with A, B and C "ordered" differently, but the MIN() and MAX() functions pull out the correct results irrespective of the orderings.

WITH TestTable

AS (

SELECT 1 AS CustomerID, ProductID

FROM (

VALUES ('A'), ('C'), ('B')

) x(ProductID)

UNION ALL

SELECT 2 AS CustomerID, ProductID

FROM (

VALUES ('A'), ('B'), ('C')

) x(ProductID)

UNION ALL

SELECT 3 AS CustomerID, ProductID

FROM (

VALUES ('C'), ('A'), ('B')

) x(ProductID)

)

SELECT CustomerID, MIN(ProductID) AS MinProductID, MAX(ProductID) AS MaxProductID

FROM TestTable

GROUP BY CustomerID

craig 81366 SSC Eights! Points: 808 More actions · Answer 15

valeryk2000 (6/6/2014)
I have two tables with patient ID - one contains several records for one ID (tblLetterFlag), in the second it is a primary key. This code works OK:
select ClinicalReviewID from tblLetterFlag
except
select ClinicalReviewID from tblClinicalReview
order by 1
This one triggers an error:
select ClinicalReviewID from tblLetterFlag
intersect
select ClinicalReviewID from tblClinicalReview
order by 1
[font="Courier New"]Msg 0, Level 11, State 0, Line 0
A severe error occurred on the current command. The results, if any, should be discarded.
Msg 0, Level 20, State 0, Line 0
A severe error occurred on the current command. The results, if any, should be discarded.[/font]
The second one works smoothely if I get rid of order by. Is it my problem or that of 'intersect'?
Thanks

It seems ORDER BY is causing some confusion with INTERSECT.

As a workaround, you should be able to write:

;with CTE AS (

select ClinicalReviewID from tblLetterFlag

intersect

select ClinicalReviewID from tblClinicalReview

)

SELECT ClinicalReviewID

FROM CTE

order by 1