Finding the Baseball Players
Or how to find out which customer only bought one type of product.
Introduction
I recently saw a question on one of our discussion lists that asked for a query that
would solve a query problem. I thought it was an interesting query and one that others
might benefit from, so I decided to jot some notes and describe the problem and my solution.
The Problem
Suppose you run a sporting goods store and you just got some new product and want to
market it to the pitchers who come into your store. You have lots of data on products
that you have sold, including the type of sport for which the sale is intended. You decide
(perhaps erroneously) that the true baseball players who would buy this product, only buy
other baseball products.
How can you find out which customers have only purchased baseball products?
Here is the schema of your database:
SteveProd (Holds sale information):
ProdID ProdName ProdTyp ------- -------------------- -------------- 1 Bat Baseball 2 Baseball Baseball 3 Football Football 4 Tee Baseball 5 Tee Football 6 Tee Golf 7 Basketball Basketball 8 Mask Hockey
SteveSales (Holds sale information):
CustID ProdID Qty Price ------- ------- ------ -------- 1 1 3 4.00 2 1 1 5.00 2 2 1 99.00 3 4 1 50.00 4 1 2 3.00 4 2 1 4.00 4 3 2 4.00 4 5 3 9.99 5 1 3 4.00 5 3 3 14.00 6 5 3 24.00 7 7 1 54.00
If you run a query to see the sales, you will get the following:
Query:
select s.CustID, s.ProdId, p.prodname, p.ProdTyp from SteveSales s inner join SteveProd p on s.Prodid = p.prodid order by custid
Results:
CustID ProdID ProdName ProdTyp ------- ------- ----------- -------- 1 1 Bat Baseball 2 1 Bat Baseball 2 2 Baseball Baseball 3 4 Tee Baseball 4 1 Bat Baseball 4 2 Baseball Baseball 4 3 Football Football 4 5 Tee Football 5 1 Bat Baseball 5 3 Football Football 6 5 Tee Football 7 7 Basketball Basketball
The Solution
Now, how can we find out who has purchased only baseball products? It turns out that there
are a few different ways. I will show you two and explain why I prefer one over the other.
The basic problem is to find the customers who have purchased baseball products and then
remove those customers who have also purchased other products. To do this, you have two basic
choices that I came up with.
The first solution involves taking the query above and qualifying it with the type of product
that one is searching for. Once you have this qualification, you add an additional qualification using
a correlated subquery to remove customers that have purchased other products. This query looks like:
declare @typ varchar( 20) select @typ = 'Baseball' select distinct s.CustID from SteveSales s inner join SteveProd p on s.Prodid = p.prodid where p.prodtyp = @typ and s.custid not in ( select s1.custid from SteveSales s1 inner join SteveProd p1 on s1.prodid = p1.prodid where s1.custid = s.custid and p1.prodtyp != @typ ) order by custid
This query returns customers 1, 2, and 3. If you check the result above, you will find that
these customers have purchased only baseball products. If you want to check this, you can change
the type to football and you should receive only customer 6.
The problem that I see with this query, is the use of the Not in and != operators. These
operators in general cause performance issues on large result sets because the entire table
that is being checked must be scanned to verify that a row does not fit.
My second solution is to rewrite this query to remove the negative operators. To remove
these operators, I decided to check on which customers had not purchase other items, but checking
the quantity of other types of product purchased. To check this quanity, I created an aggregate
in the subquery and then performed a positive join on this sum with the value zero (0). If this
condition is true, then no other types of products were purchased. To remove the != operator, I
changed this to a simple OR operator. Here is the query:
declare @typ varchar( 20) select @typ = 'Baseball' select distinct s.CustID from SteveSales s inner join SteveProd p on s.Prodid = p.prodid where p.prodtyp = @typ and 0 = ( select count(s1.custid) from SteveSales s1 inner join SteveProd p1 on s1.prodid = p1.prodid where s1.custid = s.custid and (p1.prodtyp < @typ or p1.prodtyp > @typ) ) order by custid
Conclusions
I am sure some of you out there may have better or more creative solutions, but I tried to
keep things simple and did not see a shorter way of performing this query. Of course, temp tables,
derived tables, cursors, etc. could be used, but many of these solutions may cause a performance or
resource hit that I would not want to take.
As always, I welcome feedback (click the Your Opinion) button below and hope you will rate this
article.
Steve Jones
©dkRanch.net August 2001