Many-To-Many Select Query

  • I have to build a very strange query that has been bothering me for a few days now and it's starting to get the best of me.

    I know it might just have a simple solution, but it has eluded me for a long time and I just can't nail a solution that will return the corect subset on all situations. I'll try do decribe the problem as simple as I can:

    Using the standard Client, Order, Products metaphor, let's say I have a table that describes a many-to-many relationship, say ordered Products:

    ClientID | ProductID

    1 | 30

    1 | 35

    2 | 30

    3 | 30

    3 | 31

    etc..

    Then I have this second table that acts like a "filter table"

    ProductID(int) | HasOrdered (bit NULL)

    30 | 1

    31 | 1

    32 | 0

    35 | 0

    40 | NULL

    Based on the data in the second table, I need to get all the clients that have ordered products 30 and 31 but have not ordered products 32 and 35.

    However, I also need to select (in the same query) all of the clients that have ordered product 40 (where HasOrdered is NULL) regardless if they have ordered any of other products or not.

    The rule is : to select all (clients that have ordered ProductIDs where HasOrdered is null) OR ( clients that have ordered ALL products where HasOrdered =1 AND that have not ordered any of the products where HasOrdered = 0 )

    I could never find the corect way of joining all the tables to work in all situations: eg. if there are no rows with HasOrdered=1 or none with HasOrdered =0 or HasOrdered =NULL.

    Some situation will always mess up my joins and it will return a wrong subset of clients.

    Hope I managed to make sense in the post.

    Thanks for anyone dropping a hand to help me !

  • Your data does not make sense. The second table does not indicate a client.

    And then a tip on posting...

    What would help is if you post with some sample data generation.

    [font="Courier New"]CREATE TABLE #ClientProduct (ClientID INT, ProductID INT)

    INSERT #ClientProduct VALUES (1,30)

    INSERT #ClientProduct VALUES (1,35)[/font]

    If your sample data includes a case of the records you want returned (and you tell us which ones they are), it will be really for people to get into writing the query rather than having to build something to create the test data.

  • Ok, good tip on the SQL scripts.

    Let's see :

    CREATE TABLE #ClientProduct (ClientID INT, ProductID INT)

    INSERT #ClientProduct VALUES (1,30)

    INSERT #ClientProduct VALUES (1,35)

    INSERT #ClientProduct VALUES (2,31)

    INSERT #ClientProduct VALUES (3,40)

    INSERT #ClientProduct VALUES (3,41)

    INSERT #ClientProduct VALUES (4,30)

    INSERT #ClientProduct VALUES (4,41)

    CREATE TABLE #ProductFilter (ProductID INT, HasOrdered BIT NULL)

    INSERT #ProductFilter VALUES (30,1) -- all clients that have ordered product 30

    INSERT #ProductFilter VALUES (41,0) -- but which have not ordered product 41

    INSERT #ProductFilter VALUES (42,0) -- and which have not ordered product 42,etc

    INSERT #ProductFilter VALUES (40,NULL) -- all clients that ordered 40, regardless of other criteria

    In this case, my query needs to return clients 1 and 3:

    ID 1 - OK because he ordered product 30 AND he never ordered product 41

    ID 2 - rejected because he never ordered product 30

    ID 3 - OK because he ordered 40 and the rest of the criteria do not matter

    ID 4 - rejected because he ordered product 30, BUT he also ordered product 41

    So in this data case I need all clients that:

    1. Either have product 40 OR

    2. Have product 30 AND don't have either of 41 and 42

    It's possible to have multiple rows in the filter too :

    DELETE FROM #ProductFilter

    INSERT #ProductFilter VALUES (30,1)

    INSERT #ProductFilter VALUES (31,1)

    INSERT #ProductFilter VALUES (32,1)

    INSERT #ProductFilter VALUES (41,0)

    INSERT #ProductFilter VALUES (42,0)

    INSERT #ProductFilter VALUES (43,0)

    INSERT #ProductFilter VALUES (40,NULL)

    INSERT #ProductFilter VALUES (50,NULL)

    INSERT #ProductFilter VALUES (60,NULL)

    means all clients that:

    1. Ordered at least one of 40, 50 or 60 OR (doesn't need to order all 3, just one of them is enough)

    2. Ordered all of 30,31,32 but none of 41,42,43

    Makes more sense now ?:)

  • So the product filter table applies to every client?

  • Well, in reality there are a few more tables involved, but for the sake of this problem YES, it applies to all clients in the #ClientProduct table.

  • Ok, there is probably a more efficient way than this, but keeping it simple and just creating sub-queries (using CTE syntax to make them look nice) for your criteria, this seems to work:

    [font="Courier New"]; WITH MustHaveOrdered (ClientID)

    AS (SELECT ClientID FROM #ProductFilter PF

    INNER JOIN #ClientProduct CP ON CP.ProductID = PF.ProductID WHERE PF.HasOrdered = 1)

    , MustHaveNotOrdered (ClientID)

    AS (SELECT ClientID FROM #ProductFilter PF

    INNER JOIN #ClientProduct CP ON CP.ProductID = PF.ProductID WHERE PF.HasOrdered = 0)

    , IncludeAll (ClientID)

    AS (SELECT ClientID FROM #ProductFilter PF

    INNER JOIN #ClientProduct CP ON CP.ProductID = PF.ProductID WHERE PF.HasOrdered IS NULL)

    SELECT DISTINCT

    CP.ClientID

    FROM

    #ClientProduct CP

    WHERE

    (ClientID IN (SELECT ClientID FROM MustHaveOrdered)

    AND ClientID NOT IN (SELECT ClientID FROM MustHaveNotOrdered))

    OR

    (ClientID IN (SELECT ClientID FROM IncludeAll))[/font]

  • It seems to work indeed. Haven't used CTE before :-), looks nice.

    Only problem is the MustHaveNotOrdered table. In my real-life case it can mean a list of 1.000.000 rows (ok, so they're not clients :P).

    I hate to keep asking for free lunches like that, but can you see a way to join MustHaveOrdered and MustHaveNotOrdered into only 1 CTE?

    Oh, and thanks for helping out !

  • As a CTE, even if there were the same CTE, they would get translated into individual sub-queries. So, combining them just makes the logic complicated as does not help performance.

    Let me do my job for a few minutes then I will change this into some join logic.

  • [font="Courier New"]SELECT

    CP.ClientID

    FROM

    #ClientProduct CP

    LEFT JOIN #ProductFilter MHO ON MHO.ProductID = CP.ProductID AND MHO.HasOrdered = 1

    LEFT JOIN #ProductFilter MNHO ON MNHO.ProductID = CP.ProductID AND MNHO.HasOrdered = 0

    LEFT JOIN #ProductFilter IA ON IA.ProductID = CP.ProductID AND IA.HasOrdered IS NULL

    GROUP BY

    CP.ClientID

    HAVING

    (COUNT(MHO.ProductID) > 0

    AND COUNT(MNHO.ProductID) = 0)

    OR (COUNT(IA.ProductID) > 0)[/font]

  • Looks great ! Let me test this on the real DB for a while. Sounds just like what I needed !

    Forever in debt !;)

    Edit:Actually, it needs a bit of editing. unless I'm mistaken it will return clients that ordered AT LEAST one product => COUNT(MHO.ProductID) > 0

    whereas I need clients that ordered all

    COUNT(MHO.ProductID) = (SELECT count(*) from ProductFilter where HasOrdered =1)

    no?

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply