How to match sets of records

  • I have an app where users can define their own groups of selections which are made of a number of individual items.

    I have been asked to develop a facility where a user can 'share' their list with another user. Share = make a copy rather than create a view.

    The problem is that there is a requirement not to share the group if the recipient already has group which matches the one being shared (no more no less).

    The simplified table is:

    group_id user_id item

    1 user1 item1

    1 user1 item2

    2 user2 item1

    2 user2 item2

    2 user2 item3

    3 user3 item1

    3 user3 item3

    4 user4 item1

    4 user4 item2

    In this example, user1 could share their group with user2 and user3 as they do not have a group which exactly matches user1's group (group_id = 1) which is being shared. User1 could not share the group with user4 as user4 already has a group with exactly the same content as the group being shared.

    All ideas and suggestions welcome.

    Jeremy

  • Sounds like this will get completely cumbersome to manage. What about storing groups with their items and then "linking" to users? Maybe even have an "owner" flag in the link table.

    Group

    GroupID ItemID

    ------- ------

    1 A

    1 B

    2 A

    2 C

    3 D

    User Groups

    GroupID UserID Owner

    ------- ------ --------

    1 Jim X

    1 Sally

    2 Sally X

    3 Jim X

    Jim has "shared his group 1 with Sally.

    Steve Jones

    sjones@sqlservercentral.com

    http://www.sqlservercentral.com/columnists/sjones

    http://www.dkranch.net

  • Steve,

    I agree that it is cumbersome. When I first started looking at this I thought it would be easy and I could use something like EXISTS to check for an existing group. However, sub queries can only return a single row and I have struggled to work out the syntax.

    I cannot link the groups to users as one of the requirements is that the recipient then owns the group so that they can then amend the content as they see fit independently of the original owner.

    At the moment, the only way I can see of doing this is:

    1. Count the number of items in the group to be shared

    2. For each group of the recipient:

    - count the number of items

    - if the number of items match, check the content. I think I can do this by counting the number of rows in the recipient's group where the item is in (select items from original group). If the number matches the number of items in the recipient's group then it is a match.

    It's not pretty but it should work. I was hoping that someone might come up with some ideas which would result in a single SQL step. Maybe it is just not possible.

    Jeremy

  • I am not sure I get the complete picture, but I'll give it a shot anyway.

    Isn't it possible to use an self full outer join (or is it full outer self join)?

    If rows are returned that have a null at either side, the groups do not match. If no rows are returned with NULL values, you have a perfect.

    You'd get something like

    
    
    SELECT COUNT(*) FROM
    (SELECT item FROM table
    WHERE group_id = <group_id to share>
    AND user_id = <user that shares>) T1
    FULL OUTER JOIN
    (SELECT item FROM table
    WHERE user_id = <user that gets share>) T2
    ON T1.item = T2.item
    WHERE T1.item IS NULL OR T2.item IS NULL
  • Agree with NPeeters suggestion!

  • Thanks for the SQL - it took me a little while to work out what was going on and that a zero meant a match and a non zero indicated not a match.

    I have been trying it out on a number of scenarioes and it doesn't seem to work in more complex situations.

    set nocount on

    use tempdb

    create table #group

    (groupid int, userid varchar(8), item varchar(2))

    insert #group select 1, 'id1', 'a'

    insert #group select 1, 'id1', 'b'

    insert #group select 2, 'id2', 'a'

    insert #group select 2, 'id2', 'b'

    insert #group select 2, 'id2', 'c'

    insert #group select 3, 'id2', 'a'

    insert #group select 3, 'id2', 'b'

    SELECT count(*) FROM

    (SELECT item FROM #group

    WHERE groupid = 1

    AND userid = 'id1') T1

    FULL OUTER JOIN (SELECT item FROM #group

    WHERE userid = 'id2') T2

    ON T1.item = T2.item

    WHERE T1.item IS NULL OR T2.item IS null

    go

    drop table #group

    In this scenario, user id1 wants to share group #1 with user id2. User id2 has a group (#3) which matches the content of group #1 but the return is 1 not 0.

    I acutally need the group id number rather than a count as I have some other processing later which will need the id. In this scenario I would need a return value of 3 (the id of the group that matches). The recipient might have dozens of groups which need to be checked - in theory they can set up mutliple groups which have the same content and I only need one id value.

    I'm still working on it but if you have any more ideas then please let me know.

    Thanks

    Jeremy

  • OK. Changed it a bit.

    First use the previous query, but just get all 'rows' where a null is present and group by groupid of the person that receives the share.

    Now you have a list of all groupid's that DO NOT comply to the rules you have stated. So then you just get a list of all groups that are attribuated to the user that receives the share and eliminate all those that DO NOT comply with the NOT EXISTS clause.

    A lot of text, to explain the 'simple' query below. Enjoy...

    
    
    SELECT T3.groupid FROM
    (SELECT groupid FROM #GROUP
    WHERE userid='id2'
    GROUP BY groupid) T3
    WHERE NOT EXISTS
    (SELECT groupid FROM
    (SELECT item FROM #group
    WHERE groupid = 1
    AND userid = 'id1') T1
    FULL OUTER JOIN
    (SELECT item FROM #group
    WHERE userid = 'id2') T2
    ON T1.item = T2.item
    WHERE T1.item IS NULL OR T2.item IS null
    GROUP BY T2.groupid) T4
    WHERE t3.groupid = T4.groupid)

    Edited by - NPeeters on 06/05/2003 04:20:27 AM

  • Thanks but I am really struggling to understand the SQL. There are unmatched brackets and I cannot work out what is wrong.

    I think I understand each small bit of code but putting all together has confused me.

    Can you sort me out?

    Thanks

    Jeremy

  • Here goes nothing ...

    We want to retrieve all groupid's from the 'receiving' user that comply with the requiremens.

    
    
    SELECT T3.groupid FROM

    The main FROM clause... A subquery to get a recordset with all groupid's for this user.

    
    
    (SELECT groupid FROM #GROUP
    WHERE userid='id2'
    GROUP BY groupid) T3

    The main WHERE clause. The only groupid's we want are those that are not in the complicated select below... so NOT EXISTS

    
    
    WHERE NOT EXISTS

    This is the original query... only the select clause has changed.

    We now retrieve the list of groupid's for which a record with NULL exists. This yields that every groupid that is returned does not comply with the requirements.

         
    
    (SELECT groupid FROM
    (SELECT item FROM #group
    WHERE groupid = 1
    AND userid = 'id1') T1
    FULL OUTER JOIN
    (SELECT item FROM #group
    WHERE userid = 'id2') T2
    ON T1.item = T2.item
    WHERE T1.item IS NULL OR T2.item IS null

    We group by groupid of the 'receiving' person. Just to get a single record for each groupid in the resultset.

    
    
    GROUP BY T2.groupid) T4

    This is the 'join' syntax for the NOT EXISTS clause. Meaning that we have to match the groupid's.

    
    
    WHERE t3.groupid = T4.groupid)
  • And here's the complete query again, with matching brackets. Sorry for that...

    
    
    SELECT T3.groupid FROM
    (SELECT groupid FROM #GROUP
    WHERE userid='id2'
    GROUP BY groupid) T3
    WHERE NOT EXISTS
    (SELECT groupid FROM
    ( --Added this one...
    (SELECT item FROM #group
    WHERE groupid = 1
    AND userid = 'id1') T1
    FULL OUTER JOIN
    (SELECT item FROM #group
    WHERE userid = 'id2') T2
    ON T1.item = T2.item
    WHERE T1.item IS NULL OR T2.item IS null
    GROUP BY T2.groupid) T4
    WHERE t3.groupid = T4.groupid)
  • Thanks for spending the time to explain the SQL - I really apprecicate it. However, I am still getting a syntax error.

    This is what I have in QA:

    
    

    set nocount on
    use tempdb

    create table #group
    (groupid int, userid varchar(8), item varchar(2))

    insert #group select 1, 'id1', 'a'
    insert #group select 1, 'id1', 'b'
    insert #group select 2, 'id2', 'a'
    insert #group select 2, 'id2', 'b'
    insert #group select 2, 'id2', 'c'
    insert #group select 3, 'id2', 'a'
    insert #group select 3, 'id2', 'b'


    SELECT T3.groupid FROM
    (SELECT groupid FROM #GROUP
    WHERE userid='id2'
    GROUP BY groupid) T3
    WHERE NOT EXISTS
    (SELECT groupid FROM
    (
    (SELECT item FROM #group
    WHERE groupid = 1
    AND userid = 'id1') T1
    FULL OUTER JOIN
    (SELECT item FROM #group
    WHERE userid = 'id2') T2
    ON T1.item = T2.item
    WHERE T1.item IS NULL OR T2.item IS null
    GROUP BY T2.groupid) T4
    WHERE t3.groupid = T4.groupid)



    drop table #group

    This is the error:

    Server: Msg 156, Level 15, State 1, Line 30

    Incorrect syntax near the keyword 'WHERE'.

    Can you try it at your end? I've tried putting in the missing bracket just about everywhere and I still get a syntax error. It might be something with my copy of QA but I would appreciate it if you could check that this code works at your end.

    Thanks

    Jeremy

  • I've got this working:

    
    
    SELECT groupid FROM
    (SELECT groupid FROM #GROUP
    WHERE userid='id2'
    GROUP BY groupid) T3

    WHERE groupid not in
    (SELECT t2.groupid FROM
    (SELECT groupid, item FROM #group
    WHERE groupid = 1
    AND userid = 'id1') T1
    FULL OUTER JOIN
    (SELECT groupid, item FROM #group
    WHERE userid = 'id2') T2
    ON T1.item = T2.item
    WHERE T1.item IS NULL OR T2.item IS null
    GROUP BY T2.groupid)

    which is very similar to your query - the nested subquery needs to select both groupid and item and I changed the NOT EXISTS to GROUPID NOT IN.

    It seems to give me the right answer but I need to do more testing.

    If you can see something obviously wrong then please let me know.

    Thanks very much for your time and effort. I have learnt a huge amount from this.

    Jeremy

    Edited by - Jeremy Kemp on 06/05/2003 06:28:08 AM

  • Jeremy, your query is correct, I think. If I'm not mistaking, using NOT IN should be avoided because of performance issues. I'm sure you can find some reference in older threads.

    For completeness I've redone my query with the NOT EXISTS. This time I took the time to test it...

    
    
    SELECT T3.groupid
    FROM
    (SELECT groupid FROM #GROUP
    WHERE userid='id2'
    GROUP BY groupid) T3
    WHERE NOT EXISTS
    (SELECT T4.groupid FROM
    --I went wrong here ...
    (SELECT T2.groupid
    FROM
    (SELECT item FROM #group
    WHERE groupid = 1
    AND userid = 'id1') T1
    FULL OUTER JOIN
    (SELECT item FROM #group
    WHERE userid = 'id2') T2
    ON T1.item = T2.item
    WHERE T1.item IS NULL OR T2.item IS null
    GROUP BY T2.groupid) T4
    WHERE t3.groupid = T4.groupid)

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply