Compare subset with subset in same table

  • I have a table with 2 columns, FormulaID, IngredientID

    The table contains formulanames with the ingredients.

    I want to find formula pairs within this table with the same ingredients.

    Say the values are

    FormulaID IngredientID

    F1 I1

    F1 I2

    F2 I3

    F2 I4

    F2 I5

    F3 I1

    F3 I2

    F4 I3

    F4 I4

    F4 I5

    F5 I1

    F5 I2

    and so on

    When I run the query, I want

    F1 F3

    F1 F5

    F3 F5

    F2 F4

    and so on returned

    I don't mind if it returns all combinations also, so result could be

    F1 F3

    F1 F5

    F3 F1

    F3 F5

    F5 F1

    F5 F3

    F2 F4

    I am stymied here, tried different things but not even close to the result

  • IF OBJECT_ID('tempdb..#tmp', 'U') IS NOT NULL

    DROP TABLE #tmp;

    CREATE TABLE #tmp

    (

    FormulaId CHAR(2)

    , IngredientId CHAR(2)

    );

    INSERT #tmp

    (FormulaId, IngredientId)

    VALUES

    ('F1' ,'I1'),

    ('F1' ,'I2'),

    ('F2' ,'I3'),

    ('F2' ,'I4'),

    ('F2' ,'I5'),

    ('F3' ,'I1'),

    ('F3' ,'I2'),

    ('F4' ,'I3'),

    ('F4' ,'I4'),

    ('F4' ,'I5'),

    ('F5' ,'I1'),

    ('F5' ,'I2')

    SELECT DISTINCT

    t1.FormulaId

    , t2.FormulaId

    FROM

    #tmp t1

    JOIN #tmp t2 ON t1.IngredientId = t2.IngredientId

    WHERE t2.FormulaId > t1.FormulaId;

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Phil Parkin (12/22/2016)


    IF OBJECT_ID('tempdb..#tmp', 'U') IS NOT NULL

    DROP TABLE #tmp;

    CREATE TABLE #tmp

    (

    FormulaId CHAR(2)

    , IngredientId CHAR(2)

    );

    INSERT #tmp

    (FormulaId, IngredientId)

    VALUES

    ('F1' ,'I1'),

    ('F1' ,'I2'),

    ('F2' ,'I3'),

    ('F2' ,'I4'),

    ('F2' ,'I5'),

    ('F3' ,'I1'),

    ('F3' ,'I2'),

    ('F4' ,'I3'),

    ('F4' ,'I4'),

    ('F4' ,'I5'),

    ('F5' ,'I1'),

    ('F5' ,'I2')

    SELECT DISTINCT

    t1.FormulaId

    , t2.FormulaId

    FROM

    #tmp t1

    JOIN #tmp t2 ON t1.IngredientId = t2.IngredientId

    WHERE t2.FormulaId > t1.FormulaId;

    This will give combos where they share ANY ingredient, but I think that he is looking for records where they have ALL of the same ingredients.

    I've added another formula to the sample data that illustrates the problem. F6 should not match any of the other formulas, but it is matching each of the other formulas.

    IF OBJECT_ID('tempdb..#tmp', 'U') IS NOT NULL

    DROP TABLE #tmp;

    CREATE TABLE #tmp

    (

    FormulaId CHAR(2)

    , IngredientId CHAR(2)

    );

    INSERT #tmp

    (FormulaId, IngredientId)

    VALUES

    ('F1' ,'I1'),

    ('F1' ,'I2'),

    ('F2' ,'I3'),

    ('F2' ,'I4'),

    ('F2' ,'I5'),

    ('F3' ,'I1'),

    ('F3' ,'I2'),

    ('F4' ,'I3'),

    ('F4' ,'I4'),

    ('F4' ,'I5'),

    ('F5' ,'I1'),

    ('F5' ,'I2'),

    ('F6' ,'I1'),

    ('F6' ,'I5')

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • This approach will work, but it may not be the most efficient. It uses XML concatenation to create a comma-separated list of the ingredients and then checks for two formulae that have the same comma-separated list.

    ;

    WITH formulae AS (

    SELECT DISTINCT t.FormulaId, STUFF((

    SELECT ',', t2.IngredientId

    FROM #tmp t2

    WHERE t2.FormulaId = t.FormulaId

    ORDER BY t2.IngredientId

    FOR XML PATH(''), TYPE

    ).value('.', 'VARCHAR(255)'), 1, 1, '') AS IngredientList

    FROM #tmp t

    )

    SELECT a.FormulaId, b.FormulaId

    FROM formulae a

    INNER JOIN formulae b

    ON a.FormulaId < b.FormulaId

    AND a.IngredientList = b.IngredientList

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Drew,Thanks for your reply. I see that you have sorted the ingredients and this might work. let me try it out.

  • hamzajosh (12/22/2016)


    Drew,Thanks for your reply. I see that you have sorted the ingredients and this might work. let me try it out.

    I'd also be interested to hear why my solution is not suitable, as it returns exactly the data you were looking for.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Phil, Like Drew pointed out, your solution returns formulas that have even ONE ingredient in common, while I am looking for a exact match on ALL ingredients. I appreciate your effort and time. Drew, your solution worked. Awesome! thanks a ton.

  • hamzajosh (12/22/2016)


    Phil, Like Drew pointed out, your solution returns formulas that have even ONE ingredient in common, while I am looking for a exact match on ALL ingredients. I appreciate your effort and time. Drew, your solution worked. Awesome! thanks a ton.

    OK, that was not clear from your original post (at least, not to me). Thanks for responding.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Yes, My bad, sorry for not being clear.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply