Need help with a simple query from a one to many table relation.

  • I'm Norwegian and have 2 food and recipe website in Norway.

    I been working with sql and asp for many years, but I'm stuck in a problem which for me looks like it could have a simple solution, but I just can't see it.

    This is about a food recipe table and the relating table which shows different allergen in each recipe.

    That means that one record in the recipe table can have from 0 to 14 records in the related allergen table.

    Then I make a query for my readers so they can select recipes without the allergen they are allergic against.

    This is easy if you ask for recipe with the chosen allergen, but the opposite way, I can't see the solution.

    See a very simple example table with data and example query and the result I want. The original is much more complex.

    First the recipe table: tblrecipe

    Recipe_numberRecipe_name

    1Pancakes

    2Pasta

    3Porridge

    4Brownies

    And then the related allergen table: tblRecipeAllergen

    AllergenIdRecipe_numberAllergen

    11Milk

    21Egg

    31Gluten

    42Gluten

    53Milk

    63Egg

    74Gluten

    84Nuts

    94Egg

    Example a user want to see recipe without milk. And that is just Pasta and Brownies

    My first try:

    SELECT tblRecipe.Recipe_number, tblRecipe.Recipe_name, tblRecipeAllergen.Allergen

    FROM tblRecipe INNER JOIN

    tblRecipeAllergen ON tblRecipe.Recipe_number = tblRecipeAllergen.Recipe_number

    WHERE (NOT (tblRecipeAllergen.Allergen LIKE 'milk'))

    Produce this answer:

    11Pancakes Egg

    21Pancakes Gluten

    32Pasta Gluten

    43Porridge Egg

    54Brownies Gluten

    64Brownies Nuts

    74Brownies Egg

    But this only show records from tblrecipeAllergen without milk.

    And I want this answer: Just pasta and brownies

    2Pasta

    4Brownies

    I'm a little embarrassed that I did not see the answer, but my old brain just stop here.

    I think I need some if-then-else or select case sentence in the code, but I'm not sure how to do it.

    The query must check if there is a allergen in a recipe, and if it is then don't show that recipe in result.

    If someone got a solution, please help. Or got some link to other with same problem.

    Mvh

    Knut P

    Here is a script to produce the test tables with data: Just copy and paste into query analyzer

    /****** Object: Table [dbo].[tblRecipe] Script Date: 07.10.2015 10:51:43 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    SET ANSI_PADDING ON

    GO

    CREATE TABLE [dbo].[tblRecipe](

    [Recipe_number] [int] IDENTITY(1,1) NOT NULL,

    [Recipe_name] [varchar](50) NULL,

    CONSTRAINT [PK_tblRecipe] PRIMARY KEY CLUSTERED

    (

    [Recipe_number] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    SET ANSI_PADDING OFF

    GO

    /****** Object: Table [dbo].[tblRecipeAllergen] Script Date: 07.10.2015 10:54:03 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    SET ANSI_PADDING ON

    GO

    CREATE TABLE [dbo].[tblRecipeAllergen](

    [AllergenId] [int] IDENTITY(1,1) NOT NULL,

    [Recipe_number] [int] NULL,

    [Allergen] [varchar](50) NULL

    ) ON [PRIMARY]

    GO

    SET ANSI_PADDING OFF

    GO

    SET IDENTITY_INSERT tblRecipe ON

    INSERT INTO tblRecipe

    (Recipe_number, Recipe_name)

    SELECT '1','Pancakes' UNION

    SELECT '2','Pasta' UNION

    SELECT '3','Porridge' UNION

    SELECT '4','Brownies'

    SET IDENTITY_INSERT tblRecipe OFF

    SET IDENTITY_INSERT tblRecipeAllergen ON

    INSERT INTO tblRecipeAllergen

    (AllergenId, Recipe_number, Allergen)

    SELECT '1','1','Milk' UNION

    SELECT '2','1','Egg' UNION

    SELECT '3','1','Gluten' UNION

    SELECT '4','2','Gluten' UNION

    SELECT '5','3','Milk' UNION

    SELECT '6','3','Egg' UNION

    SELECT '7','4','Gluten' UNION

    SELECT '8','4','Nuts' UNION

    SELECT '9','4','Egg'

    SET IDENTITY_INSERT tblRecipeAllergen OFF

  • Knut

    You need an outer join.

    SELECT

    r.Recipe_number

    ,r.Recipe_name

    FROM tblRecipe AS r

    LEFT JOIN tblRecipeAllergen AS a

    ON r.Recipe_number = a.Recipe_number

    AND a.Allergen = 'Milk'

    WHERE a.Recipe_number IS NULL

    John

  • WHERE NOT EXISTS may be a little more intuitive. Borrowing John's code:

    SELECT

    r.Recipe_number,

    r.Recipe_name

    FROM tblRecipe AS r

    WHERE NOT EXISTS (

    SELECT 1

    FROM tblRecipeAllergen AS a

    WHERE a.Recipe_number = r.Recipe_number

    AND a.Allergen = 'Milk'

    )

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hi John and ChrisM

    Booth works perfectly.

    Now I got the basic here and then I can try it on the original table with 35.000 records and see which one is the fastest.

    Many, many thanks.

    My brain must have completely disconnect today

    Best Regards from Norway!

  • For efficiency, try this. It is also easy to extend this to include/exclude multiple ingredients.

    SELECT ra.Recipe_number, r.Recipe_name

    FROM (

    SELECT Recipe_number

    FROM tblRecipeAllergen

    GROUP BY Recipe_number

    HAVING MAX(CASE WHEN Allergen = 'Milk' THEN 1 ELSE 0 END) = 0

    ) AS ra

    INNER JOIN tblRecipe r ON r.Recipe_number = ra.Recipe_number

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher (10/7/2015)


    For efficiency, try this. It is also easy to extend this to include/exclude multiple ingredients.

    SELECT ra.Recipe_number, r.Recipe_name

    FROM (

    SELECT Recipe_number

    FROM tblRecipeAllergen

    GROUP BY Recipe_number

    HAVING MAX(CASE WHEN Allergen = 'Milk' THEN 1 ELSE 0 END) = 0

    ) AS ra

    INNER JOIN tblRecipe r ON r.Recipe_number = ra.Recipe_number

    Efficiency? Why would a subquery with a aggregate function be more efficient?

    Extending any of these to allow for additional ingredients is also pretty easy.

    As a simple test, I created these tables with a millions rows and ran the same queries 10 times.

    Your version averaged 3328 milliseconds, the not exists version averaged 2117 milliseconds.

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Michael L John (10/7/2015)


    ScottPletcher (10/7/2015)


    For efficiency, try this. It is also easy to extend this to include/exclude multiple ingredients.

    SELECT ra.Recipe_number, r.Recipe_name

    FROM (

    SELECT Recipe_number

    FROM tblRecipeAllergen

    GROUP BY Recipe_number

    HAVING MAX(CASE WHEN Allergen = 'Milk' THEN 1 ELSE 0 END) = 0

    ) AS ra

    INNER JOIN tblRecipe r ON r.Recipe_number = ra.Recipe_number

    Efficiency? Why would a subquery with a aggregate function be more efficient?

    Extending any of these to allow for additional ingredients is also pretty easy.

    As a simple test, I created these tables with a millions rows and ran the same queries 10 times.

    Your version averaged 3328 milliseconds, the not exists version averaged 2117 milliseconds.

    The subquery is so that the Recipe_numbers only have to looked up once, rather than for every row.

    I'd love to see the actual test data you used. Seems odd to me.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher (10/7/2015)


    Michael L John (10/7/2015)


    ScottPletcher (10/7/2015)


    For efficiency, try this. It is also easy to extend this to include/exclude multiple ingredients.

    SELECT ra.Recipe_number, r.Recipe_name

    FROM (

    SELECT Recipe_number

    FROM tblRecipeAllergen

    GROUP BY Recipe_number

    HAVING MAX(CASE WHEN Allergen = 'Milk' THEN 1 ELSE 0 END) = 0

    ) AS ra

    INNER JOIN tblRecipe r ON r.Recipe_number = ra.Recipe_number

    Efficiency? Why would a subquery with a aggregate function be more efficient?

    Extending any of these to allow for additional ingredients is also pretty easy.

    As a simple test, I created these tables with a millions rows and ran the same queries 10 times.

    Your version averaged 3328 milliseconds, the not exists version averaged 2117 milliseconds.

    The subquery is so that the Recipe_numbers only have to looked up once, rather than for every row.

    I'd love to see the actual test data you used. Seems odd to me.

    I took the same inserts provided and re-inserted them repeatedly until there was a million rows.

    I ran each query 10 times, capturing the number of milliseconds each time.

    That's it.

    Out of curiosity, I set statisics IO on and ran them.

    Here are results:

    --Version with aggregate

    Table 'tblRecipe'. Scan count 0, logical reads 1714281, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'tblRecipeAllergen'. Scan count 1, logical reads 3247, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    --Version with not exists

    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'tblRecipe'. Scan count 1, logical reads 3718, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'tblRecipeAllergen'. Scan count 1, logical reads 3247, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Pretty clear that the logical reads, as I suspected, are significantly higher with the aggregate.

    I just now modified the queries. I added another condition to the case in the aggregate, and changed the where in the not exists to IN ('milk', 'nuts')

    The difference was not as significant, averaging 3124 to 2713 of the 10 runs.

    The IO difference was nearly identical to the first run.

    --With aggregate, additional condition added

    Table 'tblRecipe'. Scan count 0, logical reads 1428567, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'tblRecipeAllergen'. Scan count 1, logical reads 3247, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    --With not exists, IN added to WHERE

    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'tblRecipe'. Scan count 1, logical reads 3718, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'tblRecipeAllergen'. Scan count 1, logical reads 3247, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    I then created a covering index on tblRecipieAllergen.

    CREATE NONCLUSTERED INDEX [IX2_tblRecipeAllergen]

    ON [dbo].[tblRecipeAllergen] ([Allergen])

    INCLUDE ([Recipe_number])

    GO

    The times differential increased significantly for the not exists, almost the same as the original test. IO was the same.

    The execution plans are attached.

    Obviously, I threw this together while at work pretty quickly. But I can't see how adding the complexity of an aggregate and a case statement in a subquery can be more efficient than a where clause.

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Michael L John (10/7/2015)


    ScottPletcher (10/7/2015)


    Michael L John (10/7/2015)


    ScottPletcher (10/7/2015)


    For efficiency, try this. It is also easy to extend this to include/exclude multiple ingredients.

    SELECT ra.Recipe_number, r.Recipe_name

    FROM (

    SELECT Recipe_number

    FROM tblRecipeAllergen

    GROUP BY Recipe_number

    HAVING MAX(CASE WHEN Allergen = 'Milk' THEN 1 ELSE 0 END) = 0

    ) AS ra

    INNER JOIN tblRecipe r ON r.Recipe_number = ra.Recipe_number

    Efficiency? Why would a subquery with a aggregate function be more efficient?

    Extending any of these to allow for additional ingredients is also pretty easy.

    As a simple test, I created these tables with a millions rows and ran the same queries 10 times.

    Your version averaged 3328 milliseconds, the not exists version averaged 2117 milliseconds.

    The subquery is so that the Recipe_numbers only have to looked up once, rather than for every row.

    I'd love to see the actual test data you used. Seems odd to me.

    I took the same inserts provided and re-inserted them repeatedly until there was a million rows.

    I ran each query 10 times, capturing the number of milliseconds each time.

    That's it.

    Out of curiosity, I set statisics IO on and ran them.

    Here are results:

    --Version with aggregate

    Table 'tblRecipe'. Scan count 0, logical reads 1714281, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'tblRecipeAllergen'. Scan count 1, logical reads 3247, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    --Version with not exists

    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'tblRecipe'. Scan count 1, logical reads 3718, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'tblRecipeAllergen'. Scan count 1, logical reads 3247, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Pretty clear that the logical reads, as I suspected, are significantly higher with the aggregate.

    I just now modified the queries. I added another condition to the case in the aggregate, and changed the where in the not exists to IN ('milk', 'nuts')

    The difference was not as significant, averaging 3124 to 2713 of the 10 runs.

    The IO difference was nearly identical to the first run.

    --With aggregate, additional condition added

    Table 'tblRecipe'. Scan count 0, logical reads 1428567, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'tblRecipeAllergen'. Scan count 1, logical reads 3247, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    --With not exists, IN added to WHERE

    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'tblRecipe'. Scan count 1, logical reads 3718, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'tblRecipeAllergen'. Scan count 1, logical reads 3247, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    I then created a covering index on tblRecipieAllergen.

    CREATE NONCLUSTERED INDEX [IX2_tblRecipeAllergen]

    ON [dbo].[tblRecipeAllergen] ([Allergen])

    INCLUDE ([Recipe_number])

    GO

    The times differential increased significantly for the not exists, almost the same as the original test. IO was the same.

    The execution plans are attached.

    Obviously, I threw this together while at work pretty quickly. But I can't see how adding the complexity of an aggregate and a case statement in a subquery can be more efficient than a where clause.

    Interesting. Are you sure you don't have those stats backwards? The NOT EXISTS must check the entire table every time to verify that a given row does not exist. The GROUP BY will simply scan, and, if the table is not indexed correctly, sort the data and do an aggregate.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Nope, I actually double checked

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Michael L John (10/7/2015)


    Nope, I actually double checked

    Interesting. That's just really not logical to me. The NOT EXISTS() will have to do a full scan of the table for every Recipe_Number that does not have a 'Milk' row (since no index is available). The GROUP BY just has to do a single full scan of the table, period. When I ran the code below I got these I/O numbers:

    --GROUP BY

    (2 row(s) affected)

    Table 'tblRecipe'. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'tblRecipeAllergen'. Scan count 1, logical reads 2942, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    --NOT EXISTS

    (2 row(s) affected)

    Table 'tblRecipeAllergen'. Scan count 1, logical reads 5886, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'tblRecipe'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    --Add a new Recipe without Milk

    (3 row(s) affected)

    Table 'tblRecipe'. Scan count 0, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'tblRecipeAllergen'. Scan count 1, logical reads 2942, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (3 row(s) affected)

    Table 'tblRecipeAllergen'. Scan count 1, logical reads 8828, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'tblRecipe'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SET IDENTITY_INSERT tblRecipeAllergen ON

    INSERT INTO tblRecipeAllergen

    (AllergenId, Recipe_number, Allergen)

    SELECT '1','1','Milk' UNION

    SELECT '2','1','Egg' UNION

    SELECT '3','1','Gluten' UNION

    SELECT '4','2','Gluten' UNION

    SELECT '5','3','Milk' UNION

    SELECT '6','3','Egg' UNION

    SELECT '7','4','Gluten' UNION

    SELECT '8','4','Nuts' UNION

    SELECT '9','4','Egg'

    SET IDENTITY_INSERT tblRecipeAllergen OFF

    GO 100000 --Warning! This may take a while!

    SET STATISTICS IO ON

    SELECT ra.Recipe_number, r.Recipe_name

    FROM (

    SELECT Recipe_number

    FROM tblRecipeAllergen

    GROUP BY Recipe_number

    HAVING MAX(CASE WHEN Allergen = 'Milk' THEN 1 ELSE 0 END) = 0

    ) AS ra

    INNER JOIN tblRecipe r ON r.Recipe_number = ra.Recipe_number

    SELECT

    r.Recipe_number,

    r.Recipe_name

    FROM tblRecipe AS r

    WHERE NOT EXISTS (

    SELECT 1

    FROM tblRecipeAllergen AS a

    WHERE a.Recipe_number = r.Recipe_number

    AND a.Allergen = 'Milk'

    )

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher (10/7/2015)


    Interesting. Are you sure you don't have those stats backwards? The NOT EXISTS must check the entire table every time to verify that a given row does not exist. The GROUP BY will simply scan, and, if the table is not indexed correctly, sort the data and do an aggregate.

    Actually, no. The NOT EXISTS does not have to check the entire table every time. It can quit as soon as it finds any record that matches the criteria. If there is an appropriate index, this check can be very quick even if no such record exists.

    The aggregate is not SARGable, so it can never use an index.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen (10/7/2015)


    ScottPletcher (10/7/2015)


    Interesting. Are you sure you don't have those stats backwards? The NOT EXISTS must check the entire table every time to verify that a given row does not exist. The GROUP BY will simply scan, and, if the table is not indexed correctly, sort the data and do an aggregate.

    Actually, no. The NOT EXISTS does not have to check the entire table every time. It can quit as soon as it finds any record that matches the criteria. If there is an appropriate index, this check can be very quick even if no such record exists.

    The aggregate is not SARGable, so it can never use an index.

    Drew

    Yes, I misstated that, although the sample result below it was correct and had the correct statement. SQL must check the entire table every time the desired row does not exist. It must read part of the table to verify that the row exists and thus prove that the condition is not true.

    And, yes, with the appropriate index, the NOT EXISTS becomes better. But not as the tables were presented.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • NOT EXISTS, from what I understand, is a simple Boolean check.

    Aggregates must first gather all results.

    I am not at all surprised it is generally faster.

    Many times I found it to take a user query that churned, sometimes never finishing, into one that returned results very quickly.

    A very good item to have in your toolbox.

  • Greg Edwards-268690 (10/8/2015)


    NOT EXISTS, from what I understand, is a simple Boolean check.

    Aggregates must first gather all results.

    I am not at all surprised it is generally faster.

    Many times I found it to take a user query that churned, sometimes never finishing, into one that returned results very quickly.

    A very good item to have in your toolbox.

    Absolutely - but I've also seen Scott's suggestion of a derived table with aggregation to leave an EXISTS check in the dust. It depends on the shape of the data. EXISTS almost always wins when a) the searched column is a long way from unique or b) you have a small number of probes into a much larger probed table.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply