Need help with a simple query from a one to many table relation.

Question

Post reply

Need help with a simple query from a one to many table relation.

knutpet

Grasshopper

Points: 14
More actions
October 7, 2015 at 4:20 am

#301330

I'm Norwegian and have 2 food and recipe website in Norway.
I been working with sql and asp for many years, but I'm stuck in a problem which for me looks like it could have a simple solution, but I just can't see it.
This is about a food recipe table and the relating table which shows different allergen in each recipe.
That means that one record in the recipe table can have from 0 to 14 records in the related allergen table.
Then I make a query for my readers so they can select recipes without the allergen they are allergic against.
This is easy if you ask for recipe with the chosen allergen, but the opposite way, I can't see the solution.
See a very simple example table with data and example query and the result I want. The original is much more complex.
First the recipe table: tblrecipe
Recipe_numberRecipe_name
1Pancakes
2Pasta
3Porridge
4Brownies
And then the related allergen table: tblRecipeAllergen
AllergenIdRecipe_numberAllergen
11Milk
21Egg
31Gluten
42Gluten
53Milk
63Egg
74Gluten
84Nuts
94Egg
Example a user want to see recipe without milk. And that is just Pasta and Brownies
My first try:
SELECT tblRecipe.Recipe_number, tblRecipe.Recipe_name, tblRecipeAllergen.Allergen
FROM tblRecipe INNER JOIN
tblRecipeAllergen ON tblRecipe.Recipe_number = tblRecipeAllergen.Recipe_number
WHERE (NOT (tblRecipeAllergen.Allergen LIKE 'milk'))
Produce this answer:
11Pancakes Egg
21Pancakes Gluten
32Pasta Gluten
43Porridge Egg
54Brownies Gluten
64Brownies Nuts
74Brownies Egg
But this only show records from tblrecipeAllergen without milk.
And I want this answer: Just pasta and brownies
2Pasta
4Brownies
I'm a little embarrassed that I did not see the answer, but my old brain just stop here.
I think I need some if-then-else or select case sentence in the code, but I'm not sure how to do it.
The query must check if there is a allergen in a recipe, and if it is then don't show that recipe in result.
If someone got a solution, please help. Or got some link to other with same problem.
Mvh
Knut P
Here is a script to produce the test tables with data: Just copy and paste into query analyzer
/****** Object: Table [dbo].[tblRecipe] Script Date: 07.10.2015 10:51:43 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[tblRecipe](
[Recipe_number] [int] IDENTITY(1,1) NOT NULL,
[Recipe_name] [varchar](50) NULL,
CONSTRAINT [PK_tblRecipe] PRIMARY KEY CLUSTERED
(
[Recipe_number] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
/****** Object: Table [dbo].[tblRecipeAllergen] Script Date: 07.10.2015 10:54:03 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[tblRecipeAllergen](
[AllergenId] [int] IDENTITY(1,1) NOT NULL,
[Recipe_number] [int] NULL,
[Allergen] [varchar](50) NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
SET IDENTITY_INSERT tblRecipe ON
INSERT INTO tblRecipe
(Recipe_number, Recipe_name)
SELECT '1','Pancakes' UNION
SELECT '2','Pasta' UNION
SELECT '3','Porridge' UNION
SELECT '4','Brownies'
SET IDENTITY_INSERT tblRecipe OFF
SET IDENTITY_INSERT tblRecipeAllergen ON
INSERT INTO tblRecipeAllergen
(AllergenId, Recipe_number, Allergen)
SELECT '1','1','Milk' UNION
SELECT '2','1','Egg' UNION
SELECT '3','1','Gluten' UNION
SELECT '4','2','Gluten' UNION
SELECT '5','3','Milk' UNION
SELECT '6','3','Egg' UNION
SELECT '7','4','Gluten' UNION
SELECT '8','4','Nuts' UNION
SELECT '9','4','Egg'
SET IDENTITY_INSERT tblRecipeAllergen OFF

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 1

Knut

You need an outer join.

SELECT

r.Recipe_number

,r.Recipe_name

FROM tblRecipe AS r

LEFT JOIN tblRecipeAllergen AS a

ON r.Recipe_number = a.Recipe_number

AND a.Allergen = 'Milk'

WHERE a.Recipe_number IS NULL

John

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 2

WHERE NOT EXISTS may be a little more intuitive. Borrowing John's code:

SELECT

r.Recipe_number,

r.Recipe_name

FROM tblRecipe AS r

WHERE NOT EXISTS (

SELECT 1

FROM tblRecipeAllergen AS a

WHERE a.Recipe_number = r.Recipe_number

AND a.Allergen = 'Milk'

)

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

knutpet Grasshopper Points: 14 More actions · Answer 3

Hi John and ChrisM

Booth works perfectly.

Now I got the basic here and then I can try it on the original table with 35.000 records and see which one is the fastest.

Many, many thanks.

My brain must have completely disconnect today

Best Regards from Norway!

ScottPletcher SSC Guru Points: 101099 More actions · Answer 4

For efficiency, try this. It is also easy to extend this to include/exclude multiple ingredients.

SELECT ra.Recipe_number, r.Recipe_name

FROM (

SELECT Recipe_number

FROM tblRecipeAllergen

GROUP BY Recipe_number

HAVING MAX(CASE WHEN Allergen = 'Milk' THEN 1 ELSE 0 END) = 0

) AS ra

INNER JOIN tblRecipe r ON r.Recipe_number = ra.Recipe_number

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Michael L John One Orange Chip Points: 27056 More actions · Answer 5

ScottPletcher (10/7/2015)
For efficiency, try this. It is also easy to extend this to include/exclude multiple ingredients.
SELECT ra.Recipe_number, r.Recipe_name
FROM (
SELECT Recipe_number
FROM tblRecipeAllergen
GROUP BY Recipe_number
HAVING MAX(CASE WHEN Allergen = 'Milk' THEN 1 ELSE 0 END) = 0
) AS ra
INNER JOIN tblRecipe r ON r.Recipe_number = ra.Recipe_number

Efficiency? Why would a subquery with a aggregate function be more efficient?

Extending any of these to allow for additional ingredients is also pretty easy.

As a simple test, I created these tables with a millions rows and ran the same queries 10 times.

Your version averaged 3328 milliseconds, the not exists version averaged 2117 milliseconds.

Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/

ScottPletcher SSC Guru Points: 101099 More actions · Answer 6

Michael L John (10/7/2015)
ScottPletcher (10/7/2015)
For efficiency, try this. It is also easy to extend this to include/exclude multiple ingredients.
SELECT ra.Recipe_number, r.Recipe_name
FROM (
SELECT Recipe_number
FROM tblRecipeAllergen
GROUP BY Recipe_number
HAVING MAX(CASE WHEN Allergen = 'Milk' THEN 1 ELSE 0 END) = 0
) AS ra
INNER JOIN tblRecipe r ON r.Recipe_number = ra.Recipe_number
Efficiency? Why would a subquery with a aggregate function be more efficient?
Extending any of these to allow for additional ingredients is also pretty easy.
As a simple test, I created these tables with a millions rows and ran the same queries 10 times.
Your version averaged 3328 milliseconds, the not exists version averaged 2117 milliseconds.

The subquery is so that the Recipe_numbers only have to looked up once, rather than for every row.

I'd love to see the actual test data you used. Seems odd to me.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Michael L John One Orange Chip Points: 27056 More actions · Answer 7

ScottPletcher (10/7/2015)
Michael L John (10/7/2015)
ScottPletcher (10/7/2015)
For efficiency, try this. It is also easy to extend this to include/exclude multiple ingredients.
SELECT ra.Recipe_number, r.Recipe_name
FROM (
SELECT Recipe_number
FROM tblRecipeAllergen
GROUP BY Recipe_number
HAVING MAX(CASE WHEN Allergen = 'Milk' THEN 1 ELSE 0 END) = 0
) AS ra
INNER JOIN tblRecipe r ON r.Recipe_number = ra.Recipe_number
Efficiency? Why would a subquery with a aggregate function be more efficient?
Extending any of these to allow for additional ingredients is also pretty easy.
As a simple test, I created these tables with a millions rows and ran the same queries 10 times.
Your version averaged 3328 milliseconds, the not exists version averaged 2117 milliseconds.
The subquery is so that the Recipe_numbers only have to looked up once, rather than for every row.
I'd love to see the actual test data you used. Seems odd to me.

I took the same inserts provided and re-inserted them repeatedly until there was a million rows.

I ran each query 10 times, capturing the number of milliseconds each time.

That's it.

Out of curiosity, I set statisics IO on and ran them.

Here are results:

--Version with aggregate

Table 'tblRecipe'. Scan count 0, logical reads 1714281, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'tblRecipeAllergen'. Scan count 1, logical reads 3247, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

--Version with not exists

Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'tblRecipe'. Scan count 1, logical reads 3718, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'tblRecipeAllergen'. Scan count 1, logical reads 3247, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Pretty clear that the logical reads, as I suspected, are significantly higher with the aggregate.

I just now modified the queries. I added another condition to the case in the aggregate, and changed the where in the not exists to IN ('milk', 'nuts')

The difference was not as significant, averaging 3124 to 2713 of the 10 runs.

The IO difference was nearly identical to the first run.

--With aggregate, additional condition added

Table 'tblRecipe'. Scan count 0, logical reads 1428567, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'tblRecipeAllergen'. Scan count 1, logical reads 3247, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

--With not exists, IN added to WHERE

Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'tblRecipe'. Scan count 1, logical reads 3718, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'tblRecipeAllergen'. Scan count 1, logical reads 3247, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

I then created a covering index on tblRecipieAllergen.

CREATE NONCLUSTERED INDEX [IX2_tblRecipeAllergen]

ON [dbo].[tblRecipeAllergen] ([Allergen])

INCLUDE ([Recipe_number])

GO

The times differential increased significantly for the not exists, almost the same as the original test. IO was the same.

The execution plans are attached.

Obviously, I threw this together while at work pretty quickly. But I can't see how adding the complexity of an aggregate and a case statement in a subquery can be more efficient than a where clause.

Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/

ScottPletcher SSC Guru Points: 101099 More actions · Answer 8

Michael L John (10/7/2015)
ScottPletcher (10/7/2015)
Michael L John (10/7/2015)
ScottPletcher (10/7/2015)
For efficiency, try this. It is also easy to extend this to include/exclude multiple ingredients.
SELECT ra.Recipe_number, r.Recipe_name
FROM (
SELECT Recipe_number
FROM tblRecipeAllergen
GROUP BY Recipe_number
HAVING MAX(CASE WHEN Allergen = 'Milk' THEN 1 ELSE 0 END) = 0
) AS ra
INNER JOIN tblRecipe r ON r.Recipe_number = ra.Recipe_number
Efficiency? Why would a subquery with a aggregate function be more efficient?
Extending any of these to allow for additional ingredients is also pretty easy.
As a simple test, I created these tables with a millions rows and ran the same queries 10 times.
Your version averaged 3328 milliseconds, the not exists version averaged 2117 milliseconds.
The subquery is so that the Recipe_numbers only have to looked up once, rather than for every row.
I'd love to see the actual test data you used. Seems odd to me.
I took the same inserts provided and re-inserted them repeatedly until there was a million rows.
I ran each query 10 times, capturing the number of milliseconds each time.
That's it.
Out of curiosity, I set statisics IO on and ran them.
Here are results:
--Version with aggregate
Table 'tblRecipe'. Scan count 0, logical reads 1714281, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblRecipeAllergen'. Scan count 1, logical reads 3247, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
--Version with not exists
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblRecipe'. Scan count 1, logical reads 3718, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblRecipeAllergen'. Scan count 1, logical reads 3247, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Pretty clear that the logical reads, as I suspected, are significantly higher with the aggregate.
I just now modified the queries. I added another condition to the case in the aggregate, and changed the where in the not exists to IN ('milk', 'nuts')
The difference was not as significant, averaging 3124 to 2713 of the 10 runs.
The IO difference was nearly identical to the first run.
--With aggregate, additional condition added
Table 'tblRecipe'. Scan count 0, logical reads 1428567, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblRecipeAllergen'. Scan count 1, logical reads 3247, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
--With not exists, IN added to WHERE
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblRecipe'. Scan count 1, logical reads 3718, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblRecipeAllergen'. Scan count 1, logical reads 3247, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
I then created a covering index on tblRecipieAllergen.
CREATE NONCLUSTERED INDEX [IX2_tblRecipeAllergen]
ON [dbo].[tblRecipeAllergen] ([Allergen])
INCLUDE ([Recipe_number])
GO
The times differential increased significantly for the not exists, almost the same as the original test. IO was the same.
The execution plans are attached.
Obviously, I threw this together while at work pretty quickly. But I can't see how adding the complexity of an aggregate and a case statement in a subquery can be more efficient than a where clause.

Interesting. Are you sure you don't have those stats backwards? The NOT EXISTS must check the entire table every time to verify that a given row does not exist. The GROUP BY will simply scan, and, if the table is not indexed correctly, sort the data and do an aggregate.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Michael L John One Orange Chip Points: 27056 More actions · Answer 9

Nope, I actually double checked

Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/

ScottPletcher SSC Guru Points: 101099 More actions · Answer 10

Michael L John (10/7/2015)
Nope, I actually double checked

Interesting. That's just really not logical to me. The NOT EXISTS() will have to do a full scan of the table for every Recipe_Number that does not have a 'Milk' row (since no index is available). The GROUP BY just has to do a single full scan of the table, period. When I ran the code below I got these I/O numbers:

--GROUP BY

(2 row(s) affected)

Table 'tblRecipe'. Scan count 0, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'tblRecipeAllergen'. Scan count 1, logical reads 2942, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

--NOT EXISTS

(2 row(s) affected)

Table 'tblRecipeAllergen'. Scan count 1, logical reads 5886, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'tblRecipe'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

--Add a new Recipe without Milk

(3 row(s) affected)

Table 'tblRecipe'. Scan count 0, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'tblRecipeAllergen'. Scan count 1, logical reads 2942, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(3 row(s) affected)

Table 'tblRecipeAllergen'. Scan count 1, logical reads 8828, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'tblRecipe'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SET IDENTITY_INSERT tblRecipeAllergen ON

INSERT INTO tblRecipeAllergen

(AllergenId, Recipe_number, Allergen)

SELECT '1','1','Milk' UNION

SELECT '2','1','Egg' UNION

SELECT '3','1','Gluten' UNION

SELECT '4','2','Gluten' UNION

SELECT '5','3','Milk' UNION

SELECT '6','3','Egg' UNION

SELECT '7','4','Gluten' UNION

SELECT '8','4','Nuts' UNION

SELECT '9','4','Egg'

SET IDENTITY_INSERT tblRecipeAllergen OFF

GO 100000 --Warning! This may take a while!

SET STATISTICS IO ON

SELECT ra.Recipe_number, r.Recipe_name

FROM (

SELECT Recipe_number

FROM tblRecipeAllergen

GROUP BY Recipe_number

HAVING MAX(CASE WHEN Allergen = 'Milk' THEN 1 ELSE 0 END) = 0

) AS ra

INNER JOIN tblRecipe r ON r.Recipe_number = ra.Recipe_number

SELECT

r.Recipe_number,

r.Recipe_name

FROM tblRecipe AS r

WHERE NOT EXISTS (

SELECT 1

FROM tblRecipeAllergen AS a

WHERE a.Recipe_number = r.Recipe_number

AND a.Allergen = 'Milk'

)

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

drew.allen SSC Guru Points: 76985 More actions · Answer 11

ScottPletcher (10/7/2015)
Interesting. Are you sure you don't have those stats backwards? The NOT EXISTS must check the entire table every time to verify that a given row does not exist. The GROUP BY will simply scan, and, if the table is not indexed correctly, sort the data and do an aggregate.

Actually, no. The NOT EXISTS does not have to check the entire table every time. It can quit as soon as it finds any record that matches the criteria. If there is an appropriate index, this check can be very quick even if no such record exists.

The aggregate is not SARGable, so it can never use an index.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

ScottPletcher SSC Guru Points: 101099 More actions · Answer 12

drew.allen (10/7/2015)
ScottPletcher (10/7/2015)
Interesting. Are you sure you don't have those stats backwards? The NOT EXISTS must check the entire table every time to verify that a given row does not exist. The GROUP BY will simply scan, and, if the table is not indexed correctly, sort the data and do an aggregate.
Actually, no. The NOT EXISTS does not have to check the entire table every time. It can quit as soon as it finds any record that matches the criteria. If there is an appropriate index, this check can be very quick even if no such record exists.
The aggregate is not SARGable, so it can never use an index.
Drew

Yes, I misstated that, although the sample result below it was correct and had the correct statement. SQL must check the entire table every time the desired row does not exist. It must read part of the table to verify that the row exists and thus prove that the condition is not true.

And, yes, with the appropriate index, the NOT EXISTS becomes better. But not as the tables were presented.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Greg Edwards-268690 SSC-Insane Points: 20680 More actions · Answer 13

NOT EXISTS, from what I understand, is a simple Boolean check.

Aggregates must first gather all results.

I am not at all surprised it is generally faster.

Many times I found it to take a user query that churned, sometimes never finishing, into one that returned results very quickly.

A very good item to have in your toolbox.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 14

Greg Edwards-268690 (10/8/2015)
NOT EXISTS, from what I understand, is a simple Boolean check.
Aggregates must first gather all results.
I am not at all surprised it is generally faster.
Many times I found it to take a user query that churned, sometimes never finishing, into one that returned results very quickly.
A very good item to have in your toolbox.

Absolutely - but I've also seen Scott's suggestion of a derived table with aggregation to leave an EXISTS check in the dust. It depends on the shape of the data. EXISTS almost always wins when a) the searched column is a long way from unique or b) you have a small number of probes into a much larger probed table.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden